Protocols:Motif Activity Response Analysis(MARA)

From FANTOM5_SSTAR

If you have microarray data instead of CAGE data, you can convert them
into the appropriate OSC Table file format.





Latest revision as of 13:59, 31 August 2012

Author : Michiel De Hoon

Last updated: 2012.03.29


OSC Table input file requirements


Your OSC Table file should follow the general OSC Table file requirements. In particular, note that the first column of the OSC Table file should be the cluster identifier. The name of this column should be "id".

In addition,

1. The header section of the OSC Table file should specify the genome assembly that was used;
2. The data section should contain the normalized (tags-per-million) expression data (columns containing raw data will be ignored);
3. The data section should contain a column labeled 'pos' that shows the representative position within each promoter. Usually this representative position is defined as the most highly expressed position within a promoter, but you can choose the criterion as you want. The position should be a zero-based coordinate.
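The requirements above can be checked with a short script before running the pipeline. The following is only a sketch under stated assumptions: it assumes that header lines start with '#', that the genome assembly is given in a header line containing the text "genome_assembly", and that columns are tab-separated. None of these details are specified in this protocol, and the function name check_osc_table is hypothetical.

```python
import io


def check_osc_table(handle):
    """Return a list of problems found in an OSC Table-like file.

    Assumptions (not taken from the protocol): header lines start with '#',
    the genome assembly is named in a header line containing
    'genome_assembly', and columns are separated by tabs.
    """
    problems = []
    header_lines = []
    columns = None
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            header_lines.append(line)
            continue
        fields = line.split("\t")
        if columns is None:
            # The first non-header line is taken to hold the column names.
            columns = fields
            if columns[0] != "id":
                problems.append('first column should be named "id"')
            if "pos" not in columns:
                problems.append('missing "pos" column')
            continue
        if "pos" in columns:
            record = dict(zip(columns, fields))
            # A zero-based coordinate must be a non-negative integer.
            if not record.get("pos", "").isdecimal():
                problems.append("invalid pos for " + fields[0])
    if not any("genome_assembly" in h.lower() for h in header_lines):
        problems.append("no genome assembly found in the header section")
    return problems


example = "##genome_assembly = hg19\nid\tpos\tsample1.tpm\np1\t100\t3.5\n"
print(check_osc_table(io.StringIO(example)))
```

An empty list means that no problems were detected by these particular checks; it does not guarantee that the file is a valid OSC Table file.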


Step 1: Calculate the binding profile of TFs with respect to the TSS


Calculate for each TF the binding profile with respect to the transcription start site (TSS). Alternatively, instead of calculating this yourself, you can use the binding profile as calculated using the FANTOM5 data.

Note that, in the data section of the OSC Table file, the representative position for each promoter should be shown in a column labeled "pos" (without the quotes). In most cases, the representative position of a promoter is defined as its most highly expressed position, though other definitions can in principle be used. In the current version of the motif activity pipeline, there are no restrictions on the name of the promoter (as shown in the first column in the data section of the OSC Table file).

This step makes use of the precalculated TFBSs stored in a separate file.
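As an illustration of what a binding profile with respect to the TSS means, the sketch below counts predicted sites at each offset from a set of representative positions. It is not the pipeline's own script: the function name binding_profile, the window size of 500 bp, and the input layout are all assumptions made for this example, and all positions are taken to lie on the same chromosome.

```python
from collections import Counter


def binding_profile(tss_positions, site_positions, window=500):
    """Count predicted sites at each offset from a TSS, within +/- window.

    tss_positions: list of (position, strand), with strand '+' or '-'.
    site_positions: list of zero-based site positions (same chromosome).
    The window size is an assumed value, not one given in the protocol.
    """
    profile = Counter()
    for tss, strand in tss_positions:
        for site in site_positions:
            offset = site - tss
            if strand == "-":
                # Flip the sign so that negative offsets are always upstream.
                offset = -offset
            if -window <= offset <= window:
                profile[offset] += 1
    return profile


profile = binding_profile([(100, "+"), (200, "-")], [90, 210])
print(sorted(profile.items()))
```

In this toy example, both promoters have one site 10 bp upstream and one site 110 bp downstream, so each of the two offsets is counted twice.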


Step 2: Associate TFBSs with promoters


Associate predicted TFBSs with CAGE promoters. This step also makes use of the precalculated TFBSs stored in a separate file. It creates a single file containing the associations of predicted TFBSs with promoters.
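The association can be pictured as counting, for each promoter, the predicted sites of each motif that lie near its representative position. The sketch below shows this idea only; the function name associate_sites, the 300 bp window, and the simple counting rule are assumptions for illustration and may differ from what the pipeline actually does.

```python
from collections import defaultdict


def associate_sites(promoters, sites, window=300):
    """Map each promoter id to per-motif counts of nearby predicted sites.

    promoters: dict of promoter id -> (chromosome, representative position).
    sites: list of (motif, chromosome, position).
    The window is an assumed value, not one given in the protocol.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for motif, chrom, site_pos in sites:
        for promoter_id, (p_chrom, p_pos) in promoters.items():
            if chrom == p_chrom and abs(site_pos - p_pos) <= window:
                counts[promoter_id][motif] += 1
    # Convert the nested defaultdicts to plain dicts for easier inspection.
    return {pid: dict(by_motif) for pid, by_motif in counts.items()}


promoters = {"p1": ("chr1", 1000), "p2": ("chr2", 1000)}
sites = [("SP1", "chr1", 900), ("SP1", "chr1", 1100),
         ("TBP", "chr1", 5000), ("TBP", "chr2", 1000)]
print(associate_sites(promoters, sites))
```

The nested loop is quadratic and is meant for clarity; for genome-scale data, sorting the sites and promoters by position would be a more practical design.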


Step 3: Calculate the motif activities


Use the file you created in Step 2 to calculate the motif activities. The threshold on the number of predicted TFBSs for each motif defaults to 150. This means that motifs with fewer than 150 predicted binding sites are discarded. This step will generate a single file containing the motif activities in each condition, their standard deviations, and their overall Z-scores.
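The effect of the threshold can be sketched as follows. This example only illustrates the filtering of motifs by their number of predicted sites; it does not calculate motif activities. The function name filter_motifs and the input layout (promoter id mapped to per-motif site counts) are assumptions for this example.

```python
def filter_motifs(associations, threshold=150):
    """Return the total site count of each motif that passes the threshold.

    associations: dict of promoter id -> dict of motif -> number of sites
    (an assumed layout). Motifs with fewer than `threshold` predicted
    sites in total are discarded.
    """
    totals = {}
    for by_motif in associations.values():
        for motif, n_sites in by_motif.items():
            totals[motif] = totals.get(motif, 0) + n_sites
    return {motif: n for motif, n in totals.items() if n >= threshold}


associations = {"p1": {"SP1": 100, "TBP": 80}, "p2": {"SP1": 50, "TBP": 69}}
print(filter_motifs(associations))
```

Here SP1 has exactly 150 sites and is kept, whereas TBP has 149 sites and is discarded.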


Step 4: Calculate the network as predicted from the motif activities


Calculate the MARA network using the file containing the motif activities calculated in Step 3, and the file containing the predicted TFBSs for each promoter calculated in Step 2. A threshold on the Z-score of the network edges can be specified; this threshold defaults to 1.5. This step creates a single file containing the motif-to-promoter network as calculated by MARA.
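Applying the Z-score threshold to the network edges might look like the sketch below. The edge layout (motif, promoter, Z-score) and the function name filter_edges are assumptions, as is the choice to keep edges whose Z-score is equal to the threshold; the protocol does not say whether the comparison is strict.

```python
def filter_edges(edges, z_threshold=1.5):
    """Keep the edges whose Z-score is at least z_threshold.

    edges: list of (motif, promoter, z_score) tuples (an assumed layout).
    """
    return [edge for edge in edges if edge[2] >= z_threshold]


edges = [("SP1", "p1", 2.3), ("SP1", "p2", 0.7), ("TBP", "p2", 1.5)]
print(filter_edges(edges))
```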


Step 5: Convert the MARA network to a Cytoscape-loadable file


To view the network in Cytoscape, the MARA network file needs to be converted to an input file in the proper format for Cytoscape. The resulting Cytoscape input file corresponds to a motif-to-promoter network as extracted from the full MARA network.
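One format that Cytoscape can load is the Simple Interaction Format (SIF), in which each line gives a source node, an interaction type, and a target node. The sketch below writes motif-to-promoter edges in that form. Whether the pipeline produces SIF or another Cytoscape format is not stated in this protocol, so treat this as an assumption; the function name to_sif and the interaction label "regulates" are likewise hypothetical.

```python
def to_sif(edges, interaction="regulates"):
    """Format motif-to-promoter edges as tab-delimited SIF text.

    edges: list of (motif, promoter, z_score) tuples (an assumed layout).
    The Z-score is dropped, since a SIF line holds only the two nodes
    and the interaction type.
    """
    return "".join(
        f"{motif}\t{interaction}\t{promoter}\n"
        for motif, promoter, _z_score in edges
    )


print(to_sif([("SP1", "p1", 2.3), ("TBP", "p2", 1.5)]), end="")
```

To keep the Z-scores, they could be written to a separate edge attribute table and imported into Cytoscape alongside the network.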


Step 6: Find the top motifs in each experimental condition


Calculate the Z-score for each motif in each experimental condition, and sort the motifs based on their Z-score.
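A minimal sketch of this ranking is shown below. It assumes that the Z-score of a motif in a condition is its activity divided by the standard deviation of that activity, and that motifs are sorted from the highest to the lowest signed Z-score. Both choices are assumptions made for this example, and the function name top_motifs is hypothetical.

```python
def top_motifs(activities, errors, n=3):
    """Rank motifs by Z-score within each condition and keep the top n.

    activities, errors: dict of motif -> dict of condition -> value, where
    errors holds the standard deviation of each activity (assumed layout).
    The Z-score is taken to be activity / standard deviation.
    """
    ranked = {}
    for motif, by_condition in activities.items():
        for condition, activity in by_condition.items():
            z_score = activity / errors[motif][condition]
            ranked.setdefault(condition, []).append((motif, z_score))
    for condition, scores in ranked.items():
        scores.sort(key=lambda item: item[1], reverse=True)
        ranked[condition] = scores[:n]
    return ranked


activities = {"SP1": {"c1": 1.0, "c2": -0.5}, "TBP": {"c1": 0.5, "c2": 1.5}}
errors = {"SP1": {"c1": 0.5, "c2": 0.5}, "TBP": {"c1": 0.5, "c2": 0.5}}
print(top_motifs(activities, errors))
```

If strongly negative activities are also of interest, sorting on the absolute value of the Z-score would be a reasonable alternative.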


Step 7: Create subnetworks for each experimental condition


From the full MARA network, extract subnetworks for each experimental condition, showing only the top motifs in each.
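Extracting the subnetworks can be sketched as a filter on the edges of the full network. The edge layout (motif, promoter, Z-score), the mapping from each condition to its top motif names, and the function name subnetworks are all assumptions for this example.

```python
def subnetworks(edges, top_by_condition):
    """Return, for each condition, the edges that start at one of its top motifs.

    edges: list of (motif, promoter, z_score) tuples from the full network.
    top_by_condition: dict of condition -> list of top motif names.
    """
    result = {}
    for condition, motifs in top_by_condition.items():
        keep = set(motifs)
        result[condition] = [edge for edge in edges if edge[0] in keep]
    return result


edges = [("SP1", "p1", 2.3), ("TBP", "p2", 1.5), ("SP1", "p3", 1.9)]
top = {"c1": ["SP1"], "c2": ["TBP"]}
print(subnetworks(edges, top))
```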