Track visualization styles

Tracks in the ZENBU genome browser fall into three main categories of visualization style:


 * Annotation tracks: the data sources contain only genomic positional information and no signal-based data. [Figure: Annotation-track-transcript.png] This category includes the styles thick-arrow, medium-arrow, arrow, centroid, transcript, transcript2, thick-transcript, thin-transcript, box, thick-box, thin-box, probesetloc, seqtag, scorethick, and cytoband.


 * Numerical signal tracks: signal level is displayed without feature boundaries, in an interactive data-exploration view. [Figure: Signal_style_tracks.png] This category includes the styles signal-histogram, xyplot, 1D-heatmap, and experiment-heatmap.


 * Hybrid tracks: ZENBU's enhanced visualizations, which allow processed data to contain both genomic features and multi-experiment signal-based data. [Figure: Hybrid_tracks_longRNAseq.png] This category mainly consists of annotation drawing styles combined with dynamic signal-mapped coloring.

=Annotation Tracks=
Annotation tracks visualize genomic positional data within the ZENBU genome browser.

The visualization style can be changed in the Visualization section of the track configuration panel.

Visualization styles differ in how much information they display and how much vertical screen space they use. For dense data tracks, a more compact style may be better, depending on how the track will be used. The strand of each annotation is color coded: the interface provides color-pickers for choosing any color for forward-strand features, reverse-strand features, and the background.

=Numerical signal tracks=

Signal-based tracks visualize numerical expression data, or other forms of signal intensity, from Experiment data sources in the ZENBU genome browser. The signal is displayed on a segmented genomic grid, similar to the wiggle visualization of the UCSC Genome Browser.

==Signal histogram visualization==
In the signal-histogram visualization, numerical signal (for example RNA expression, logFC, or p-values) is drawn as a signal-height graph along genomic coordinates. ZENBU can visualize both strandless and stranded signal histograms.

Here is an example of FANTOM4 CAGE signal, which is stranded in nature.

Here is an example of FANTOM4 ChIP-chip signal, which is strandless in nature.

Here is an example of ENCODE strandless-protocol RNAseq signal, configured to display expression signal only in areas of sequence alignment (skipping alignment gaps): http://fantom.gsc.riken.jp/zenbu/gLyphs/#config=l_D-jGt1IlehEahizVAMeB;loc=hg19::chr8:128746973..128755020

Here is an example of ENCODE stranded-protocol RNAseq exonic expression signal.

To create this style of visualization, the primary expression data must be processed into the dynamic segmented genomic grid, either with the expression binning module of the graphical processing interface or with a custom data-processing script.

The expression binning parameters in the processing GUI are as follows:
 * overlap mode: because ZENBU works directly with sequence alignment data (often uploaded from BAM files), the alignments must be transformed before they can be visualized properly. The options are:
 ** area under the curve: the signal is spread evenly along the length of the alignment, so that the area under the curve represents the level of signal. This only affects alignments that overlap more than one genomic segmentation bin; if no alignment overlaps more than one bin, area and height modes produce the same visualization.
 ** height: the signal is collated so that the height of the curve represents the level of signal in each genomic segment. As with area mode, this only affects alignments that overlap more than one genomic segmentation bin.
 ** 5'end: the signal is concentrated at the 5' end of the sequence alignment before being collated into the genomic segmentation bins. This is primarily used for CAGE-based sequencing experiments.
 ** 3'end: the signal is concentrated at the 3' end of the sequence alignment before being collated into the genomic segmentation bins. Few current RNA sequencing technologies use this mode; it is included to support new technology development.
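
The four overlap modes can be illustrated with a minimal Python sketch. It is not ZENBU's implementation: the function name `distribute_alignment`, the mode labels, and the 0-based half-open coordinates are assumptions for illustration, and "height" is interpreted here as giving the full signal value to every bin the alignment touches.

```python
def distribute_alignment(start, end, strand, value, bin_size, mode):
    """Return {bin_index: contribution} for one alignment [start, end).

    Hypothetical helper illustrating the overlap modes; not ZENBU's code.
    Coordinates are assumed 0-based and half-open.
    """
    if end <= start:
        raise ValueError("alignment must have positive length")
    if mode == "5end":
        # the 5' end is the leftmost base on +, the rightmost base on -
        pos = start if strand == "+" else end - 1
        return {pos // bin_size: value}
    if mode == "3end":
        pos = end - 1 if strand == "+" else start
        return {pos // bin_size: value}
    length = end - start
    out = {}
    first, last = start // bin_size, (end - 1) // bin_size
    for b in range(first, last + 1):
        lo = max(start, b * bin_size)
        hi = min(end, (b + 1) * bin_size)
        if mode == "area":
            # split the signal in proportion to the overlap with each bin
            out[b] = value * (hi - lo) / length
        elif mode == "height":
            # every overlapped bin receives the full signal value
            out[b] = value
        else:
            raise ValueError(f"unknown mode: {mode}")
    return out
```

For an alignment at 90..110 with 100 bp bins, area mode splits the signal 50/50 between bins 0 and 1, while height mode puts the full value in both; an alignment that stays inside a single bin gives identical results in both modes.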


 * expression binning: the mathematical operation applied when multiple signal points from the same experiment collate into the same genomic segmentation bin. Each Experiment is kept distinct; the operation is applied across the different signal-based features within a single Experiment. The options are:
 ** sum: the sum of the signal values within each experiment
 ** min: the minimum of the signal values within each experiment
 ** max: the maximum of the signal values within each experiment
 ** mean: the mean of the signal values within each experiment
 ** count: the number of signal values within each experiment that collate into the genomic segmentation bin
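
The per-experiment collation can be sketched as a group-then-reduce step. This is an illustrative sketch only: `collate_bins` is a hypothetical name, and each input point is assumed to be an (experiment, bin index, value) tuple already produced by an overlap mode.

```python
from collections import defaultdict
from statistics import mean

# The five collation operations listed above.
OPERATIONS = {
    "sum": sum,
    "min": min,
    "max": max,
    "mean": mean,
    "count": len,
}


def collate_bins(points, op):
    """Collate (experiment, bin_index, value) tuples into {(experiment, bin_index): result}.

    Values from different experiments are never mixed: the operation is
    applied separately to each experiment within each bin.
    """
    grouped = defaultdict(list)
    for experiment, bin_index, value in points:
        grouped[(experiment, bin_index)].append(value)
    func = OPERATIONS[op]
    return {key: func(values) for key, values in grouped.items()}
```

Keying on (experiment, bin) rather than on the bin alone is what keeps each Experiment distinct.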


 * fixed bin size: by default, the processing script creates dynamic bin sizes based on the zoom level of the genomic view and the width of the display, so that each segmentation bin maps to approximately one pixel on screen. This preserves a fine enough resolution without creating unneeded sub-pixel resolution. If finer or coarser binning is desired, a fixed bin size can be entered here. For example, here is the RNAseq exonic expression track from above processed with a fixed bin size of 100 bp. [Figure: Woldlab_RNAseq_exonic_expression_signal_100bp.jpg]
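
The default dynamic bin size can be approximated as the visible span divided by the display width in pixels. In this sketch, `dynamic_bin_size` is hypothetical, the view is assumed to use 1-based inclusive coordinates (as in the `chr8:128746973..128755020` location above), and rounding up is one possible choice that keeps the number of bins from exceeding the number of pixels.

```python
import math


def dynamic_bin_size(view_start, view_end, display_width_px, fixed_bin_size=None):
    """Return a bin size in bp: the fixed size if given, else about one bin per pixel.

    Hypothetical helper; assumes 1-based inclusive view coordinates.
    """
    if fixed_bin_size is not None:
        return fixed_bin_size
    span = view_end - view_start + 1
    # round up so that ceil(span / bin_size) <= display_width_px
    return max(1, math.ceil(span / display_width_px))
```

With the 8,048 bp example location above and a hypothetical 800-pixel-wide display, this gives 11 bp bins; at very high zoom the size bottoms out at 1 bp rather than going sub-base.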


 * process ignoring strand: check this option if the primary signal-based experiments use a strandless protocol, or if you wish to process stranded signal in a strandless manner. A strandless genomic segmentation grid will be used and the strand of the primary data will be ignored. If the data is processed as strandless, it is best to also select the strandless option in the visualization options.
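
The difference between a stranded and a strandless grid comes down to the bin key. In this hypothetical sketch, signal points are (strand, bin index, value) tuples summed into a grid; with `ignore_strand=True` the + and - values land in the same bin.

```python
from collections import defaultdict


def bin_key(strand, bin_index, ignore_strand):
    """Stranded grids keep + and - signal apart; strandless grids merge them."""
    return bin_index if ignore_strand else (strand, bin_index)


def sum_into_grid(points, ignore_strand):
    """Sum (strand, bin_index, value) points into a stranded or strandless grid."""
    grid = defaultdict(float)
    for strand, bin_index, value in points:
        grid[bin_key(strand, bin_index, ignore_strand)] += value
    return dict(grid)
```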


 * overlap via subfeatures: RNA sequencing experiments sometimes produce gapped sequence alignments when an RNA molecule spans an intronic splice junction. This information is contained in BAM files and is preserved during ZENBU uploading. For an accurate visualization of true exonic RNA signal, these intronic gaps should not be collated into the genomic segmentation bins. The ENCODE Wold lab RNAseq example above contains such gapped alignments. Here is the same BAM alignment data processed without this option enabled, so that both exon and intron signal are collated into the signal-based visualization. [Figure: Woldlab_RNAseq_exonic_expression_signal_withgaps.jpg]
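
Gapped alignments are described by the CIGAR string in BAM files, where the `N` operation marks skipped reference bases (typically an intron). The sketch below splits an alignment into gap-free blocks so that only exonic blocks would be collated. It is a simplified illustration, not ZENBU's code: `aligned_blocks` is hypothetical, coordinates are assumed 0-based half-open as in BAM, and only `N` is treated as a gap (deletions `D` stay inside a block).

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def aligned_blocks(ref_start, cigar, split_on_gaps=True):
    """Return the [start, end) reference intervals covered by an alignment.

    With split_on_gaps=True, 'N' (skipped reference) breaks the alignment
    into separate blocks; otherwise one span covers exons and introns.
    """
    blocks = []
    pos = ref_start
    block_start = ref_start
    for length_text, op in CIGAR_OP.findall(cigar):
        length = int(length_text)
        if op == "N" and split_on_gaps:
            if pos > block_start:
                blocks.append((block_start, pos))
            pos += length
            block_start = pos
        elif op in "MDN=X":
            pos += length  # these operations consume reference bases
        # I, S, H and P consume no reference bases
    if pos > block_start:
        blocks.append((block_start, pos))
    return blocks
```

For a read starting at 1000 with CIGAR `50M200N50M`, splitting gives two blocks, (1000, 1050) and (1250, 1300); without splitting, a single span (1000, 1300) also covers the intron.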

Additional visualization options are available for expression Experiment tracks in the signal-histogram style:
 * hide empty experiment: this parameter affects the track-linked Experiment Expression panel. If selected, only Experiments with a non-zero expression value are displayed.
 * color on signal: currently has no effect when the track is in "signal-histogram" mode.
 * display datatype: depending on how the track was configured and processed, more than one datatype may be available for visualization. If so, select the one to display here.
 * background color: alters the background color, which helps to visually group related tracks in very large views. Colors can be specified in any HTML color syntax (named colors, #FFFFFF style, or rgb(255,255,255) style).
 * track pixel height: adjusts the screen height of the track. The height can also be adjusted by click-dragging the resize widget on the left side of the track. [Figure: Express_track_height_resize.jpg]
 * signal scale: sets the numerical scale on which the signal values are displayed. By default this is auto, meaning that the signal-based track is rescaled to fit the height of the track. To use a fixed scale across several tracks, set it here; signal above this scale limit is clipped.
 * log scale: dynamically compresses the expression onto a log scale for visualization. If the expression has a huge dynamic range, this expands the low background signal and compresses the higher peaks. For example, here is the FANTOM4 CAGE signal-based track from above visualized on a log scale. [Figure: Express_track_fantom4_cage_logscale.jpg]
 * strandless: set this visualization option together with strandless data processing.
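
The signal scale and log scale options can be thought of as a mapping from bin values to bar heights. This is a hedged sketch under assumptions: `scale_to_pixels` is hypothetical, negative values are treated as zero, and the log transform shown is log10(1 + value), which may differ from ZENBU's exact formula.

```python
import math


def scale_to_pixels(values, track_height_px, signal_scale=None, log_scale=False):
    """Map bin values to bar heights in pixels.

    signal_scale=None mimics 'auto' (the largest value fills the track);
    a fixed signal_scale clips anything above it.
    """
    def transform(v):
        return math.log10(1 + v) if log_scale else v

    # negative values are treated as zero in this sketch
    shown = [transform(max(v, 0.0)) for v in values]
    top = max(shown, default=0.0) if signal_scale is None else transform(signal_scale)
    if top <= 0:
        return [0 for _ in values]
    return [round(min(v, top) / top * track_height_px) for v in shown]
```

With a 100-pixel track, values [0, 5, 10] become [0, 50, 100] on auto scale and [0, 100, 100] with a fixed scale of 5 (the 10 is clipped); on a log scale, [0, 9, 99] become [0, 50, 100], showing how low values are expanded.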

==1D heatmap visualization==
The same expression-binning data processing can also be displayed in the 1D-heatmap visualization style. This style is intended for expression data but can also be applied to non-overlapping hybrid data. It draws the expression on a single layer of the track, using only a false-color spectrum to show expression differences. It can be used with normal "expression binning" processing or with more advanced scripts. The 1D-heatmap visualization does not display strand information, so processing should be done in a strandless manner, or with strands separated using custom scripting.



This example RNAseq data is processed to display only exonic signal (no gaps/introns) and is displayed with the "brewer-sequential-9-blues" and the "zenbu-spectrum blue1" false-color spectrums. This style gives a very compact track, which is useful in situations where many separate signal-based tracks are needed.



In addition to a custom set of color spectrums labeled "zenbu-spectrum", we have imported all of the ColorBrewer palettes ( http://colorbrewer2.org/js/ ), which provide discrete levels of color. The Brewer palettes are divided into three categories (sequential, diverging, qualitative), and one can choose the number of discrete levels in addition to the color-space name. Any color space can be mapped to the numerical signal with either a linear or a log method. The color preview shows the resulting color space.
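
Mapping a value onto a discrete palette with a linear or a log method might look like the sketch below. The hex values are ColorBrewer's 9-class sequential Blues palette; `value_to_color` and the log1p-based log mapping are assumptions for illustration, not necessarily how ZENBU computes its color spaces.

```python
import math

# ColorBrewer 9-class sequential "Blues", lightest to darkest.
BLUES_9 = ["#f7fbff", "#deebf7", "#c6dbef", "#9ecae1", "#6baed6",
           "#4292c6", "#2171b5", "#08519c", "#08306b"]


def value_to_color(value, vmin, vmax, palette, method="linear"):
    """Pick a discrete palette color for value on the range [vmin, vmax]."""
    if vmax <= vmin:
        raise ValueError("vmax must exceed vmin")
    value = min(max(value, vmin), vmax)  # clamp out-of-range values
    if method == "linear":
        frac = (value - vmin) / (vmax - vmin)
    elif method == "log":
        # log1p keeps the mapping defined at value == vmin
        frac = math.log1p(value - vmin) / math.log1p(vmax - vmin)
    else:
        raise ValueError(f"unknown method: {method}")
    index = min(int(frac * len(palette)), len(palette) - 1)
    return palette[index]
```

On a 0 to 99 range, a value of 9 gets the lightest color with the linear method but the middle color with the log method, which is why log mapping helps separate low signal levels.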





==Experiment heatmap visualization==
This visualization style is for pooled data-source tracks containing many experiments with expression. Each experiment is given its own horizontal layer in the image, vertical slices represent genomic segments, and the false-color spectrum is applied to the expression value at each intersection of genomic position and experiment. This style simultaneously shows spatial variation in expression and differential expression between experiments.

In this example, the RNAseq data is processed for exonic signal and binned into a genomic segmentation grid, and the experiments are sorted by expression value, highest first.



Hovering over elements in the heatmap reveals the name of the experiment, the location and the expression value collated into that genomic segment.



The order of experiments matches the order in the linked Experiment-expression graph, and re-sorting in that panel changes the sort order of experiments in the heatmap. Here the sort order is changed to sample cell-type name.
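
The experiment heatmap is essentially a matrix with one row per experiment and one column per genomic bin. This hypothetical sketch orders rows by total expression or by name and shows what a hover lookup returns; the function names, sample names, and values are made up for illustration.

```python
def heatmap_rows(grid, sort_by="expression"):
    """grid: {experiment_name: [value per genomic bin]}.

    Returns (experiment_name, values) rows in display order, top to bottom.
    """
    if sort_by == "expression":
        key = lambda item: -sum(item[1])  # highest total expression first
    elif sort_by == "name":
        key = lambda item: item[0]
    else:
        raise ValueError(f"unknown sort: {sort_by}")
    return sorted(grid.items(), key=key)


def hover(rows, row, col, view_start, bin_size):
    """Return (experiment name, (bin start, bin end), value) for one heatmap cell."""
    name, values = rows[row]
    bin_start = view_start + col * bin_size
    return name, (bin_start, bin_start + bin_size - 1), values[col]
```

Because re-sorting only reorders rows, the same cell values follow their experiment to a new vertical position, which matches how re-sorting the linked panel changes the heatmap.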



Here is another example, showing more dramatic differential expression and spatial differences in RNAseq exonic signal among different ENCODE samples.



==xyplot - stranded, signed signal visualization==
Sometimes processed signal data includes negative numbers in addition to strand information. For these situations we have included a visualization style called xyplot. In xyplot, the vertical axis encodes positive/negative signal and the horizontal axis encodes genomic position. The strand is encoded in the color of the dot, which can be changed with the track reconfiguration color-pickers.

Here is an example of transcript gene models annotated with the logFC (log fold change) of expression, displayed in xyplot style. The "dots" are extended horizontally along the length of each feature, the y-axis position encodes the logFC, and the color encodes the strand. In this case the logFC compares Gata3 knockout versus wild-type Th2, from Gang Wei et al., "Genome-wide Analyses of Transcription Factor GATA3-Mediated Gene Regulation in Distinct T Cell Types", Immunity, Volume 35, Issue 2, 299-311, 26 August 2011.
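
The xyplot mapping for a single feature can be sketched as follows: the feature's extent sets the horizontal segment, the signed value sets the vertical position around a zero line in the middle of the track, and the strand picks the color. `xyplot_segment`, the symmetric axis from -max_abs to +max_abs, and the default colors are assumptions for illustration.

```python
# Hypothetical default strand colors; in ZENBU these are set with the color-pickers.
STRAND_COLORS = {"+": "#0000ff", "-": "#ff0000"}


def xyplot_segment(start, end, value, strand, view_start, view_end,
                   width_px, height_px, max_abs):
    """Return (x1, x2, y, color) for one feature drawn as a horizontal dash."""
    span = view_end - view_start
    x1 = (start - view_start) * width_px / span
    x2 = (end - view_start) * width_px / span
    mid = height_px / 2  # zero line in the middle of the track
    clipped = max(-max_abs, min(max_abs, value))
    # screen y grows downward, so positive values sit above the zero line
    y = mid - clipped / max_abs * mid
    return x1, x2, y, STRAND_COLORS[strand]
```

For ChIP-seq signal with no negative values, every y lands at or above the zero line, and only the color tells the strands apart.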



And here is another example: ChIP-seq signal displayed in xyplot style. Since ChIP-seq signal has no negative values, both strands are displayed on the positive part of the xyplot, but the colors distinguish the strands. Below the xyplot is the normal strandless signal-histogram visualization of the same data.



In the future ZENBU will incorporate differential expression processing which will better take advantage of this style of visualization.

=Hybrid Tracks=

Hybrid tracks are ZENBU's advanced visualization tracks, which combine genomic annotation with expression Experiments, often together with ZENBU data processing, to create novel visualizations. Several types of visualization can be categorized as hybrid tracks.

==Signal false coloring of genomic features==
In this style of hybrid visualization, genomic annotation Features have signal (e.g. RNA expression or ChIP-seq signal) collated onto them. This can be generated inside ZENBU by a data processing script, by loading a BED file with the "BED.score column has experimental signal values" option, or by loading an OSCtable file that combines annotation and experimental signal. To enable this visualization, select one of the annotation visualization styles, check the color on signal box, and select a false-color spectrum.

For example, here is a track that uses ZENBU data processing to collate the ENCODE Wold lab RNAseq expression (loaded from BAM files) onto Gencode gene models, giving gene expression. The result is displayed as a hybrid track with the transcript visualization style and the color on signal option, using the fire1 false-color spectrum. The top track is the hybrid track showing the processed gene expression; the second track is the RNAseq expression signal, and the third track shows the Gencode gene models onto which that signal was collated. For details on how to create tracks like this, please see the case study RNAseq_expression_collated_onto_gene_models.
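
Collating signal onto gene models can be sketched as summing every signal feature that overlaps one of a gene's exons. This is a simplified illustration, not the processing script ZENBU uses: `collate_onto_genes` is hypothetical, coordinates are assumed 0-based half-open, summing is only one possible collation operation, and the nested loop is quadratic (a real implementation would more likely use a sorted sweep or an interval index).

```python
def collate_onto_genes(genes, signal, ignore_strand=False):
    """Sum signal features that overlap any exon of each gene.

    genes:  {gene_name: (strand, [(exon_start, exon_end), ...])}
    signal: [(start, end, strand, value), ...]
    """
    totals = {}
    for name, (gene_strand, exons) in genes.items():
        total = 0.0
        for s_start, s_end, s_strand, value in signal:
            if not ignore_strand and s_strand != gene_strand:
                continue
            # half-open intervals overlap when each starts before the other ends
            if any(s_start < e_end and e_start < s_end for e_start, e_end in exons):
                total += value
        totals[name] = total
    return totals
```

Signal that falls only in an intron (between two exons) is not counted, which is what turns alignment-level signal into gene-level expression.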

Here is a variation on the previous collated-expression example, in which advanced scripting dynamically generates new genomic features from the primary data and a false-color spectrum shows their abundance.



In this track, RNAseq alignment gaps are extracted by ZENBU processing into new genomic features, which are then "uniqued" and counted. In gapped RNAseq, long gaps occur mainly where the RNA spans an intron, so these gaps are evidence for introns. These "intron evidence" features are then filtered by length and minimum abundance before being displayed with the "medium-exon" style and the "fire1" spectrum.
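
The intron-evidence processing can be sketched as taking the gaps between consecutive aligned blocks of each alignment, counting identical gaps, and keeping those that pass the length and abundance filters. The function name, input layout, and thresholds are hypothetical, and the blocks of each alignment are assumed to be sorted, 0-based half-open intervals.

```python
from collections import Counter


def intron_evidence(alignments, min_length, min_count):
    """Return {(chrom, strand, gap_start, gap_end): count} for well-supported gaps.

    alignments: [(chrom, strand, [(block_start, block_end), ...]), ...]
    """
    counts = Counter()
    for chrom, strand, blocks in alignments:
        # each gap runs from the end of one block to the start of the next
        for (_, prev_end), (next_start, _) in zip(blocks, blocks[1:]):
            if next_start - prev_end >= min_length:
                counts[(chrom, strand, prev_end, next_start)] += 1  # "unique" and count
    return {intron: n for intron, n in counts.items() if n >= min_count}
```

With `min_length=50` and `min_count=2`, only gaps of at least 50 bp that appear in at least two alignments survive; short gaps and singletons are filtered out.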