ResizeFeatures

Description
The ResizeFeatures processing module works on Features to alter their genomic coordinates. The module re-sorts features on the data stream as needed to preserve stream integrity. Typical use cases:
 * shrink the feature to its 5' end and make it 1bp wide (CAGE data)
 * expand a feature 500bp upstream of its 5' end (e.g. extending RefSeq genes into their potential promoter regions for use in overlap analysis)
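
The need for re-sorting can be illustrated with a minimal Python sketch. It is not the module's implementation: the tuple layout, the 1-based inclusive coordinates, and the function names `shrink_end` and `resize_and_resort` are assumptions made for illustration only. It shows that shrinking features to their chrom_end can leave a start-sorted stream out of order, so a re-sort is required afterwards.

```python
# Features as (chrom_start, chrom_end, strand); 1-based inclusive coordinates
# are an assumption of this sketch, not something the module documentation states.
def shrink_end(feature):
    """Collapse a feature to a 1bp feature located at its chrom_end."""
    start, end, strand = feature
    return (end, end, strand)


def resize_and_resort(features, resize):
    """Apply a resize function, then re-sort by (start, end) to keep the stream ordered."""
    return sorted((resize(f) for f in features), key=lambda f: (f[0], f[1]))


# Input stream, sorted by chrom_start.
features = [(100, 900, "+"), (200, 300, "+"), (250, 400, "-")]

# Resizing alone breaks the ordering: the new starts are 900, 300, 400.
resized = [shrink_end(f) for f in features]

# Re-sorting after the resize restores a valid, start-ordered stream.
resorted = resize_and_resort(features, shrink_end)
```

Here `resized` is no longer ordered by start, while `resorted` is, which is the property the module is described as preserving.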

Parameters

 *  : only Features matching categories will be resized
 *   : defines which resize method will be performed
    * shrink_start : shrink the feature to 1bp length on the chrom_start
    * shrink_end : shrink the feature to 1bp length on the chrom_end
    * shrink_5prime : shrink the feature to 1bp length on the 5' end (start for + strand, end for - strand)
    * shrink_3prime : shrink the feature to 1bp length on the 3' end (start for - strand, end for + strand)
    * expand_start : expand the chrom_start by amount up-stream
    * expand_end : expand the chrom_end by amount down-stream
    * expand_5prime : expand the 5' end by amount
    * expand_3prime : expand the 3' end by amount
    * store : save the coordinates prior to resizing, so that they can be restored in a later call to ResizeFeatures
    * restore : redefine the coordinates as those stored in a previous ResizeFeatures call
 *   : for expand modes this is the amount to be expanded.
 * <retain_subfeatures> : defaults to "false", which is usually the desired behavior: after resizing, the subfeatures no longer have meaning unless the original coordinates are going to be stored and later restored. Set this flag to "true" if you plan to restore the coordinates saved by a previous ResizeFeatures call together with their subfeatures.
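
The coordinate arithmetic behind the modes listed above can be sketched in Python as follows. This is an illustrative reading of the mode descriptions, not the ZENBU source code. The `Feature` class, the `resize` function, the 1-based inclusive coordinates, the treatment of unstranded features as + strand, and the absence of clamping at the chromosome start are all assumptions of this sketch.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Feature:
    """Hypothetical stand-in for a ZENBU Feature (illustration only)."""
    start: int                                 # chrom_start, assumed 1-based inclusive
    end: int                                   # chrom_end, assumed inclusive
    strand: str                                # "+" or "-"
    stored: Optional[Tuple[int, int]] = None   # coordinates saved by "store"


def resize(f: Feature, mode: str, amount: int = 0) -> Feature:
    """Return a resized copy of f according to one of the documented modes."""
    plus = f.strand != "-"  # assumption: anything that is not "-" behaves like "+"

    # Translate strand-aware modes into their start/end equivalents.
    if mode.endswith("_5prime"):
        mode = mode.replace("5prime", "start" if plus else "end")
    elif mode.endswith("_3prime"):
        mode = mode.replace("3prime", "end" if plus else "start")

    if mode == "shrink_start":
        return replace(f, end=f.start)
    if mode == "shrink_end":
        return replace(f, start=f.end)
    if mode == "expand_start":
        return replace(f, start=f.start - amount)  # not clamped in this sketch
    if mode == "expand_end":
        return replace(f, end=f.end + amount)
    if mode == "store":
        return replace(f, stored=(f.start, f.end))
    if mode == "restore":
        if f.stored is None:
            return f  # nothing was stored; leave the feature unchanged
        return replace(f, start=f.stored[0], end=f.stored[1])
    raise ValueError(f"unknown mode: {mode}")


# A minus-strand gene: its 5' end is chrom_end, so upstream expansion grows chrom_end.
gene = Feature(1000, 2000, "-")
promoter_extended = resize(gene, "expand_5prime", 500)
tss = resize(gene, "shrink_5prime")
```

Under these assumptions, `promoter_extended` spans 1000..2500 and `tss` is the single position 2000, matching the strand-aware descriptions of expand_5prime and shrink_5prime.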

Example
This script combines a CalcInterSubfeatures module with a StreamSubfeatures module to generate introns. This is followed by ResizeFeatures and UniqueFeature to reduce the introns to a set of unique intron donor sites with counts of their abundance.

    intron true true  shrink_start   

Here is a ZENBU view showing this script in use http://fantom.gsc.riken.jp/zenbu/gLyphs/#config=fuf2V6ehKhHlbabZkQWebB;loc=hg19::chr8:128746973..128755020
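
The effect of the shrink_start step followed by collapsing duplicates can be approximated in plain Python. This is only a sketch of the idea: the intron coordinates are made up, and the assumption that UniqueFeature merges features with identical coordinates and counts them is an interpretation of the description above, not a statement about its implementation.

```python
from collections import Counter

# Made-up introns as (chrom, chrom_start, chrom_end, strand).
# Two of them share the same start position, i.e. the same donor site on the + strand.
introns = [
    ("chr8", 1000, 1500, "+"),
    ("chr8", 1000, 1800, "+"),
    ("chr8", 2000, 2600, "+"),
]


def shrink_start(feature):
    """Collapse a feature to a 1bp feature at its chrom_start."""
    chrom, start, end, strand = feature
    return (chrom, start, start, strand)


# Shrink every intron, then count how many introns map to each resulting site.
donor_counts = Counter(shrink_start(intron) for intron in introns)

for site, count in sorted(donor_counts.items()):
    print(site, count)
```

In this toy input, three introns reduce to two unique donor sites, one of which is used twice.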

This script exemplifies how one would collate CAGE-based expression into transcripts. It uses several consecutive ResizeFeatures modules: the first ones define the [-500..+500] region around the beginning of each transcript, so that CAGE TSS can be associated with the transcripts by a TemplateCluster module, and the last one restores the original transcript coordinates and their associated exon structure (subfeatures).

Lists RefSeq transcript sources for all the main assemblies loaded in zenbu
    <datastream name="refseq" output="full_feature">
        <source id="025F0224-D145-4E28-86A2-DB37A42A89CB::21:::FeatureSource" name="UCSC_RefSeq_canFam2_20120101"/>
        <source id="025F0224-D145-4E28-86A2-DB37A42A89CB::5:::FeatureSource" name="UCSC_RefSeq_galGal3_20120101"/>
        <source id="D71B7748-1450-4C62-92CB-7E913AB12899::13:::FeatureSource" name="UCSC_RefSeq_hg19_20120101"/>
        <source id="4043B030-0201-495F-824B-BC197EA3C272::6:::FeatureSource" name="UCSC_RefSeq_mm9_20120101"/>
        <source id="025F0224-D145-4E28-86A2-DB37A42A89CB::35:::FeatureSource" name="UCSC_RefSeq_rn4_20120101"/>
        <source id="0583D02E-BA10-11DE-B45C-8D369A8382FD::78:::FeatureSource" name="UCSC_hg18_refgene"/>



Get the Transcriptional Start Sites (TSS) revealed by the 5' extremity of CAGE-derived reads
    <spstream module="ResizeFeatures"> shrink_5prime

Collate the CAGE TSS along regions defined as RefSeq TSS +/-500bp
    <spstream module="TemplateCluster">
    <side_stream>
    <spstream module="Proxy" name="refseq"/>

Before modifying the coordinates to the RefSeq TSS, temporarily save the original coordinates so that they can be restored later
    <spstream module="ResizeFeatures"> <retain_subfeatures>true</retain_subfeatures> store

Modify the coordinates to RefSeq TSS +/-500bp
    <spstream module="ResizeFeatures"> <retain_subfeatures>true</retain_subfeatures> shrink_5prime
    <spstream module="ResizeFeatures"> <retain_subfeatures>true</retain_subfeatures> expand_start 500
    <spstream module="ResizeFeatures"> <retain_subfeatures>true</retain_subfeatures> expand_end 500

</side_stream>

Restore the original RefSeq coordinates
    <spstream module="ResizeFeatures"> <retain_subfeatures>true</retain_subfeatures> restore

Sum up the expression over all samples and save the value as the RefSeq score, so that the transcript is colored accordingly
    <spstream module="CalcFeatureSignificance"> <expression_mode>sum</expression_mode>

</stream_processing> </zenbu_script>

Here is a ZENBU view showing this script in use http://fantom.gsc.riken.jp/zenbu/gLyphs/#config=PyTxIWwAO5apFGJYNVOGjB;loc=hg18::chr11:129118781..129899006
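
The store, shrink_5prime, expand_start, expand_end sequence applied to the RefSeq side stream can be traced with a small Python sketch. It only mirrors the coordinate arithmetic described in the script's comments. The function name `tss_window`, the example coordinates, and the 1-based inclusive convention are assumptions made for illustration.

```python
def tss_window(start, end, strand, flank=500):
    """Return ((window_start, window_end), stored_original) for one transcript.

    Mirrors the sequence: store -> shrink_5prime -> expand_start -> expand_end.
    """
    stored = (start, end)                   # "store": remember the original coordinates

    tss = start if strand != "-" else end   # "shrink_5prime": 1bp feature at the 5' end
    start, end = tss, tss

    start -= flank                          # "expand_start 500"
    end += flank                            # "expand_end 500"
    return (start, end), stored


# Hypothetical minus-strand transcript: its TSS is at chrom_end.
window, original = tss_window(10000, 20000, "-")

# After clustering against the window, "restore" brings back the stored coordinates.
restored = original
```

For this made-up transcript the clustering window is 19500..20500, centered on the TSS at 20000, and the restored coordinates are the original 10000..20000. Using expand_start and expand_end after shrink_5prime gives a window that is symmetric around the TSS regardless of strand.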