FANTOM - Software

Mogrify
A directory of defined factors for direct cell reprogramming.
AVAILABILITY: http://www.mogrify.net/
CITATION:
Owen J L Rackham, Jaber Firas, Hai Fang, Matt E Oates, Melissa L Holmes, Anja S Knaupp, The FANTOM Consortium, Harukazu Suzuki, Christian M Nefzger, Carsten O Daub, Jay W Shin, Enrico Petretto, Alistair R R Forrest, Yoshihide Hayashizaki, Jose M Polo, Julian Gough. A predictive computational framework for direct reprogramming between human cell types
Nature Genetics 2016
DOI: 10.1038/ng.3487
CIDER
A pipeline for detecting waves of coordinated transcriptional regulation in gene expression time-course data.
AVAILABILITY: The CIDER source code and the validation datasets are available on request from the corresponding author.
CITATION:
Marco Mina, Giuseppe Jurman, Cesare Furlanello. CIDER: a pipeline for detecting waves of coordinated transcriptional regulation in gene expression time-course data. bioRxiv. DOI: 10.1101/012518
Delve
A probabilistic short read aligner used in FANTOM5 and ENCODE.
LICENSE: GNU General Public License
AVAILABILITY:
version 0.9 delve-0.9.tgz
version 0.95 delve-0.95.tgz
CAGEr
CAGEr provides a comprehensive toolbox for analysis and visualization of CAGE (Cap Analysis of Gene Expression) sequencing data for precise mapping of transcription start sites (TSSs) and promoterome mining in R. It identifies transcription start sites and their frequency of usage from input CAGE sequencing data, normalizes raw CAGE tag counts, clusters TSSs into tag clusters (TCs), and aggregates them across multiple CAGE experiments to construct the promoterome. It handles multiple CAGE experiments at once, performs expression profiling across experiments at the level of both individual TSSs and clusters of TSSs, and exports several types of track files for visualization in a genome browser. Methods for analysis of promoter width and detection of differential usage of TSSs (promoter shifting) between samples are also provided. The package is accompanied by data packages containing FANTOM and ENCODE CAGE data that can be readily used in R, analyzed with the provided tools, and integrated with other genomic data.
REQUIREMENTS: R, Bioconductor
LICENSE: GNU General Public License
AVAILABILITY: http://bioconductor.org/packages/release/bioc/html/CAGEr.html
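The normalization and clustering steps described for CAGEr can be illustrated with a minimal Python sketch. This is not CAGEr's API (CAGEr is an R package); the function names, the tags-per-million scaling, and the 20 bp merge distance are assumptions chosen only to show the general idea of merging nearby TSSs on the same strand into tag clusters.

```python
from collections import defaultdict


def tags_per_million(counts):
    """Scale raw tag counts so that they sum to one million (hypothetical helper)."""
    total = sum(counts.values())
    return {tss: c * 1e6 / total for tss, c in counts.items()}


def cluster_tss(counts, max_dist=20):
    """Merge TSSs on the same chromosome and strand into tag clusters.

    counts maps (chrom, strand, pos) -> signal. Two consecutive TSSs join the
    same cluster when they are at most max_dist bp apart (assumed threshold).
    Returns a list of (chrom, strand, start, end, total_signal) tuples.
    """
    by_strand = defaultdict(list)
    for (chrom, strand, pos), value in counts.items():
        by_strand[(chrom, strand)].append((pos, value))

    clusters = []
    for (chrom, strand), sites in sorted(by_strand.items()):
        sites.sort()
        start, end, total = sites[0][0], sites[0][0], sites[0][1]
        for pos, value in sites[1:]:
            if pos - end <= max_dist:
                # Close enough to the current cluster: extend it.
                end = pos
                total += value
            else:
                # Gap too large: close the cluster and start a new one.
                clusters.append((chrom, strand, start, end, total))
                start, end, total = pos, pos, value
        clusters.append((chrom, strand, start, end, total))
    return clusters
```

For real analyses, the clustering and normalization methods shipped with CAGEr itself should be used; the sketch only makes the description above concrete.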
CAGExploreR
CAGExploreR is an R package that facilitates the detection and visualization of changes in the relative transcription from promoter regions in multi-promoter genes, all in the context of overall gene expression. Multiple samples can be compared simultaneously. It is primarily based on the FANTOM5 promoter set definitions; however, other regions such as MPromDb or user-supplied regions can also be used.
REQUIREMENTS: R version 3.0.2 or later.
LICENSE: MIT
AVAILABILITY: http://cran.r-project.org/web/packages/CAGExploreR/index.html
SAMPLE DATASET: A sample data set is included with the software package.
CONTACT: edimont@mail.harvard.edu
CITATION:
Dimont, E. et al. (2014). CAGExploreR: an R package for the analysis and visualization of promoter dynamics across multiple experiments. Bioinformatics. DOI: 10.1093/bioinformatics/btu125
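The notion of relative transcription from each promoter of a multi-promoter gene can be sketched as a simple proportion per sample. The Python snippet below is an illustration under stated assumptions, not CAGExploreR's implementation: the function name and the input layout (a mapping from promoter name to tag count for one gene in one sample) are hypothetical.

```python
def promoter_proportions(promoter_counts):
    """Return each promoter's share of the gene's total tag count.

    promoter_counts maps promoter name -> tag count for one gene in one
    sample (hypothetical layout). A gene with no tags yields all zeros.
    """
    total = sum(promoter_counts.values())
    if total == 0:
        return {name: 0.0 for name in promoter_counts}
    return {name: count / total for name, count in promoter_counts.items()}
```

Comparing these proportions between samples alongside the gene totals separates a change in promoter choice from a change in overall gene expression.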
RECLU
RECLU is a reproducible clustering pipeline with multiple scales using cap analysis of gene expression (CAGE). The program discovers numerous alternative transcription start sites (TSSs) with biological implications for your sample. This directory contains the standalone program and the package for the Moirai system.
AVAILABILITY: http://en.sourceforge.jp/projects/reclu/
CONTACT: hiroko.ohmiya@riken.jp
Zenbu
ZENBU: a data integration, data processing, and visualization web system
AVAILABILITY: http://sourceforge.net/projects/zenbu/
CITATION:
Severin et al. (2014) "Interactive visualization and analysis of large-scale NGS data-sets using ZENBU." Nature Biotechnology
DOI: 10.1038/nbt.2840
Moirai
MOIRAI: A Compact Workflow System for CAGE Analysis
LICENSE: GNU General Public License
AVAILABILITY: http://sourceforge.net/projects/moirai/
CITATION:
BMC Bioinformatics submitted.
TomeTools
A collection of programs to store and manipulate thousands of CAGE datasets.
LICENSE: GNU General Public License
AVAILABILITY: http://tometools.sourceforge.net
SAMstat
Displaying sequence statistics for next generation sequencing
LICENSE: GNU General Public License
AVAILABILITY: http://samstat.sourceforge.net
CITATION:
Lassmann et al. (2010) "SAMStat: monitoring biases in next generation sequencing data." Bioinformatics
DOI: 10.1093/bioinformatics/btq614; PMID: 21088025
TagDust
A program to eliminate artifacts from next generation sequencing data.
LICENSE: GNU General Public License
AVAILABILITY: http://tagdust.sourceforge.net
CITATION:
Lassmann et al. (2009) "TagDust--a program to eliminate artifacts from next generation sequencing data." Bioinformatics. 2009 Nov 1;25(21):2839-40.
DOI: 10.1093/bioinformatics/btp527; PMID: 19737799
Decomposition-based peak identification
Decomposition-based peak identification finds peaks across a large number of TSS (transcription start site) profiles.
REQUIREMENTS: R, fastICA package, bigWigToBedGraph in jksrc.zip, bedTools
LICENSE: GNU General Public License
AVAILABILITY: https://github.com/hkawaji/dpi1/
CITATION:
A promoter level mammalian expression atlas, Forrest A, Kawaji H, Rehli M, et al. (submitted)
CAGEScan-Clustering
CAGEScan-Clustering creates transcript assemblies from CAGEScan-derived Transcription Start Site (TSS) associated reads paired with randomly primed reads, grouping them on the basis of the common location of TSS reads. Assembly-seeding TSS read clusters can either be provided as an external BED file or computed from the CAGEscan data itself. See `CAGEScan-Clustering.pl --help` for details.
REQUIREMENTS: Unix / Linux. Perl v5.10.1 or higher. BedTools v2.9.0 or higher. Optionally, samtools v0.1.7 (r510) or higher.
LICENSE: GNU General Public License
AVAILABILITY: https://github.com/nicolas-bertin/CAGEscan-Clustering
CONTACT: nbertin@gsc.riken.jp
CITATION:
Kratz et al. "Digital expression profiling of the compartmentalized translatome of purkinje neurons." Genome Research gr.164095.113+ (2014).
DOI: 10.1101/gr.164095.113
bedtools-pairedBamToBed12
Adds a pairedBamToBed12 utility, written by Nicolas Bertin (OSC RIKEN Yokohama), to BEDTools (created by Aaron Quinlan, spring 2009).
AVAILABILITY: https://github.com/nicolas-bertin/bedtools-pairedBamToBed12
CONTACT: nbertin@gsc.riken.jp
SDRF2GRAPH
SDRF2GRAPH is an application that produces a graphical image of an investigation design graph (IDG) based on SDRFs written in a MAGE-TAB formatted spreadsheet (*.xlsx).
REQUIREMENTS: Ruby, rexml, rubyzip, GraphViz.
LICENSE: Ruby's license
AVAILABILITY: SDRF2GRAPH web site
CITATION:
Hideya Kawaji et al., "SDRF2GRAPH - a visualization tool of a spreadsheet-based description of experimental processes". BMC Bioinformatics 2009, 10:133
Nexalign
Nexalign is a program to align millions of short reads from next-generation sequencing data sets to reference genomes.
REQUIREMENTS: Unix / Linux.
LICENSE: GNU General Public License
AVAILABILITY: nexalign-1.3.5.tgz
CONTACT: timolassmann@gmail.com
EdgeExpressDB (eeDB)
EdgeExpressDB (eeDB) is a federated data abstraction system designed for integrating, interpreting, and visualizing very large biology datasets. It is designed to scale beyond petabytes and 10^13 objects. For those interested in installing their own instances of eeDB, the source code is available via CPAN and is being further developed within the Omics Science Center by Jessica Severin.
REQUIREMENTS: Perl DBI/DBD, MySQL, SQLite.
LICENSE: BSD License
AVAILABILITY: http://sourceforge.net/projects/eedb/, available via CPAN (http://search.cpan.org/~jms/EdgeExpressDB_0.953h/).
CITATION:
Jessica Severin et al. FANTOM4 EdgeExpressDB: an integrated database of genes, microRNAs, their promoters, expression dynamics and regulatory interactions. Genome Biology, 10:R39, 1-9 (2009)
MuMRescueLite
MuMRescueLite is software that makes sequence tags mapping to multiple genomic loci usable for expression analysis. When short sequence tags from CAGE or ChIP-Seq are mapped to the genome, tags that map to multiple genomic loci (multi-mapping tags, or MuMs) are routinely omitted from further analysis, leading to experimental bias and reduced coverage. MuMRescueLite probabilistically reincorporates multi-mapping tags into mapped short read data with acceptable computational requirements.
REQUIREMENTS: Python 2.4 or later; runs on any platform supported by Python.
LICENSE: The MIT License.
AVAILABILITY: MuMRescueLite_090522.tar.gz
SAMPLE DATASET: MuMRescueLite_test_data.tsv.gz
CITATION:
Faulkner, G.J., et al. (2008) A rescue strategy for multi-mapping short sequence tags refines surveys of transcriptional activity by CAGE, Genomics.
Hashimoto, T., et al. (2009) Probabilistic resolution of multi-mapping reads in massively parallel sequencing data using MuMRescueLite, Bioinformatics.
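The idea of probabilistically reincorporating a multi-mapping tag can be sketched as splitting the tag across its candidate loci in proportion to the uniquely mapped tags already supporting each locus. The Python snippet below is a minimal illustration under that assumption, not the MuMRescueLite code: the function name, the input layout, and the equal-split fallback when no locus has unique support are all hypothetical.

```python
def rescue_weights(candidate_loci, unique_counts):
    """Split one multi-mapping tag across its candidate loci.

    candidate_loci: list of loci the tag maps to.
    unique_counts: mapping locus -> number of uniquely mapped tags there
                   (hypothetical layout; missing loci count as zero).
    Returns a mapping locus -> fraction of the tag, summing to 1.
    """
    support = [unique_counts.get(locus, 0) for locus in candidate_loci]
    total = sum(support)
    if total == 0:
        # Assumed fallback: no unique evidence anywhere, so split equally.
        share = 1.0 / len(candidate_loci)
        return {locus: share for locus in candidate_loci}
    return {locus: s / total for locus, s in zip(candidate_loci, support)}
```

The published method and the tool's actual parameters are described in the citations above; this sketch only shows why loci with more unique support receive a larger share of an ambiguous tag.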
Cross-mapping correction software
Modern high-throughput technologies enable deep sequencing of non-coding RNA species, such as miRNAs, on an unprecedented scale. When mapping such small RNAs to the genome, cross-mapping may occur, in which RNA sequences originating from one genomic locus are inadvertently mapped to a different locus. This may give rise to spurious novel RNAs, as well as spurious editing sites in known miRNAs. The cross-mapping correction software is a Python script that aims to correct for such cross-mapping effects.
REQUIREMENTS: Python 2.4; Numerical Python (NumPy) version 1.3 or later.
LICENSE: The Python License.
AVAILABILITY: cmc.tar.gz
SAMPLE DATASET: A sample data set is included with the software package.
CITATION:
De Hoon, M.J.L., et al. (2010): Cross-mapping and the identification of editing sites in mature microRNAs in high-throughput sequencing libraries. Genome Research 20: 257-264 (2010).
