  • Delve version 0.95

    A probabilistic short read aligner used in FANTOM5 and ENCODE.
    LICENSE: GNU General Public License
    AVAILABILITY: delve.tgz
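    The entry above does not describe Delve's scoring model. The sketch below is only a generic illustration of what probabilistic placement of a short read can look like. It is not Delve's algorithm. It assumes gapless placements, independent errors derived from Phred qualities, and a uniform prior over candidate positions. All function names are hypothetical.

```python
import math
from typing import List

def phred_to_error(q: int) -> float:
    """Convert a Phred quality score to a base-call error probability."""
    return 10 ** (-q / 10)

def read_log_likelihood(read: str, quals: List[int], ref: str, pos: int) -> float:
    """Log-likelihood of placing `read` at ref[pos:] without gaps.

    Assumes independent base-call errors, qualities > 0, and that a
    miscalled base is equally likely to be any of the other three bases.
    """
    window = ref[pos:pos + len(read)]
    if len(window) != len(read):
        raise ValueError("read extends past the end of the reference")
    ll = 0.0
    for base, q, ref_base in zip(read, quals, window):
        e = phred_to_error(q)
        # Match: the call was correct. Mismatch: one specific wrong call was made.
        ll += math.log(1 - e) if base == ref_base else math.log(e / 3)
    return ll

def placement_posteriors(read: str, quals: List[int], ref: str,
                         positions: List[int]) -> List[float]:
    """Posterior probability of each candidate position under a uniform prior."""
    lls = [read_log_likelihood(read, quals, ref, p) for p in positions]
    best = max(lls)
    weights = [math.exp(ll - best) for ll in lls]  # subtract max for numerical stability
    total = sum(weights)
    return [w / total for w in weights]
```

    A read that matches two positions equally well receives a posterior of 0.5 at each position. This is the kind of placement uncertainty that a deterministic aligner would hide.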
  • CAGEr

    CAGEr provides a comprehensive toolbox in R for the analysis and visualization of CAGE (Cap Analysis of Gene Expression) sequencing data, enabling precise mapping of transcription start sites (TSSs) and promoterome mining. From input CAGE sequencing data, it identifies TSSs and the frequency of their usage, normalizes raw CAGE tag counts, clusters TSSs into tag clusters (TCs), and aggregates these clusters across multiple CAGE experiments to construct the promoterome. It handles multiple CAGE experiments at once, performs expression profiling across experiments at the level of both individual TSSs and TSS clusters, and exports several types of track files for visualization in a genome browser. Methods are also provided for analyzing promoter width and for detecting differential TSS usage between samples (promoter shifting). The package is accompanied by data packages containing FANTOM and ENCODE CAGE data that can be readily loaded in R, analyzed with the provided tools, and integrated with other genomic data.
    REQUIREMENTS: R, Bioconductor
    LICENSE: GNU General Public License
    AVAILABILITY: http://bioconductor.org/packages/release/bioc/html/CAGEr.html
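    Two of the steps listed above are normalization of raw tag counts and clustering of TSSs into tag clusters. Both are sketched below in Python rather than R. The sketch uses plain tags-per-million scaling and a simple distance rule: neighbouring TSSs within a hypothetical 20 bp gap are merged. CAGEr offers its own normalization and clustering methods with different options and defaults, so treat this only as an illustration of the concepts.

```python
from typing import Dict, List, Tuple

def tpm(counts: Dict[int, int]) -> Dict[int, float]:
    """Scale raw tag counts per TSS position to tags per million (TPM)."""
    total = sum(counts.values())
    return {pos: c * 1e6 / total for pos, c in counts.items()}

def distance_clusters(counts: Dict[int, float],
                      max_gap: int = 20) -> List[Tuple[int, int, float]]:
    """Group TSS positions (one chromosome, one strand) into tag clusters.

    Consecutive positions at most `max_gap` bp apart are merged.
    Returns (start, end, total_signal) tuples with inclusive coordinates.
    """
    clusters: List[Tuple[int, int, float]] = []
    for pos in sorted(counts):
        if clusters and pos - clusters[-1][1] <= max_gap:
            start, _, signal = clusters[-1]
            clusters[-1] = (start, pos, signal + counts[pos])
        else:
            clusters.append((pos, pos, counts[pos]))
    return clusters
```

    For example, TSSs at positions 100 and 110 fall into one cluster, while a TSS at position 200 starts a new one.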
  • CAGExploreR

    CAGExploreR is an R package that facilitates the detection and visualization of changes in relative transcription from the promoter regions of multi-promoter genes, in the context of overall gene expression. Multiple samples can be compared simultaneously. It is primarily based on the FANTOM5 promoter set definitions; however, other regions, such as those from MPromDb or user-supplied regions, can also be used.
    REQUIREMENTS: R version 3.0.2 or later.
    AVAILABILITY: http://cran.r-project.org/web/packages/CAGExploreR/index.html
    SAMPLE DATASET: A sample data set is included with the software package.
    CONTACT: edimont@mail.harvard.edu
    Dimont, E. et al. (2014). CAGExploreR: an R package for the analysis and visualization of promoter dynamics across multiple experiments. Bioinformatics. DOI: 10.1093/bioinformatics/btu125
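    "Relative transcription from promoter regions" can be read as each promoter's share of its gene's total expression. The Python sketch below computes that share per sample and the change between two samples. The input layout and function names are hypothetical. CAGExploreR itself also performs statistical testing, which is not reproduced here.

```python
from collections import defaultdict
from typing import Dict, Tuple

Key = Tuple[str, str]  # (gene, promoter)

def promoter_fractions(expr: Dict[Key, float]) -> Dict[Key, float]:
    """Fraction of each gene's expression contributed by each of its promoters."""
    gene_totals: Dict[str, float] = defaultdict(float)
    for (gene, _), value in expr.items():
        gene_totals[gene] += value
    return {
        (gene, prom): (value / gene_totals[gene] if gene_totals[gene] else 0.0)
        for (gene, prom), value in expr.items()
    }

def usage_shift(sample_a: Dict[Key, float], sample_b: Dict[Key, float]) -> Dict[Key, float]:
    """Change in relative promoter usage from sample A to sample B.

    Promoters missing from sample B are treated as having zero usage there.
    """
    fa, fb = promoter_fractions(sample_a), promoter_fractions(sample_b)
    return {key: fb.get(key, 0.0) - fa[key] for key in fa}
```

    Because each promoter is normalized by its gene's total, a gene whose overall expression changes while keeping the same promoter proportions shows no shift.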
  • RECLU

    RECLU is a reproducible clustering pipeline with multiple scales for CAGE (Cap Analysis of Gene Expression) data. It discovers numerous alternative transcription start sites (TSSs) with biological implications for your sample. This directory contains the standalone program and the package for the Moirai system.
    AVAILABILITY: http://en.sourceforge.jp/projects/reclu/
    CONTACT: hiroko.ohmiya@riken.jp
  • Zenbu

    ZENBU: a data integration, data processing, and visualization web system
    AVAILABILITY: http://sourceforge.net/projects/zenbu/
    Severin et al. (2014) "Interactive visualization and analysis of large-scale NGS data-sets using ZENBU." Nature Biotechnology
    DOI: 10.1038/nbt.2840
  • Moirai

    MOIRAI: A Compact Workflow System for CAGE Analysis
    LICENSE: GNU General Public License
    AVAILABILITY: http://sourceforge.net/projects/moirai/
    Submitted to BMC Bioinformatics.
  • TomeTools

    A collection of programs to store and manipulate thousands of CAGE datasets.
    LICENSE: GNU General Public License
    AVAILABILITY: http://tometools.sourceforge.net
  • SAMstat

    Displays sequence statistics for next-generation sequencing data.
    LICENSE: GNU General Public License
    AVAILABILITY: http://samstat.sourceforge.net
    Lassmann et al. (2010) "SAMStat: monitoring biases in next generation sequencing data." Bioinformatics
    DOI: 10.1093/bioinformatics/btq614; PMID: 21088025
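    One statistic a tool like SAMstat reports is how reads distribute over mapping-quality (MAPQ) ranges. The Python sketch below tallies that distribution from SAM text, using the standard SAM column layout and FLAG bits. The bin boundaries and labels are hypothetical, and this is not SAMstat's code.

```python
from collections import Counter
from typing import Iterable

# Hypothetical MAPQ bins (lower bound, label); SAMstat's own binning may differ.
BINS = [(30, "MAPQ>=30"), (20, "MAPQ 20-29"), (10, "MAPQ 10-19"), (0, "MAPQ<10")]

def mapq_summary(sam_lines: Iterable[str]) -> Counter:
    """Count primary alignments per MAPQ bin, plus unmapped reads."""
    counts: Counter = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        flag, mapq = int(fields[1]), int(fields[4])
        if flag & 0x900:
            continue  # skip secondary (0x100) and supplementary (0x800) records
        if flag & 0x4:
            counts["unmapped"] += 1
            continue
        for threshold, label in BINS:
            if mapq >= threshold:
                counts[label] += 1
                break
    return counts
```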
  • TagDust

    A program to eliminate artifacts from next generation sequencing data.
    LICENSE: GNU General Public License
    Lassmann et al. (2009) "TagDust--a program to eliminate artifacts from next generation sequencing data." Bioinformatics. 2009 Nov 1;25(21):2839-40.
    DOI: 10.1093/bioinformatics/btp527; PMID: 19737799
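    Typical artifacts are reads made up largely of library sequences such as adapters or primers. As a simple stand-in for that idea, and explicitly not TagDust's own matching or false-discovery-rate procedure, the Python sketch below discards reads where most k-mers also occur in a supplied library sequence. The values of `k` and `max_fraction` are hypothetical.

```python
from typing import Iterable, List, Set

def kmers(seq: str, k: int) -> Set[str]:
    """All substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def filter_artifacts(reads: Iterable[str], library_seqs: List[str],
                     k: int = 8, max_fraction: float = 0.5) -> List[str]:
    """Keep reads whose k-mers are mostly absent from the library sequences.

    A read is treated as an artifact when more than `max_fraction`
    of its distinct k-mers also occur in an adapter/primer sequence.
    """
    library: Set[str] = set().union(*(kmers(s, k) for s in library_seqs))
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            kept.append(read)  # shorter than k: too short to judge, keep it
            continue
        shared = len(read_kmers & library) / len(read_kmers)
        if shared <= max_fraction:
            kept.append(read)
    return kept
```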
  • Decomposition-based peak identification

    Decomposition-based peak identification finds peaks across a large number of TSS (transcription start site) profiles.
    REQUIREMENTS: R, fastICA package, bigWigToBedGraph in jksrc.zip, bedTools
    LICENSE: GNU General Public License
    AVAILABILITY: https://github.com/hkawaji/dpi1/
    A promoter level mammalian expression atlas, Forrest A, Kawaji H, Rehli M, et al. (submitted)
  • CAGEScan-Clustering

    CAGEScan-Clustering creates transcript assemblies from CAGEscan-derived transcription start site (TSS) reads paired with randomly primed reads, grouping the pairs on the basis of the common location of their TSS reads. The TSS read clusters that seed the assemblies can either be provided as an external BED file or computed from the CAGEscan data itself. See `CAGEScan-Clustering.pl --help` for details.
    REQUIREMENTS: Unix / Linux; Perl v5.10.1 or higher; BedTools v2.9.0 or higher; optionally, samtools 0.1.7 (r510) or higher.
    LICENSE: GNU General Public License
    AVAILABILITY: https://github.com/nicolas-bertin/CAGEscan-Clustering
    CONTACT: nbertin@gsc.riken.jp
    Kratz et al. in preparation (note, the citation may be revised upon acceptance of the manuscript)
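    The grouping described above can be pictured as two steps. First, assign each read pair to the TSS cluster that contains its TSS read. Second, merge the mates' intervals into assembly blocks. The Python sketch below follows that outline on a single chromosome and strand. The data layout and function names are hypothetical, and the actual Perl script's behaviour and outputs may differ.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome/strand

def assign_to_clusters(pairs: List[Tuple[int, Interval]],
                       clusters: List[Interval]) -> Dict[Interval, List[Interval]]:
    """Group read pairs by the TSS cluster containing their TSS read.

    `pairs` holds (tss_position, mate_interval); `clusters` must be sorted
    and non-overlapping. Pairs whose TSS lies outside every cluster are dropped.
    """
    starts = [c[0] for c in clusters]
    groups: Dict[Interval, List[Interval]] = {c: [] for c in clusters}
    for tss, mate in pairs:
        i = bisect_right(starts, tss) - 1  # last cluster starting at or before tss
        if i >= 0 and tss < clusters[i][1]:
            groups[clusters[i]].append(mate)
    return groups

def merge_blocks(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching mate intervals into assembly blocks."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```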
  • bedtools-pairedBamToBed12

    Adds a pairedBamToBed12 utility, written by Nicolas Bertin (OSC RIKEN Yokohama), to BEDTools (created by Aaron Quinlan in spring 2009).
    AVAILABILITY: https://github.com/nicolas-bertin/bedtools-pairedBamToBed12
    CONTACT: nbertin@gsc.riken.jp
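    A BED12 record can represent a read pair as a single feature with two blocks, one per mate. The unsequenced insert between the mates then appears as a gap. The Python sketch below builds such a line from mate coordinates that have already been extracted. It assumes 0-based, half-open, non-overlapping mates, and the function name is hypothetical. The real utility reads BAM input, and its name, score and thickStart/thickEnd choices may differ.

```python
from typing import Tuple

def pair_to_bed12(chrom: str, name: str, strand: str,
                  mate1: Tuple[int, int], mate2: Tuple[int, int]) -> str:
    """Render two aligned mates (0-based, half-open) as one BED12 line."""
    (s1, e1), (s2, e2) = sorted([mate1, mate2])
    if s2 < e1:
        raise ValueError("overlapping mates are not handled in this sketch")
    fields = [
        chrom, s1, e2, name, 0, strand,
        s1, e2,                 # thickStart, thickEnd
        "0",                    # itemRgb
        2,                      # blockCount: one block per mate
        f"{e1 - s1},{e2 - s2}",  # blockSizes
        f"0,{s2 - s1}",          # blockStarts, relative to chromStart
    ]
    return "\t".join(str(f) for f in fields)
```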
  • SDRF2GRAPH

    SDRF2GRAPH is an application that produces a graphical image of an investigation design graph (IDG) from SDRFs written in a MAGE-TAB formatted spreadsheet (*.xlsx).
    REQUIREMENTS: Ruby, rexml, rubyzip, GraphViz.
    LICENSE: Ruby's license
    AVAILABILITY: http://fantom.gsc.riken.jp/4/sdrf2graph
    Hideya Kawaji et al., "SDRF2GRAPH - a visualization tool of a spreadsheet-based description of experimental processes". BMC Bioinformatics 2009, 10:133
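    In an SDRF, each row traces one path through named entities such as "Source Name" and "Sample Name". This makes it possible to derive a graph that GraphViz can draw. The Python sketch below emits DOT text from rows that have already been parsed. It assumes, as a simplification, that only columns whose header ends in "Name" are nodes. SDRF2GRAPH itself is written in Ruby, reads .xlsx files, and handles far more of MAGE-TAB.

```python
from typing import List

def sdrf_to_dot(header: List[str], rows: List[List[str]]) -> str:
    """Build a GraphViz DOT digraph linking named SDRF nodes left to right."""
    node_cols = [i for i, h in enumerate(header) if h.endswith("Name")]
    edges = set()  # a set merges edges shared by several rows
    for row in rows:
        nodes = [row[i] for i in node_cols if row[i]]
        for a, b in zip(nodes, nodes[1:]):
            edges.add((a, b))
    lines = ["digraph IDG {", "  rankdir=LR;"]
    lines += [f'  "{a}" -> "{b}";' for a, b in sorted(edges)]
    lines.append("}")
    return "\n".join(lines)
```

    The output can be rendered with GraphViz, for example with `dot -Tpng`. Deduplicating edges means that two rows sharing a source produce a branching graph rather than two separate chains.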
  • Nexalign

    Nexalign is a program to align millions of short reads from next-generation sequencing data sets to reference genomes.
    REQUIREMENTS: Unix / Linux.
    LICENSE: GNU General Public License
    AVAILABILITY: nexalign-1.3.5.tgz
    CONTACT: timolassmann@gmail.com
  • EdgeExpressDB (eeDB)

    EdgeExpressDB (eeDB) is a federated data abstraction system for integrating, interpreting, and visualizing very large biological datasets. It is designed to scale beyond petabytes of data and 10^13 objects. Users who wish to install their own eeDB instances can obtain the source code via CPAN; it is being further developed within the Omics Science Center by Jessica Severin.
    LICENSE: BSD License
    AVAILABILITY: http://sourceforge.net/projects/eedb/, available via CPAN (http://search.cpan.org/~jms/EdgeExpressDB_0.953h/).
    Jessica Severin, et al. FANTOM4 EdgeExpressDB: an integrated database of genes, microRNAs, their promoters, expression dynamics and regulatory interactions. Genome Biology, 10:R39, 1-9 (2009)
  • MuMRescueLite

    MuMRescueLite enables tag sequences that map to multiple genomic loci to be used in expression analysis. When short sequence tags from CAGE or ChIP-Seq are mapped to the genome, tags that map to multiple genomic loci (multi-mapping tags, or MuMs) are routinely omitted from further analysis, leading to experimental bias and reduced coverage. MuMRescueLite probabilistically reincorporates multi-mapping tags into mapped short-read data with acceptable computational requirements.
    REQUIREMENTS: Python 2.4 or later; runs on any platform supported by Python.
    LICENSE: The MIT License.
    AVAILABILITY: MuMRescueLite_090522.tar.gz
    SAMPLE DATASET: MuMRescueLite_test_data.tsv.gz
    Faulkner, G.J., et al. (2008) A rescue strategy for multi-mapping short sequence tags refines surveys of transcriptional activity by CAGE, Genomics.
    Hashimoto, T., et al. (2009) Probabilistic resolution of multi-mapping reads in massively parallel sequencing data using MuMRescueLite, Bioinformatics.
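    The general idea of a rescue strategy is to split each multi-mapping tag across its candidate loci. Each locus is weighted by the number of uniquely mapped tags in a window around it. The Python 3 sketch below illustrates that weighting and is independent of MuMRescueLite's own Python 2 code. The window size, the input layout, and the choice to leave a tag unassigned when no candidate has unique support are all assumptions.

```python
from typing import Dict, List, Tuple

Locus = Tuple[str, str, int]  # (chromosome, strand, position)

def rescue_weights(candidates: List[Locus],
                   unique_counts: Dict[Locus, int],
                   window: int = 100) -> Dict[Locus, float]:
    """Fractional allocation of one multi-mapping tag to its candidate loci.

    Each candidate is weighted by the uniquely mapped tags within `window` bp
    on the same chromosome and strand. Returns an empty dict when no candidate
    has any unique support (assumption: such tags are left unassigned).
    """
    def support(locus: Locus) -> int:
        chrom, strand, pos = locus
        return sum(n for (c, s, p), n in unique_counts.items()
                   if c == chrom and s == strand and abs(p - pos) <= window)

    scores = [support(locus) for locus in candidates]
    total = sum(scores)
    if total == 0:
        return {}
    return {locus: score / total for locus, score in zip(candidates, scores)}
```

    The linear scan over `unique_counts` keeps the sketch short. A practical version would index unique tags by chromosome and position.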
  • Cross-mapping correction software

    Modern high-throughput technologies enable deep sequencing of non-coding RNA species, such as miRNAs, on an unprecedented scale. When mapping such small RNAs to the genome, cross-mapping may occur, in which RNA sequences originating from one genomic locus are inadvertently mapped to a different locus. This may give rise to spurious novel RNAs, as well as spurious editing sites in known miRNAs. The cross-mapping correction software is a Python script that aims to correct for such cross-mapping effects.
    REQUIREMENTS: Python 2.4; Numerical Python (NumPy) version 1.3 or later.
    LICENSE: The Python License.
    AVAILABILITY: cmc.tar.gz
    SAMPLE DATASET: A sample data set is included with the software package.
    De Hoon, M.J.L., et al. (2010) Cross-mapping and the identification of editing sites in mature microRNAs in high-throughput sequencing libraries. Genome Research 20: 257-264.
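    Cross-mapping can arise when a read from one sequence picks up a single sequencing error and then exactly matches a different sequence. The Python sketch below makes a naive one-pass correction for that case, using a uniform per-base substitution error rate. It moves the expected number of such misreads back to their presumed source. This simplified model is an assumption for illustration and is not the method implemented in the cross-mapping correction software or described in the cited paper.

```python
from typing import Dict

def hamming1(a: str, b: str) -> bool:
    """True if equal-length sequences differ at exactly one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1

def correct_cross_mapping(counts: Dict[str, float], error_rate: float) -> Dict[str, float]:
    """Return counts with expected single-substitution misreads reassigned.

    Simplified model: each base is miscalled with probability `error_rate`,
    uniformly to one of the three other bases; only one-error reads count.
    Total read count is conserved.
    """
    corrected = dict(counts)
    for src, n_src in counts.items():
        length = len(src)
        # Expected reads of `src` that turn into one specific 1-mismatch variant.
        per_target = n_src * (error_rate / 3) * (1 - error_rate) ** (length - 1)
        for dst in counts:
            if dst != src and hamming1(src, dst):
                moved = min(per_target, corrected[dst])
                corrected[dst] -= moved
                corrected[src] += moved
    return corrected
```

    With 1000 reads of a sequence and 5 reads of a one-mismatch variant, most of the variant's reads are reassigned at a 1% error rate. This is the kind of spurious "editing site" signal the software aims to remove.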