Software
CAGEScan-Clustering
DESCRIPTION: CAGEScan-Clustering creates transcript assemblies from CAGEScan-derived Transcription Start Site (TSS) associated reads paired with randomly primed reads, grouping them on the basis of the common location of the TSS reads. The TSS read clusters that seed the assemblies can either be provided as an external BED file or computed from the CAGEscan data itself. See `CAGEScan-Clustering.pl --help` for details.
REQUIREMENTS: Unix / Linux. Perl v5.10.1 or higher. BedTools v2.9.0 or higher. Optionally, samtools v0.1.7 (r510) or higher.
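The grouping step described above can be pictured with a small Python sketch. It is only an illustration under stated assumptions, not the tool's own Perl implementation: the tuple layouts, the function name `group_pairs_by_tss_cluster`, and the choice to report each assembly as the span covered by its grouped reads are all hypothetical.

```python
from collections import defaultdict


def group_pairs_by_tss_cluster(clusters, pairs):
    """Group read pairs by the TSS cluster containing their 5' (TSS) read.

    clusters: list of (name, chrom, start, end, strand), BED-style half-open.
    pairs:    list of (chrom, strand, tss_pos, mate_start, mate_end), where
              tss_pos is the 5' end of the TSS read and mate_* is the
              randomly primed mate (hypothetical layout for this sketch).
    Returns {cluster_name: (chrom, start, end, strand, n_pairs)}.
    """
    by_name = {c[0]: c for c in clusters}
    grouped = defaultdict(list)
    for chrom, strand, tss_pos, mate_start, mate_end in pairs:
        for name, c_chrom, c_start, c_end, c_strand in clusters:
            same_strand = chrom == c_chrom and strand == c_strand
            if same_strand and c_start <= tss_pos < c_end:
                grouped[name].append((tss_pos, mate_start, mate_end))
                break  # a pair seeds at most one assembly in this sketch

    assemblies = {}
    for name, members in grouped.items():
        _, chrom, _, _, strand = by_name[name]
        # The assembly spans everything covered by the grouped reads.
        lo = min(min(tss, m_start) for tss, m_start, _ in members)
        hi = max(max(tss + 1, m_end) for tss, _, m_end in members)
        assemblies[name] = (chrom, lo, hi, strand, len(members))
    return assemblies


# Made-up example data: two TSS clusters and four read pairs.
clusters = [("tss1", "chr1", 100, 120, "+"), ("tss2", "chr1", 500, 510, "+")]
pairs = [
    ("chr1", "+", 105, 300, 336),
    ("chr1", "+", 110, 800, 836),
    ("chr1", "+", 505, 900, 936),
    ("chr1", "+", 50, 200, 236),  # TSS read outside every cluster: ignored
]
assemblies = group_pairs_by_tss_cluster(clusters, pairs)
print(assemblies)
```

The linear scan over clusters keeps the sketch short; real data sets would call for an interval index or a sorted sweep.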
LICENSE: GNU General Public License
CITATION: Kratz et al., in preparation (note: the citation may be revised upon acceptance of the manuscript)
AVAILABILITY: https://github.com/nicolas-bertin/CAGEscan-Clustering
CONTACT: nbertin@gsc.riken.jp
bedtools-pairedBamToBed12
DESCRIPTION: Adds a pairedBamToBed12 utility, written by Nicolas Bertin (OSC RIKEN Yokohama), to BEDTools (created by Aaron Quinlan, Spring 2009).
AVAILABILITY: https://github.com/nicolas-bertin/bedtools-pairedBamToBed12
CONTACT: nbertin@gsc.riken.jp
SDRF2GRAPH
DESCRIPTION: SDRF2GRAPH is an application that produces a graphical image of an investigation design graph (IDG) from SDRFs written in a MAGE-TAB formatted spreadsheet (*.xlsx).
REQUIREMENTS: Ruby, rexml, rubyzip, GraphViz.
LICENSE: Ruby's license
CITATION: Hideya Kawaji et al., "SDRF2GRAPH - a visualization tool of a spreadsheet-based description of experimental processes". BMC Bioinformatics 2009, 10:133
AVAILABILITY: http://fantom.gsc.riken.jp/4/sdrf2graph
Nexalign
DESCRIPTION: Nexalign is a program to align millions of short reads from next-generation sequencing data sets to reference genomes.
REQUIREMENTS: Unix / Linux.
LICENSE: GNU General Public License
AVAILABILITY: nexalign-1.3.5.tgz
CONTACT: timolassmann@gmail.com
TagDust
DESCRIPTION: TagDust is a program to eliminate artifactual reads from next-generation sequencing data sets.
REQUIREMENTS: Unix / Linux.
LICENSE: GNU General Public License
CITATION: Lassmann T., et al. (2009) TagDust - A program to eliminate artifacts from next generation sequencing data. Bioinformatics.
AVAILABILITY: tagdust.tgz
CONTACT: timolassmann@gmail.com
EdgeExpressDB (eeDB)
DESCRIPTION: EdgeExpressDB (eeDB) is a federated data abstraction system designed for integrating, interpreting, and visualizing very large biology datasets. It is designed to scale beyond petabytes and 10^13 objects. For those interested in installing their own instance of eeDB, the source code is available via CPAN; it is being further developed within the Omics Science Center by Jessica Severin.
REQUIREMENTS: Perl DBI/DBD, MySQL, SQLite.
LICENSE: BSD License
CITATION: Jessica Severin, et al. FANTOM4 EdgeExpressDB: an integrated database of genes, microRNAs, their promoters, expression dynamics and regulatory interactions. Genome Biology, 10:R39, 1-9 (2009)
AVAILABILITY: http://sourceforge.net/projects/eedb/; also available via CPAN (http://search.cpan.org/~jms/EdgeExpressDB_0.953h/).
MuMRescueLite
DESCRIPTION: MuMRescueLite is software that makes sequence tags mapping to multiple genomic loci usable in expression analysis. When short sequence tags from CAGE or ChIP-Seq are mapped to the genome, tags that map to multiple genomic loci (multi-mapping tags, or MuMs) are routinely omitted from further analysis, leading to experimental bias and reduced coverage. MuMRescueLite probabilistically reincorporates multi-mapping tags into mapped short read data with acceptable computational requirements.
REQUIREMENTS: Python 2.4 or later; runs on any platform supported by Python itself.
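To make "probabilistically reincorporates" concrete, here is a minimal Python sketch. It assumes that a multi-mapping tag is split across its candidate loci in proportion to the number of uniquely mapped tags at each locus. That weighting rule, the function name `rescue_multimappers`, and the data layout are assumptions made for illustration; this is not MuMRescueLite's actual code or interface.

```python
def rescue_multimappers(unique_counts, multi_tags):
    """Distribute multi-mapping tag counts across candidate loci.

    unique_counts: {locus: number of uniquely mapped tags at that locus}
    multi_tags:    list of (tag_count, [candidate loci]) entries
    Returns {locus: unique count plus fractional rescued count}.
    """
    totals = {locus: float(n) for locus, n in unique_counts.items()}
    for tag_count, loci in multi_tags:
        weights = [unique_counts.get(locus, 0) for locus in loci]
        weight_sum = sum(weights)
        for locus, weight in zip(loci, weights):
            if weight_sum > 0:
                share = weight / weight_sum
            else:
                # No unique evidence at any candidate: split evenly.
                share = 1.0 / len(loci)
            totals[locus] = totals.get(locus, 0.0) + tag_count * share
    return totals


# Made-up example: locus A has three times the unique support of locus B,
# so it receives three quarters of the 8 ambiguous tags.
unique_counts = {"A": 30, "B": 10}
multi_tags = [(8, ["A", "B"]), (4, ["C", "D"])]
rescued = rescue_multimappers(unique_counts, multi_tags)
print(rescued)
```

Dropping the ambiguous tags instead would leave A and B at 30 and 10 and give C and D no signal at all, which is the reduced coverage the description refers to.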
LICENSE: The MIT License.
CITATION: Faulkner, G.J., et al. (2008) A rescue strategy for multi-mapping short sequence tags refines surveys of transcriptional activity by CAGE, Genomics.
Hashimoto, T., et al. (2009) Probabilistic resolution of multi-mapping reads in massively parallel sequencing data using MuMRescueLite, Bioinformatics.
AVAILABILITY: MuMRescueLite_090522.tar.gz
SAMPLE DATASET: MuMRescueLite_test_data.tsv.gz
Cross-mapping correction software
DESCRIPTION: Modern high-throughput technologies enable deep sequencing of non-coding RNA species, such as miRNAs, on an unprecedented scale. When mapping such small RNAs to the genome, cross-mapping may occur, in which RNA sequences originating from one genomic locus are inadvertently mapped to a different locus. This may give rise to spurious novel RNAs, as well as spurious editing sites in known miRNAs. The cross-mapping correction software is a Python script that aims to correct for such cross-mapping effects.
REQUIREMENTS: Python 2.4; Numerical Python (NumPy) version 1.3 or later.
LICENSE: The Python License.
CITATION: De Hoon, M.J.L., et al. (2010) Cross-mapping and the identification of editing sites in mature microRNAs in high-throughput sequencing libraries. Genome Research 20: 257-264.
AVAILABILITY: cmc.tar.gz
SAMPLE DATASET: A sample data set is included with the software package.
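The cross-mapping effect named in the description can be reproduced with a toy Python example. The sketch only demonstrates the problem, not the correction method implemented in the package. The two sequences are made up, and the helper names `hamming` and `best_loci` are hypothetical.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def best_loci(read, loci):
    """Return (sorted names of best-matching loci, mismatch count)."""
    distances = {name: hamming(read, seq) for name, seq in loci.items()}
    best = min(distances.values())
    return sorted(n for n, d in distances.items() if d == best), best


# Two made-up loci whose sequences differ at a single position (index 17).
loci = {
    "locus_A": "ACGTTGCATGCAAGGTCCTA",
    "locus_B": "ACGTTGCATGCAAGGTCGTA",
}

# A read that truly originates from locus_A, carrying one sequencing error
# (C read as G) at exactly the position where the two loci differ.
true_origin = "locus_A"
read = loci[true_origin][:17] + "G" + loci[true_origin][18:]

mapped_to, mismatches = best_loci(read, loci)
print(mapped_to, mismatches)
```

A best-match mapper places the read at locus_B with zero mismatches, although it came from locus_A. Taken at face value, the read would inflate the expression of locus_B, or be interpreted as an editing event if the mapper were forced back to locus_A.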