2019-12-20 Marina Lizio (marina.lizio@riken.jp)
Inquiries to fantom-help@riken.jp

HeliscopeCAGE and Illumina sequencing, mapping, CTSS aggregation.

This folder contains all the snapshots and time course primary data generated by the FANTOM5 project.

Files are arranged in sub-folders whose names follow a simple scheme of .. .
Technology is one of hCAGE (CAGE sequencing on Heliscope single molecule sequencer), LQhCAGE (Low Quantity hCAGE), CAGEscan (paired-end CAGE), RNA-seq or sRNA (short RNA sequencing). For details on the protocols used, please see [http://fantom.gsc.riken.jp/5/sstar/Protocols].
The biological category is one of primary_cell, cell line, timecourse, fractionation or tissue.

Within each of these sub-folders, for each sample, the following types of files are provided in the case of hCAGE:

00_*.assay_sdrf.txt is a tab delimited flat file describing the experimental details for each sample.
*.bam is the indexed mapping file including the whole alignments.
*.bam.bai is the corresponding index file of the bam file.
*.ctss.bed.gz represents a CAGE TSS file. It is obtained by converting BAM alignments into BED and aggregating the resulting sequences in CAGE tags. In the conversion, only those sequence tags with alignment quality score above 20 are retained.
*.rdna.fa.gz is a FASTA format file including all the sequences aligning to ribosomal DNA.

In the case of CAGEscan, the following files are provided:

00_*.assay_sdrf.txt is a tab delimited flat file describing the experimental details for each sample.
*.bam is the indexed mapping file including the whole alignments obtained using BWA.
*.bam.bai is the corresponding index file of the bam file.
*.3prime.fq.gz contains the sequences, in FASTQ format, of the 3' end of the CAGEscan tag.
*.5prime.fq.gz contains the sequences, in FASTQ format, of the 5' end of the CAGEscan tag.
*.clusters.bed.gz contains the CAGEscan clusters in standard BED12 format, where column 4 (name) indicates the name of the seed CAGE peak and column 5 (score) indicates the number of pairs used to build the cluster.
*.pairs.bed.gz contains the CAGEscan mapped pairs in standard BED12 format, where column 4 (name) indicates the name of the sequenced read pair and column 5 (score) indicates the sum of the mapping qualities of the two reads.

More information on the BED12 format can be found on the UCSC Genome Browser website.
More information on CAGEscan can be found in Plessy et al., 2010 (https://pubmed.gov/20543846), Kratz et al., 2014 (https://pubmed.gov/24904046) and Bertin et al., 2017 (https://pubmed.gov/28972578).

We have chosen the file name scheme carefully to provide as much information as we could for the samples. The structure follows a scheme where ..... is used.