2017-01-26
Marina Lizio (marina.lizio@riken.jp)
Inquiries to fantom-help@gsc.riken.jp

HeliscopeCAGE and Illumina sequencing, mapping, CTSS aggregation.

This folder contains all the snapshot and time-course primary data generated by the FANTOM5 project. Files are arranged in sub-folders whose names follow a simple scheme of .. .

Technology is one of hCAGE (CAGE sequencing on the Heliscope single-molecule sequencer), LQhCAGE (Low Quantity hCAGE), or CAGEscan (paired-end CAGE). For details on the protocols used, please see [http://fantom.gsc.riken.jp/sstar/Protocols].

The biological category is one of primary_cell, cell line, timecourse, fractionation or tissue.

Within each of these sub-folders, the following types of files are provided for each sample in the case of hCAGE:

  - 00_*.assay_sdrf.txt: a tab-delimited flat file describing the experimental details for each sample.
  - *.bam: the indexed mapping file containing all alignments.
  - *.bam.bai: the corresponding index file of the BAM file.
  - *.ctss.bed.gz: a CAGE TSS (CTSS) file. It is obtained by converting the BAM alignments into BED format and aggregating the resulting CAGE tags by their 5' end positions. Only tags with an alignment quality score above 20 are retained in the conversion.
  - *.rdna.fa.gz: a FASTA file containing all the sequences that align to ribosomal DNA.

In the case of CAGEscan, the following files are provided:

  - 00_*.assay_sdrf.txt: a tab-delimited flat file describing the experimental details for each sample.
  - *.bam: the indexed mapping file containing all alignments.
  - *.bam.bai: the corresponding index file of the BAM file.
  - *.3prime.fq.gz: FASTQ-format sequences of the 3' end of each CAGEscan tag.
  - *.5prime.fq.gz: FASTQ-format sequences of the 5' end of each CAGEscan tag.
  - *.clusters.bed.gz: a BED file listing clustered CAGEscan mapped pairs.
  - *.pairs.bed.gz: the CAGEscan mapped pairs in BED12 format.

We have chosen the file name scheme carefully to provide as much information as possible about each sample. The structure follows a scheme where ..... is used.
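The CTSS conversion described for hCAGE (keep tags whose alignment quality score is above 20, then aggregate tags by their 5' end) can be illustrated with a small pure-Python sketch. This is not the FANTOM5 pipeline. It assumes alignments are already available as hypothetical (chrom, start, end, strand, mapq) tuples with 0-based, half-open coordinates, and it takes the 5' end as `start` on the + strand and `end - 1` on the - strand. The "chrom:start..end,strand" name column is also an assumption about the output layout.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical alignment record: (chrom, start, end, strand, mapq),
# using 0-based, half-open coordinates as in BED.
Alignment = Tuple[str, int, int, str, int]

MIN_QUALITY = 20  # tags are kept only if their quality score is ABOVE this value


def aggregate_ctss(alignments: Iterable[Alignment]) -> Counter:
    """Count CAGE tags per (chrom, 5' position, strand)."""
    counts: Counter = Counter()
    for chrom, start, end, strand, mapq in alignments:
        if mapq <= MIN_QUALITY:
            continue  # discard low-quality alignments
        # The 5' end of a CAGE tag marks its start site; on the minus
        # strand this is the last aligned base.
        pos = start if strand == "+" else end - 1
        counts[(chrom, pos, strand)] += 1
    return counts


def to_bed_lines(counts: Counter) -> List[str]:
    """Render counts as BED6-like rows: chrom, start, end, name, count, strand."""
    lines = []
    for (chrom, pos, strand), n in sorted(counts.items()):
        name = f"{chrom}:{pos}..{pos + 1},{strand}"  # assumed name format
        lines.append(f"{chrom}\t{pos}\t{pos + 1}\t{name}\t{n}\t{strand}")
    return lines


if __name__ == "__main__":
    reads = [
        ("chr1", 100, 127, "+", 40),
        ("chr1", 100, 125, "+", 35),
        ("chr1", 200, 227, "-", 50),
        ("chr1", 300, 327, "+", 10),  # dropped: quality not above 20
    ]
    for line in to_bed_lines(aggregate_ctss(reads)):
        print(line)
```

With real data, the tuples would come from the *.bam file via a BAM reader of your choice; the aggregation step stays the same.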
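To work with the *.ctss.bed.gz files, a reader only needs the standard library. The sketch below assumes each row is BED6-like (chrom, start, end, name, tag count, strand) with the count in the fifth column; check this against an actual file before relying on it. Parsing is separated from file opening so the same logic can be reused on any iterable of lines.

```python
import gzip
from collections import Counter
from typing import Dict, Iterable, Tuple

CtssKey = Tuple[str, int, str]  # (chrom, 0-based start, strand)


def parse_ctss_lines(lines: Iterable[str]) -> Dict[CtssKey, int]:
    """Parse CTSS rows, assumed to be: chrom, start, end, name, count, strand."""
    ctss: Dict[CtssKey, int] = {}
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank, comment and UCSC header lines
        fields = line.rstrip("\n").split("\t")
        key = (fields[0], int(fields[1]), fields[5])
        ctss[key] = ctss.get(key, 0) + int(fields[4])
    return ctss


def read_ctss(path: str) -> Dict[CtssKey, int]:
    """Read a gzipped CTSS file, e.g. a hypothetical 'sample.ctss.bed.gz'."""
    with gzip.open(path, "rt") as handle:
        return parse_ctss_lines(handle)


def summarize(ctss: Dict[CtssKey, int]) -> Dict[str, int]:
    """Total tag count and number of distinct CTSS positions per strand."""
    per_strand = Counter(strand for _, _, strand in ctss)
    return {
        "tags": sum(ctss.values()),
        "plus_ctss": per_strand["+"],
        "minus_ctss": per_strand["-"],
    }
```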
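The 00_*.assay_sdrf.txt files are tab-delimited, so Python's `csv` module can load them. The column names are not described here, so the sketch keeps rows as lists rather than dicts: SDRF-style tables may repeat a column name, and a dict keyed by header would silently keep only one of the duplicates. The UTF-8 encoding in `read_sdrf` is an assumption.

```python
import csv
from typing import Iterable, List, Tuple


def parse_sdrf(lines: Iterable[str]) -> Tuple[List[str], List[List[str]]]:
    """Split a tab-delimited SDRF table into its header and non-empty data rows."""
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    rows = [row for row in reader if any(cell.strip() for cell in row)]
    return header, rows


def column_values(header: List[str], rows: List[List[str]], name: str) -> List[List[str]]:
    """Return the values of every column called `name`, one list per matching column."""
    indices = [i for i, h in enumerate(header) if h == name]
    return [[row[i] if i < len(row) else "" for row in rows] for i in indices]


def read_sdrf(path: str) -> Tuple[List[str], List[List[str]]]:
    # newline="" lets the csv module handle line endings itself.
    with open(path, newline="", encoding="utf-8") as handle:
        return parse_sdrf(handle)
```

For example, `column_values(header, rows, "Protocol REF")` would return one list per "Protocol REF" column, if the file has such columns (the name is only illustrative).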
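The CAGEscan *.pairs.bed.gz files use BED12, in which one record can describe several aligned segments ("blocks"). Under the standard UCSC BED12 layout, blockSizes and blockStarts are comma-separated lists (often with a trailing comma), and each block start is relative to chromStart. A minimal sketch that turns one record into absolute genomic intervals:

```python
from typing import List, Tuple


def bed12_blocks(line: str) -> Tuple[str, str, List[Tuple[int, int]]]:
    """Return (chrom, strand, [(start, end), ...]) for one BED12 record.

    Coordinates are 0-based, half-open; block starts are offsets from chromStart.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 12:
        raise ValueError(f"expected 12 BED columns, got {len(fields)}")
    chrom, chrom_start, strand = fields[0], int(fields[1]), fields[5]
    block_count = int(fields[9])
    # Ignore empty items produced by a trailing comma.
    sizes = [int(x) for x in fields[10].split(",") if x]
    starts = [int(x) for x in fields[11].split(",") if x]
    if len(sizes) != block_count or len(starts) != block_count:
        raise ValueError("blockCount does not match blockSizes/blockStarts")
    blocks = [(chrom_start + s, chrom_start + s + z) for s, z in zip(starts, sizes)]
    return chrom, strand, blocks
```

For the record `chr1 1000 1500 pair1 0 + 1000 1500 0 2 30,50, 0,450,` (tab-separated), the function yields the blocks (1000, 1030) and (1450, 1500).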