Re-processing of the data generated by the FANTOM5 project (galGal6 v1)
===
All chicken data produced by the FANTOM5 project were originally processed on the galGal5 assembly. Following the recent update of the genome assembly and related information, we reprocessed the FANTOM5 data; the results are provided here.

- target genome: galGal6
- inquiries: fantom-help@riken.jp
- original data: http://fantom.gsc.riken.jp/5/datafiles/phase2.6


Updates
---
* Mar 27, 2020: initial release
  * Added CAGE and sRNA mapping data
* Nov 20, 2020: version 2
  * Fixed CAGE mapping/CTSS files that were filtered by aln_filter


Data types
---
- CAGE read alignment: the raw HeliScope reads are aligned with delve (http://fantom.gsc.riken.jp/software/). The resulting alignments are provided in BAM format ( *.bam ) together with their indexes ( *.bai ).
- CTSS (CAGE tag starting site): the 5'-ends of CAGE read alignments with mapping quality above 20 (equivalent to uniquely mapped reads) that meet the 85% percent-identity threshold are counted at 1 bp resolution. Genomic coordinates are formatted as BED, and the counts are given in the score column.
- sRNA alignment: the raw reads are aligned with bwa. The resulting alignments are provided in BAM format ( *.bam ) together with their indexes ( *.bai ).
- experimental metadata: *sdrf.txt is a tab-delimited flat file describing the experimental details of each sample.
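
The CTSS counting rule described above can be sketched in Python as follows. This is only an illustration, not the pipeline used to produce the files: the `Alignment` record, the function names, the treatment of the 85% identity threshold as "at least 85%", and the coordinate-based BED name column are all assumptions made for the example.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Tuple


class Alignment(NamedTuple):
    """Hypothetical, simplified alignment record (not a real BAM parser)."""
    chrom: str
    start: int       # 0-based, inclusive leftmost aligned base
    end: int         # 0-based, exclusive rightmost aligned base
    strand: str      # "+" or "-"
    mapq: int        # mapping quality
    identity: float  # percent identity of the alignment


MIN_MAPQ = 20        # keep reads with mapping quality strictly above this
MIN_IDENTITY = 85.0  # assumption: threshold is interpreted as ">= 85%"


def ctss_counts(alignments: Iterable[Alignment]) -> Counter:
    """Count read 5'-ends per (chrom, position, strand) at 1 bp resolution."""
    counts: Counter = Counter()
    for aln in alignments:
        if aln.mapq <= MIN_MAPQ or aln.identity < MIN_IDENTITY:
            continue
        # The 5'-end is the leftmost base on "+" and the rightmost base on "-".
        five_prime = aln.start if aln.strand == "+" else aln.end - 1
        counts[(aln.chrom, five_prime, aln.strand)] += 1
    return counts


def to_bed_lines(counts: Counter) -> List[str]:
    """Format counts as BED6 lines with the count in the score column."""
    lines = []
    key: Tuple[str, int, str]
    for key, n in sorted(counts.items()):
        chrom, pos, strand = key
        name = f"{chrom}:{pos}..{pos + 1},{strand}"  # illustrative name only
        lines.append("\t".join([chrom, str(pos), str(pos + 1), name, str(n), strand]))
    return lines


example_reads = [
    Alignment("chr1", 100, 127, "+", 40, 96.0),
    Alignment("chr1", 100, 125, "+", 30, 90.0),
    Alignment("chr1", 80, 101, "-", 40, 100.0),
    Alignment("chr1", 100, 127, "+", 20, 99.0),  # dropped: mapq not above 20
    Alignment("chr1", 100, 127, "+", 40, 80.0),  # dropped: identity below 85%
]
for line in to_bed_lines(ctss_counts(example_reads)):
    print(line)
```

In this toy input, two reads share a 5'-end on the plus strand and one read ends at the same coordinate on the minus strand, so the output has two BED lines with scores 2 and 1.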

Directory and file names
---
Data files are located in directories named <organism_name>.<biological_category>.<technology>

- The technology is one of hCAGE (CAGE sequencing on the HeliScope single-molecule sequencer), LQhCAGE (Low Quantity hCAGE), or sRNA (sRNA-seq). For details on the protocols used, please see [http://fantom.gsc.riken.jp/5/sstar/Protocols].
- The biological category is either primary_cell or tissue.
- Part of each file name represents the sample name. The sample name is percent-encoded and concatenated with <internal_library-id>, <RNA_id>, <genome assembly>, <barcode when available>, and the data type described above.
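
As a rough illustration of the naming scheme above, the Python sketch below splits a directory name into its three components and decodes a percent-encoded sample name. The function names and all example values (including the organism name `chicken` and the sample string) are hypothetical; the sketch does not attempt to split a full file name, because the exact separator between its fields is not specified here.

```python
from typing import NamedTuple
from urllib.parse import unquote

# Allowed values as listed in this README.
BIOLOGICAL_CATEGORIES = {"primary_cell", "tissue"}
TECHNOLOGIES = {"hCAGE", "LQhCAGE", "sRNA"}


class DirectoryInfo(NamedTuple):
    organism_name: str
    biological_category: str
    technology: str


def parse_directory_name(name: str) -> DirectoryInfo:
    """Split <organism_name>.<biological_category>.<technology>."""
    # Split from the right so the last two fields are always
    # the biological category and the technology.
    parts = name.rsplit(".", 2)
    if len(parts) != 3:
        raise ValueError(f"unexpected directory name: {name!r}")
    organism, category, technology = parts
    if category not in BIOLOGICAL_CATEGORIES:
        raise ValueError(f"unknown biological category: {category!r}")
    if technology not in TECHNOLOGIES:
        raise ValueError(f"unknown technology: {technology!r}")
    return DirectoryInfo(organism, category, technology)


def decode_sample_name(encoded: str) -> str:
    """Turn a percent-encoded sample name back into readable text."""
    return unquote(encoded)


# Hypothetical inputs, for illustration only.
print(parse_directory_name("chicken.primary_cell.hCAGE"))
print(decode_sample_name("example%20tissue%2c%20rep1"))
```

Validating the category and technology against the documented values makes unexpected directory names fail early instead of being silently accepted.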


Reference
---
- FANTOM5 main papers
	* Forrest ARR, et al. A promoter-level mammalian expression atlas. Nature 507: 462-470 (2014)
	* Andersson R, et al. An atlas of active enhancers across human cell types and tissues. Nature 507: 455-461 (2014)
	* Arner E, et al. Transcribed enhancers lead waves of coordinated transcription in transitioning mammalian cells. Science 347: 1010-1014 (2015). http://www.sciencemag.org/cgi/doi/10.1126/science.1259418
- Data descriptor
	* Abugessaisa I, et al. FANTOM5 CAGE profiles of human and mouse reprocessed for GRCh38 and GRCm38 genome assemblies. Sci Data 4: 170107 (2017)
	* Noguchi S, et al. FANTOM5 CAGE profiles of human and mouse samples. Sci Data 4: 170112 (2017)
- FANTOM5 databases / data resource:
	* Lizio M, et al. Gateways to the FANTOM5 promoter level mammalian expression atlas. Genome Biol 16: 22 (2015)
- HeliScopeCAGE:
	* Kanamori-Katayama M, et al. Unamplified cap analysis of gene expression on a single-molecule sequencer. Genome Res 21: 1150-1159 (2011)
	* Itoh M, et al. Automated workflow for preparation of cDNA for cap analysis of gene expression on a single molecule sequencer. PLoS One 7: e30809 (2012)
- BAM: https://samtools.github.io/hts-specs/SAMv1.pdf
- BED: https://genome.ucsc.edu/FAQ/FAQformat.html#format1
- SDRF: http://isatab.sourceforge.net/format.html