-----------------------------------------------------
README
Taft et al Nature Genetics 2009
Raw sequence files
-----------------------------------------------------

The small RNA data underwent two "extractions", i.e. removal of sequence
adaptors/linkers to reveal the sequenced small RNAs.

The tiRNA analysis was done with the first extraction. These small RNA
sequences can be found in the following file in this directory:

061213-smallRNA-sequence.tar.gz

The second extraction of the data can be found here:

http://fantom.gsc.riken.jp/4/download/Tables/human/SmallRNA/tag_counts/

This is a more robust extraction, and I encourage people interested in
using this small RNA data to use this set.

The small RNA data was generated as a time course. Small RNAs were
sequenced from unstimulated THP-1 cells (libraries s01 through s05) and
at various time points after stimulation with PMA (libraries s06-s08).
Please see the following README file for more information:

http://fantom.gsc.riken.jp/4/download/Tables/human/SmallRNA/tag_counts/00_comment_lines.txt

For ease of use I have taken all the small RNAs from the second
extraction, pooled their counts across all time points, and mapped them
to UCSC hg18. I used a minimum tag length of 15 nt and a maximum
multi-mapping threshold of 100x. The mapped small RNAs (BED) and FASTA
files can be found in this directory.

FANTOM4_THP1_uniqMapping.bed           :: hg18 BED coordinates for all uniquely mapping FANTOM4 THP-1 small RNAs
FANTOM4_THP1_2-100x.bed                :: hg18 BED coordinates for all small RNAs that map to the genome 2 - 100 x
FANTOM4_THP1_rawTags_combinedCounts.fa :: all small RNAs from the second extraction with pooled counts

The BED name field and the FASTA header have the following format:

FANTOM4_THP1_numericalID_pooledTagAbundance_tagSize

Because the data presented in these files is pooled from multiple
libraries (and time points), the tag abundance should be used only as a
rough, relative indicator of expression.
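The name format above can be split into its parts with a few lines of
Python. This is only a minimal sketch, not part of the original
analysis. It assumes that the numerical ID, pooled abundance and tag
size contain no underscores, that the abundance and size are integers,
and that a FASTA header carries a leading ">". The function name
parse_tag_name and the example name FANTOM4_THP1_12345_87_22 are
hypothetical and used for illustration only.

```python
from typing import NamedTuple


class TagName(NamedTuple):
    numerical_id: str
    pooled_abundance: int
    tag_size: int


def parse_tag_name(name: str) -> TagName:
    """Split FANTOM4_THP1_numericalID_pooledTagAbundance_tagSize into fields."""
    # Accept either a BED name field or a FASTA header line (">..." prefix).
    name = name.strip().lstrip(">")
    # Split from the right so the "FANTOM4_THP1" prefix stays in one piece.
    prefix, numerical_id, abundance, size = name.rsplit("_", 3)
    if prefix != "FANTOM4_THP1":
        raise ValueError(f"unexpected prefix in {name!r}")
    return TagName(numerical_id, int(abundance), int(size))


if __name__ == "__main__":
    # Hypothetical example name; real IDs and counts come from the files.
    print(parse_tag_name(">FANTOM4_THP1_12345_87_22"))
```

The pooled abundance returned here is the count summed across all
libraries, so the same caution applies: treat it as a rough, relative
value.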
For more intensive/detailed analyses I suggest mapping each set of
second extraction tags separately (e.g. each time point), and examining
the relative proportions in each library.

Please contact me if you have any questions.

Ryan J Taft
Mattick Laboratory
Institute for Molecular Bioscience
University of Queensland
Australia
r.taft@imb.uq.edu.au