How can I BLAST my sequence(s) against the RIKEN FANTOM data?
You can BLAST against the 21,076 FANTOM sequences at our web site:
http://genome.gsc.riken.go.jp/resource.html
The BLAST search is available from the "homology search service" on that page.
Are all RIKEN sequences available in DDBJ/EMBL/GenBank?
FANTOM cDNA sequences have been registered in the public sequence database DDBJ, except for 1,908 cDNA sequences assembled using sequences from public expressed sequence tag (EST) databases. The remaining 19,168 sequences are available at DDBJ/EMBL/GenBank (see DDBJ accession numbers to FANTOM sequences).
All 21,076 sequences, including those assembled using EST databases, are available at our web site (http://genome.gsc.riken.go.jp/resource.html).
Where was each RIKEN clone prepared from?
See the library information.
Can we get all of the FANTOM data?
You can download archive files of the following data from the resource page (http://genome.gsc.riken.go.jp/resource.html) on the Genome Exploration Research Group web site.
- FANTOM Full-length sequence data files
- FANTOM Annotation data files
- MaXML (Mouse annotation XML) files
- Computationally predicted amino acid sequences (10,465) of FANTOM clones
Are the amino acid sequences registered in DDBJ/EMBL/GenBank identical to those used in the FANTOM analysis?
No, they were calculated in different ways.
The amino acid sequences prepared for the FANTOM analysis were predicted by the RIKEN DECODER program (Fukunishi, Y. & Hayashizaki, Y. Amino-acid translation program for full-length cDNA sequences with frame-shift error. Physiol. Genomics. (in press)).
DECODER is an amino-acid translation program designed to suggest the positions of experimental frame-shift errors and to predict amino-acid sequences for full-length cDNA sequences with PHRED scores. The program generates artificial insertions into and artificial deletions from the low-accuracy base positions of the original sequence, thereby generating many candidate sequences. The validity of the most probable sequence (the likelihood that it represents the actual protein) is evaluated by using a score (Va) that is calculated in light of the Kozak consensus, preferred codon usage and position of the initiation codon.
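The candidate-generation idea can be illustrated with a minimal Python sketch. This is not the DECODER implementation: the PHRED threshold of 20, the use of "N" as the inserted base, the restriction to one edit per candidate, and the function names are all assumptions made for illustration. The Va score is not reproduced here; the length of the longest open reading frame is used only as a simple stand-in for ranking candidates.

```python
from typing import List

STOP_CODONS = {"TAA", "TAG", "TGA"}


def candidate_sequences(seq: str, phred: List[int], threshold: int = 20) -> List[str]:
    """Return the original sequence plus single-edit variants at low-quality bases.

    The threshold and the single-edit restriction are illustrative assumptions.
    """
    if len(seq) != len(phred):
        raise ValueError("seq and phred must have the same length")
    candidates = [seq]
    for i, quality in enumerate(phred):
        if quality >= threshold:
            continue
        # Artificial deletion: drop the low-accuracy base.
        candidates.append(seq[:i] + seq[i + 1:])
        # Artificial insertion: add a placeholder base "N" after it (assumption).
        candidates.append(seq[:i + 1] + "N" + seq[i + 1:])
    return candidates


def longest_orf_length(seq: str) -> int:
    """Length in codons (stop excluded) of the longest ATG-to-stop ORF
    in the three forward frames. A stand-in ranking, not the Va score."""
    best = 0
    for frame in range(3):
        start = None
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos
            elif start is not None and codon in STOP_CODONS:
                best = max(best, (pos - start) // 3)
                start = None
    return best


# A toy read with one extra, low-quality "C" at index 9 that shifts the frame.
read = "ATGAAACCCCTTTGGGTAA"
quality = [40] * len(read)
quality[9] = 5
best = max(candidate_sequences(read, quality), key=longest_orf_length)
print(best)
```

In this toy case, the candidate with the low-quality base deleted restores the reading frame, so it is the one selected by the stand-in ranking.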
The criteria for amino acid sequences registered in DDBJ/EMBL/GenBank were as follows:
- The predicted length was greater than 100 aa.
- Even if the predicted length was less than 100 aa, the predicted amino acid sequence matched a whole ORF in a nucleotide sequence, or the whole amino acid sequence of a protein, already registered in a public database.
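The two criteria above amount to a simple either-or rule, which can be written as a short Python check. The function and parameter names are hypothetical, and the case of exactly 100 aa is not addressed by the criteria; this sketch assumes such a sequence would need a database match.

```python
def meets_registration_criteria(predicted_length_aa: int,
                                matches_known_full_length: bool) -> bool:
    """Return True if a predicted amino acid sequence satisfies either criterion.

    matches_known_full_length: the prediction matched a whole ORF or a whole
    protein sequence already registered in a public database.
    """
    # Criterion 1: longer than 100 aa. Criterion 2: shorter, but fully matched.
    return predicted_length_aa > 100 or matches_known_full_length


print(meets_registration_criteria(150, False))
print(meets_registration_criteria(80, True))
print(meets_registration_criteria(80, False))
```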