| Software name | Reference | Description |
| FANTOM cDNA annotation system (CAS) | T. Kasukawa et al. in preparation | web-based system for human curation of sequences |
| ITOP | T. Kasukawa et al. in preparation | displays sequencing quality (PHRED) scores |
| Homology Viewer | M. Furuno et al. in preparation | Graphical viewer that shows regions homologous to protein sequences and start/stop codons for each frame |
| ClusTrans | J. Adachi et al. in preparation | RIKEN cDNA sequence clustering, viewer, and editor |
| READ | Bono et al. Nucleic Acids Res. 30, 211-213. (2002) | RIKEN expression array database |
| Metabolomapper | H. Bono et al. in preparation | system to browse and map assigned EC numbers to KEGG metabolic pathways |
| FACTS | T. Nagashima et al. in preparation | system to explore and curate computational higher functional annotations (protein interactions and disease associations) of cDNA clones using text sources |
| Software name | Reference | Description | |
| Database searching | |||
| NCBI-BLAST | Altschul et al. J. Mol. Biol. 215, 403-410. (1990) | Basic Local Alignment Search Tool that includes a set of similarity search programs (BLASTN, BLASTP, BLASTX, TBLASTN, TBLASTX) | |
| RepeatMasker | Smit, A.F.A. and Green, P. unpublished results | screens DNA sequences against a library of repetitive elements, as well as for low complexity regions; it returns a masked query sequence ready for database searches | |
| Protein Sequence Analysis | |||
| FASTY | Pearson et al. Genomics 46, 24-36. (1997) | FASTY is a program of the FASTA package that compares a DNA sequence to a protein sequence database using the FASTA algorithm; it translates the DNA sequence in three forward (or reverse) frames and allows frameshifts | |
| HMMER | Eddy. Bioinformatics 14, 755-763. (1998) | profile hidden Markov models for biological sequence analysis; searches a sequence database with a profile HMM or builds a hidden Markov model from a sequence alignment | |
| InterProScan | Zdobnov and Apweiler. Bioinformatics 17, 847-848. (2001) | SW-based InterPro motif search | |
| iPSORT | Bannai et al. Bioinformatics 18, 298-305. (2002) | Predicts the subcellular location of proteins | |
| TMHMM | A. Krogh et al. J. Mol. Biol. 305, 567-580. (2001) | Prediction of transmembrane helices in proteins | |
| COILS | A. Lupas et al. Science 252, 1162-1164. (1991) | Prediction of coiled-coil conformation from protein sequences | |
| SignalP | H. Nielsen et al. Proc Int Conf Intell Syst Mol Biol 6, 122-130. (1998) | Prediction of the presence and location of signal peptide cleavage sites in amino acid sequences | |
| Gene structure; Open Reading Frame | |||
| DECODER (in house) | Fukunishi and Hayashizaki, Physiological genomics 5, 81-87. (2001) | extracts open reading frames from sequences and corrects frame-shifts | |
| rsCDS (in house) | M. Furuno et al. in preparation | CDS prediction completely based on homology search of protein sequences | |
| ProCrest (in house) | J. Adachi et al. in preparation | CDS prediction based on coding potential in DNA sequences | |
| NCBI CDS Predictor (in house) | L. Wagner (unpublished) | CDS prediction based on both protein homology and coding potential | |
| Sequence assembly, clustering, Gene Index building | |||
| Phred | Ewing and Green. Genome Res. 8, 186-194. (1998) | reads DNA sequencer trace data, calls bases, and assigns quality values to the bases | |
| Phrap | | assembles shotgun DNA sequence data into a contig sequence | |
| Consed | D. Gordon et al. Genome Res. 8, 195-202. (1998) | edits sequence assemblies created by Phrap for reassembling of the same data set | |
| CAP3 | X. Huang et al. Genome Res. 9, 868-877. (1999) | assembles sequences using base quality values in computation of overlaps between reads; construction of multiple sequence alignments of reads, and generation of consensus sequences; integrated in the TIGR Gene Index assembly pipeline | |
| Megablast | | nucleotide sequence alignment search program, used for clustering in the TIGR Gene Index assembly | |
| TGI assembly pipeline | J. Quackenbush et al. Nucleic Acids Res. 29, 159-164. (2001) | TIGR Gene Index assembly pipeline | |
| Mapping and genomic alignments | |||
| TGI mapping pipeline | | genomic alignment and grouping of tentative transcript sequences | |
| blEST | L. Florea et al. Genome Res. 8, 967-974. (1998) | cDNA-genome alignment program integrated in TIGR Gene Index genomic mapping pipeline | |
| SIM4 | L. Florea et al. Genome Res. 8, 967-974. (1998) | aligns a cDNA sequence to a genomic sequence under the assumption that the differences between the two sequences are limited to introns in the genomic sequence and sequencing errors in either of the sequences | |
| Gene Ontology Browser | |||
| GO around | J. Tanoue et al. Bioinformatics (in press) | Gene ontology viewer | |
| Database | Reference | Description |
| Nucleotide sequence | ||
| DDBJ | Tateno et al. Nucleic Acids Res. 30, 27-30. (2002) | all known nucleotide and protein sequences |
| EMBL | Stoesser et al. Nucleic Acids Res. 30, 21-26. (2002) | all known nucleotide and protein sequences |
| GenBank | Benson et al. Nucleic Acids Res. 30, 17-20. (2002) | all known nucleotide and protein sequences |
| Mouse Genome Informatics (MGI) - Mouse Genome Database (MGD) | Blake et al. Nucleic Acids Res. 30, 113-115. (2002) | model organism database for the laboratory mouse; gene, sequence, nomenclature, GO information among others |
| RefSeq/LocusLink | Pruitt et al. Nucleic Acids Res. 29, 137-140. (2001) | non-redundant collection of genes and reference sequence standards |
| dbEST (mouse division) | | mouse EST sequences |
| UniGene | Wheeler et al. Nucleic Acids Res. 30, 13-16. (2002) | clusters of ESTs and full-length mRNA sequences; each cluster represents a unique known or putative gene |
| TIGR Gene Indices | J. Quackenbush et al. Nucleic Acids Res. 29, 159-164. (2001) | TIGR and GenBank EST sequences assembled to tentative consensus sequences |
| nt (NCBI) | Wheeler et al. Nucleic Acids Res. 30, 13-16. (2002) | all GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS, GSS, or phase 0, 1 or 2 HTGS sequences). No longer "non-redundant". |
| Alternative splicing dB | Zavolan et al. manuscript in preparation | Database of alternatively spliced mouse transcripts |
| Mapping | ||
| MGSC v3 | Mouse Genome Sequencing Consortium. Nature. (this issue) (2002) | mouse genome sequence assembly |
| Human "Golden Path" | International Human Genome Sequencing Consortium, Nature 409, 860-921. (2001) | human genome sequence assembly |
| Ensembl | Hubbard et al. Nucleic Acids Res. 30, 38-41. (2002) | genome dataset containing confirmed and predicted genes, exons, transcripts, and contigs |
| Riken-GenoMapper M. musculus cDNA mapping | H. Kiyosawa et al. in preparation | RIKEN clones mapped to mouse genome incl. information on disease, public mouse genes, markers and ESTs |
| Riken-GenoMapper H. sapiens cDNA mapping | H. Kiyosawa et al. in preparation | RIKEN clones mapped to human genome incl. information on disease, public mouse genes, markers and ESTs |
| Radiation Hybrid Map | I. Yamanaka et al. J. Struct. Func. Genomics 2, 23-28. (2002) | RIKEN clones mapped to mouse chromosomes based on sequence homology to ESTs of Whitehead mouse T31 radiation hybrid map |
| Protein sequence | ||
| nr (NCBI) | Wheeler et al. Nucleic Acids Res. 30, 13-16. (2002) | non-redundant GenBank CDS translations+PDB+SwissProt+PIR+PRF |
| SPTR (SwissProt + TrEMBL non-redundant protein set) | Bairoch et al. Nucleic Acids Res. 28, 45-48. (2000) | annotated protein database with minimum redundancy, annotation incl. GO terms and functional sites |
| PIR NREF | Wu et al. Nucleic Acids Res. 30, 35-37. (2002) | non-redundant reference protein database that includes all sequences from PIR-PSD, Swiss-Prot, TrEMBL, RefSeq, GenPept, and PDB |
| Domains, motifs and superfamilies | ||
| SCOP | Lo Conte et al. Nucleic Acids Res. 30, 264-267. (2002) | structural classification of proteins |
| SUPERFAMILY | Gough et al. Nucleic Acids Res. 30, 268-272. (2002) | |
| Pfam | Bateman et al. Nucleic Acids Res. 30, 276-280. (2002) | semi-automatic protein family database containing multiple protein alignments and profile-HMMs of these families |
| MDS | Kawaji et al. Genome Res. 12, 367-378. (2002) | novel motifs extracted from SPTR and FANTOM DB |
| InterPro | Apweiler et al. Nucleic Acids Res. 29, 37-40. (2001) | integrated view of other domain and functional site databases (PROSITE, PRINTS, ProDom and Pfam) |
| UTRsite and UTRdb | Pesole et al. Nucleic Acids Res. 30, 335-340. (2002) | UTRsite: nucleotide sequence patterns of UTRs where a functional role has been shown experimentally; UTRdb: non-redundant 3' and 5' UTR sequences of eukaryotic mRNAs enriched with annotations about functional elements and repeats |
| Pathway | ||
| KEGG | Kanehisa et al. Nucleic Acids Res. 30, 42-46. (2002) | metabolic and regulatory pathway maps |
| Disease | ||
| OMIM | Wheeler et al. Nucleic Acids Res. 30, 13-16. (2002) | catalog of human genes and genetic disorders |
| Literature | ||
| PubMed | | abstracts and bibliographic information of journal articles and books |
| Gene Ontology | ||
| GO database | Ashburner et al. Nat Genet. 25, 25-29. (2000) | gene ontology terms |
| SNP | ||
| dbSNP | Wheeler et al. Nucleic Acids Res. 30, 13-16. (2002) | single nucleotide polymorphisms |