Software name | Reference | Description |
FANTOM cDNA annotation system (CAS) | T. Kasukawa et al. in preparation | web-based system for human curation of sequences |
ITOP | T. Kasukawa et al. in preparation | displays sequencing quality (PHRED) scores |
Homology Viewer | M. Furuno et al. in preparation | Graphical viewer that shows regions homologous to protein sequences and start/stop codons for each frame |
ClusTrans | J. Adachi et al. in preparation | RIKEN cDNA sequence clustering, viewer, and editor |
READ | Bono et al. Nucleic Acids Res. 30, 211-213. (2002) | RIKEN expression array database |
Metabolomapper | H. Bono et al. in preparation | system to browse and map assigned EC numbers to KEGG metabolic pathways |
FACTS | T. Nagashima et al. in preparation | system to explore and curate computational higher functional annotations (protein interactions and disease associations) of cDNA clones using text sources |
Software name | Reference | Description | |
Database searching | |||
NCBI-BLAST | Altschul et al. J. Mol. Biol. 215, 403-410. (1990) | Basic Local Alignment Search Tool that includes a set of similarity search programs (BLASTN, BLASTP, BLASTX, TBLASTN, TBLASTX) | |
RepeatMasker | Smit, A.F.A. and Green, P. unpublished results | screens DNA sequences against a library of repetitive elements, as well as for low complexity regions; it returns a masked query sequence ready for database searches | |
Protein Sequence Analysis | |||
FASTY | Pearson et al. Genomics 46, 24-36. (1997) | FASTY is a program of the FASTA package that compares a DNA sequence to a protein sequence database using the FASTA algorithm; it translates the DNA sequence in three forward (or reverse) frames and allows frameshifts | |
HMMER | Eddy. Bioinformatics 14, 755-763. (1998) | profile hidden Markov models for biological sequence analysis; searches a sequence database with a profile HMM or builds a hidden Markov model from a sequence alignment | |
InterProScan | Zdobnov and Apweiler. Bioinformatics 17, 847-848. (2001) | SW-based InterPro motif search | |
iPSORT | Bannai et al. Bioinformatics 18, 298-305. (2002) | Predicts the subcellular location of proteins | |
TMHMM | A. Krogh et al. J. Mol. Biol. 305, 567-580. (2001) | Prediction of transmembrane helices in proteins | |
COILS | A. Lupas et al. Science 252, 1162-1164. (1991) | Prediction of coiled-coil conformation from protein sequences | |
SignalP | H. Nielsen et al. Proc Int Conf Intell Syst Mol Biol 6, 122-130. (1998) | Prediction of the presence and location of signal peptide cleavage sites in amino acid sequences | |
Gene structure; Open Reading Frame | |||
DECODER (in house) | Fukunishi and Hayashizaki. Physiological Genomics 5, 81-87. (2001) | extracts open reading frames from sequences and corrects frameshifts | |
rsCDS (in house) | M. Furuno et al. in preparation | CDS prediction based entirely on homology search of protein sequences | |
ProCrest (in house) | J. Adachi et al. in preparation | CDS prediction based on coding potential in DNA sequences | |
NCBI CDS Predictor (in house) | L. Wagner (unpublished) | CDS prediction based on both protein homology and coding potential | |
Sequence assembly, clustering, Gene Index building | |||
Phred | Ewing and Green. Genome Res. 8, 186-194. (1998) | reads DNA sequencer trace data, calls bases, and assigns quality values to the bases | |
Phrap | | assembles shotgun DNA sequence data into a contig sequence | |
Consed | D. Gordon et al. Genome Res. 8, 195-202. (1998) | edits sequence assemblies created by Phrap for reassembly of the same data set | |
CAP3 | X. Huang et al. Genome Res. 9, 868-877. (1999) | assembles sequences using base quality values in computation of overlaps between reads; construction of multiple sequence alignments of reads, and generation of consensus sequences; integrated in the TIGR Gene Index assembly pipeline | |
Megablast | | nucleotide sequence alignment search program, used for clustering in the TIGR Gene Index assembly | |
TGI assembly pipeline | J. Quackenbush et al. Nucleic Acids Res. 29, 159-164. (2001) | TIGR Gene Index assembly pipeline | |
Mapping and genomic alignments | |||
TGI mapping pipeline | | genomic alignment and grouping of tentative transcript sequences | |
blEST | L. Florea et al. Genome Res. 8, 967-974. (1998) | cDNA-genome alignment program integrated in TIGR Gene Index genomic mapping pipeline | |
SIM4 | L. Florea et al. Genome Res. 8, 967-974. (1998) | aligns a cDNA sequence to a genomic sequence under the assumption that the differences between the two sequences are limited to introns in the genomic sequence and sequencing errors in either of the sequences | |
Gene Ontology Browser | |||
GO around | J. Tanoue et al. Bioinformatics (in press) | Gene ontology viewer | |
Database | Reference | Description |
Nucleotide sequence | ||
DDBJ | Tateno et al. Nucleic Acids Res. 30, 27-30. (2002) | all known nucleotide and protein sequences |
EMBL | Stoesser et al. Nucleic Acids Res. 30, 21-26. (2002) | all known nucleotide and protein sequences |
GenBank | Benson et al. Nucleic Acids Res. 30, 17-20. (2002) | all known nucleotide and protein sequences |
Mouse Genome Informatics (MGI) - Mouse Genome Database (MGD) | Blake et al. Nucleic Acids Res. 30, 113-115. (2002) | model organism database for the laboratory mouse; gene, sequence, nomenclature, GO information among others |
RefSeq/LocusLink | Pruitt et al. Nucleic Acids Res. 29, 137-140. (2001) | non-redundant collection of genes and reference sequence standards |
dbEST (mouse division) | | mouse EST sequences |
UniGene | Wheeler et al. Nucleic Acids Res. 30, 13-16. (2002) | clusters of ESTs and full-length mRNA sequences; each cluster represents a unique known or putative gene |
TIGR Gene Indices | J. Quackenbush et al. Nucleic Acids Res. 29, 159-164. (2001) | TIGR and GenBank EST sequences assembled to tentative consensus sequences |
nt (NCBI) | Wheeler et al. Nucleic Acids Res. 30, 13-16. (2002) | all GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS, GSS, or phase 0, 1 or 2 HTGS sequences). No longer "non-redundant". |
Alternative splicing dB | Zavolan et al. manuscript in preparation | Database of alternatively spliced mouse transcripts |
Mapping | ||
MGSC v3 | Mouse Genome Sequencing Consortium. Nature. (this issue) (2002) | mouse genome sequence assembly |
Human "Golden Path" | International Human Genome Sequencing Consortium, Nature 409, 860-921. (2001) | human genome sequence assembly |
Ensembl | Hubbard et al. Nucleic Acids Res. 30, 38-41. (2002) | genome dataset containing confirmed and predicted genes, exons, transcripts, and contigs |
Riken-GenoMapper M. musculus cDNA mapping | H. Kiyosawa et al. in preparation | RIKEN clones mapped to mouse genome incl. information on disease, public mouse genes, markers and ESTs |
Riken-GenoMapper H. sapiens cDNA mapping | H. Kiyosawa et al. in preparation | RIKEN clones mapped to human genome incl. information on disease, public mouse genes, markers and ESTs |
Radiation Hybrid Map | I. Yamanaka et al. J. Struct. Func. Genomics 2, 23-28. (2002) | RIKEN clones mapped to mouse chromosomes based on sequence homology to ESTs of Whitehead mouse T31 radiation hybrid map |
Protein sequence | ||
nr (NCBI) | Wheeler et al. Nucleic Acids Res. 30, 13-16. (2002) | non-redundant GenBank CDS translations+PDB+SwissProt+PIR+PRF |
SPTR (SwissProt + TrEMBL non-redundant protein set) | Bairoch et al. Nucleic Acids Res. 28, 45-48. (2000) | annotated protein database with minimum redundancy, annotation incl. GO terms and functional sites |
PIR NREF | Wu et al. Nucleic Acids Res. 30, 35-37. (2002) | non-redundant reference protein database that includes all sequences from PIR-PSD, Swiss-Prot, TrEMBL, RefSeq, GenPept, and PDB |
Domains, motifs and superfamilies | ||
SCOP | Lo Conte et al. Nucleic Acids Res. 30, 264-267. (2002) | structural classification of proteins |
SUPERFAMILY | Gough et al., Nucleic Acids Res. 30, 268-272. (2002) | |
Pfam | Bateman et al. Nucleic Acids Res. 30, 276-280. (2002) | semi-automatic protein family database containing multiple protein alignments and profile-HMMs of these families |
MDS | Kawaji et al. Genome Res. 12, 367-378. (2002) | novel motifs extracted from SPTR and FANTOM DB |
InterPro | Apweiler et al. Nucleic Acids Res. 29, 37-40. (2001) | integrated view of other domain and functional site databases (PROSITE, PRINTS, ProDom and Pfam) |
UTRsite and UTRdb | Pesole et al. Nucleic Acids Res. 30, 335-340. (2002) | UTRsite: nucleotide sequence patterns of UTRs where a functional role has been shown experimentally; UTRdb: non-redundant 3' and 5' UTR sequences of eukaryotic mRNAs enriched with annotations about functional elements and repeats |
Pathway | ||
KEGG | Kanehisa et al. Nucleic Acids Res. 30, 42-46. (2002) | metabolic and regulatory pathway maps |
Disease | ||
OMIM | Wheeler et al. Nucleic Acids Res. 30, 13-16. (2002) | catalog of human genes and genetic disorders |
Literature | ||
PubMed | | abstracts and bibliographic information of journal articles and books |
Gene Ontology | ||
GO database | Ashburner et al. Nat Genet. 25, 25-29. (2000) | gene ontology terms |
SNP | ||
dbSNP | Wheeler et al. Nucleic Acids Res. 30, 13-16. (2002) | single nucleotide polymorphisms |