Software
name |
Reference |
Description |
FANTOM cDNA annotation system (CAS) |
T. Kasukawa et al. in
preparation |
web-based system for human
curation of sequences |
ITOP |
T. Kasukawa et al. in
preparation |
displays sequencing quality
(PHRED) scores |
Homology Viewer |
M. Furuno et al. in preparation |
Graphical viewer that shows
homologous regions to protein sequences and start/stop codons for each frame |
ClusTrans |
J. Adachi et al. in preparation |
RIKEN cDNA sequence clustering,
viewer, and editor |
READ |
Bono et al. Nucleic Acids Res. 30,
211-213. (2002) |
RIKEN expression array database |
Metabolomapper |
H. Bono et al. in preparation |
system to browse and map
assigned EC numbers to KEGG metabolic pathways |
FACTS |
T. Nagashima et al. in
preparation |
system
to explore and curate computational higher functional annotations (protein
interactions and disease associations) of cDNA clones using text sources |
Software
name |
Reference |
Description |
Database
searching |
NCBI-BLAST |
Altschul et al. J. Mol. Biol. 215,
403-410. (1990) |
Basic Local Alignment Search Tool
that includes a set of similarity search programs (BLASTN, BLASTP, BLASTX,
TBLASTN, TBLASTX) |
RepeatMasker |
Smit, A.F.A.
and Green, P. unpublished results |
screens DNA sequences against a
library of repetitive elements, as well as for low complexity regions; it
returns a masked query sequence ready for database searches |
Protein
Sequence Analysis |
|
|
FASTY |
Pearson et al. Genomics 46, 24-36.
(1997) |
FASTY is a program of the FASTA
package that compares a DNA sequence to a protein sequence database using the
FASTA algorithm; it translates the DNA sequence in three forward (or reverse)
frames and allows frameshifts |
HMMER |
Eddy. Bioinformatics 14, 755-763.
(1998) |
profile hidden Markov models for
biological sequence analysis; searches a sequence database with a profile HMM
or builds a hidden Markov model from a sequence alignment |
InterProScan |
Zdobnov and Apweiler. Bioinformatics
17, 847-848. (2001) |
SW-based InterPro motif search |
iPSORT |
Bannai et al. Bioinformatics 18,
298-305. (2002) |
Predicts the subcellular location of proteins |
TMHMM |
A. Krogh et al. J. Mol. Biol. 305,
567-580. (2001) |
Prediction of transmembrane
helices in proteins |
COILS |
A. Lupas et al. Science 252, 1162-1164.
(1991) |
Prediction of coiled-coil
conformation from protein sequences |
SignalP |
H. Nielsen et al. Proc Int Conf Intell
Syst Mol Biol 6, 122-130. (1998) |
Prediction
of the presence and location of signal peptide cleavage sites in amino acid
sequences |
Gene
structure; Open Reading Frame |
DECODER
(in house) |
Fukunishi and Hayashizaki,
Physiological genomics 5, 81-87. (2001) |
extracts open reading frames
from sequences and corrects frame-shifts |
rsCDS
(in house) |
M. Furuno et al. in preparation |
CDS prediction completely based
on homology search of protein sequences |
ProCrest
(in house) |
J. Adachi et al. in preparation |
CDS prediction based on coding
potential in DNA sequences |
NCBI CDS
Predictor (in house) |
L. Wagner, (unpublished) |
CDS prediction based on both
protein homology and coding potential |
Sequence
assembly, clustering, Gene Index building |
Phred |
Ewing and Green. Genome Res. 8,
186-194. (1998) |
reads DNA sequencer trace data,
calls bases, and assigns quality values to the bases |
Phrap |
|
assembles shotgun DNA
sequence data into a contig sequence |
Consed |
D. Gordon et al. Genome Res. 8,
195-202. (1998) |
edits sequence assemblies created
by Phrap for reassembly of the same data set |
CAP3 |
X. Huang et al. Genome Res. 9, 868-877.
(1999) |
assembles sequences using base
quality values in computation of overlaps between reads; construction of
multiple sequence alignments of reads, and generation of consensus sequences;
integrated in the TIGR Gene Index assembly pipeline |
Megablast |
|
nucleotide sequence alignment
search program, used for clustering in the TIGR Gene Index assembly |
TGI assembly pipeline |
J. Quackenbush et al. Nucleic Acids
Res. 29, 159-164. (2001) |
TIGR Gene Index assembly pipeline |
Mapping and genomic alignments |
TGI mapping pipeline |
|
genomic alignment and grouping of
tentative transcript sequences |
blEST |
L. Florea et al. Genome Res. 8,
967-974. (1998) |
cDNA-genome alignment program
integrated in TIGR Gene Index genomic mapping pipeline |
SIM4 |
L. Florea et al. Genome Res. 8,
967-974. (1998) |
aligns a cDNA sequence to a
genomic sequence under the assumption that the differences between the two
sequences are limited to introns in the genomic sequence and sequencing
errors in either of the sequences |
Gene
Ontology Browser |
|
GO around |
J. Tanoue et al. Bioinformatics
(in press) |
Gene ontology viewer |
|
|
|
Database |
Reference |
Description |
Nucleotide
sequence |
DDBJ |
Tateno et al. Nucleic Acids Res. 30,
27-30. (2002) |
all known nucleotide and protein
sequences |
EMBL |
Stoesser et al. Nucleic Acids Res. 30,
21-26. (2002) |
all known nucleotide and protein
sequences |
GenBank |
Benson et al. Nucleic Acids Res. 30,
17-20. (2002) |
all known nucleotide and protein
sequences |
Mouse Genome Informatics (MGI) - Mouse Genome Database
(MGD) |
Blake et al. Nucleic Acids Res. 30,
113-115. (2002) |
model organism database for the
laboratory mouse; gene, sequence, nomenclature, GO information among others |
RefSeq/LocusLink |
Pruitt et al. Nucleic Acids Res. 29,
137-140. (2001) |
non-redundant collection of
genes and reference sequence standards |
dbEST (mouse division) |
|
mouse EST sequences |
UniGene |
Wheeler et al. Nucleic Acids Res. 30,
13-16. (2002) |
clusters of ESTs and full-length
mRNA sequences; each cluster represents a unique known or putative gene |
TIGR Gene Indices |
J. Quackenbush et al. Nucleic
Acids Res. 29, 159-164. (2001) |
TIGR and GenBank EST sequences
assembled to tentative consensus sequences |
nt (NCBI) |
Wheeler et al. Nucleic Acids Res. 30,
13-16. (2002) |
all
GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS, GSS, or phase 0, 1 or 2
HTGS sequences). No longer "non-redundant". |
Alternative
splicing dB |
Zavolan et al. manuscript in
preparation |
Database of alternatively
spliced mouse transcripts |
Mapping |
MGSC v3 |
Mouse Genome Sequencing
Consortium. Nature. (this issue) (2002) |
mouse genome sequence assembly |
Human "Golden Path" |
International Human Genome Sequencing
Consortium, Nature 409, 860-921. (2001) |
human genome sequence assembly |
Ensembl |
Hubbard et al. Nucleic Acids Res. 30,
38-41. (2002) |
genome dataset containing
confirmed and predicted genes, exons, transcripts, and contigs |
Riken-GenoMapper M. musculus cDNA mapping |
H. Kiyosawa et al. in
preparation |
RIKEN clones mapped to mouse
genome incl. information on disease, public mouse genes, markers and ESTs |
Riken-GenoMapper H. sapiens cDNA mapping |
H. Kiyosawa et al. in
preparation |
RIKEN clones mapped to human
genome incl. information on disease, public mouse genes, markers and ESTs |
Radiation Hybrid Map |
I. Yamanaka et al. J. Struct.
Func. Genomics 2, 23-28. (2002) |
RIKEN clones mapped to mouse
chromosomes based on sequence homology to ESTs of Whitehead mouse T31
radiation hybrid map |
Protein
sequence |
nr (NCBI) |
Wheeler et al. Nucleic Acids Res. 30,
13-16. (2002) |
non-redundant
GenBank CDS translations+PDB+SwissProt+PIR+PRF |
SPTR (SwissProt + TrEMBL non-redundant protein set) |
Bairoch et al. Nucleic Acids
Res. 28, 45-48. (2000) |
annotated protein database with
minimum redundancy, annotation incl. GO terms and functional sites |
PIR NREF |
Wu et al. Nucleic Acids Res. 30, 35-37.
(2002) |
non-redundant reference protein database that includes all
sequences from PIR-PSD,
Swiss-Prot, TrEMBL, RefSeq, GenPept, and PDB |
Domains,
motifs and superfamilies |
SCOP |
Lo Conte et al. Nucleic Acids Res. 30,
264-267. (2002) |
structural classification of
proteins |
SUPERFAMILY |
Gough et al., Nucleic Acids Res. 30,
268-272. (2002) |
|
Pfam |
Bateman et al. Nucleic Acids Res. 30,
276-280. (2002) |
semi-automatic protein
family database containing multiple protein alignments and profile-HMMs of
these families |
MDS |
Kawaji et al. Genome Res. 12, 367-378.
(2002) |
novel motifs extracted from SPTR
and FANTOM DB |
InterPro |
Apweiler et al. Nucleic Acids Res. 29,
37-40. (2001) |
integrated view of other domain
and functional site databases (PROSITE, PRINTS, ProDom and Pfam) |
UTRsite and UTRdb |
Pesole et al. Nucleic Acids Res. 30,
335-340. (2002) |
UTRsite: nucleotide sequence
patterns of UTRs where a functional role has been shown experimentally; UTRdb:
non-redundant 3' and 5' UTR sequences of eukaryotic mRNAs enriched with
annotations about functional elements and repeats |
Pathway |
KEGG |
Kanehisa et al. Nucleic Acids Res. 30,
42-46. (2002) |
metabolic and regulatory pathway
maps |
Disease |
OMIM |
Wheeler et al. Nucleic Acids Res. 30,
13-16. (2002) |
catalog of human genes and
genetic disorders |
Literature |
PubMed |
|
abstracts and
bibliographic information of journal articles and books |
Gene
Ontology |
GO database |
Ashburner et al. Nat Genet. 25, 25-29.
(2000) |
gene ontology terms |
SNP |
dbSNP |
Wheeler et al. Nucleic Acids Res. 30,
13-16. (2002) |
single nucleotide polymorphisms |