Database or repository |
Description |
Nucleotide sequence
|
Mouse Gene Index (in-house) |
redundant phase I clone
sequencing data |
nr-nt (in-house) |
non-redundant database built
from GenBank, EMBL, DDBJ, and their cumulative daily nucleotide
sequence updates |
tigr-mgi |
Nucleotide sequences from
TIGR Mouse Gene Index |
MGI |
integrated view of gene
characterization, nomenclature, genetic markers, mapping, gene homologies,
expression, phenotype and other biological data |
est_mouse |
mouse EST sequences |
UniGene |
clusters of ESTs and full-length
mRNA sequences; each cluster represents a unique known or putative human
gene |
TIGR
Gene Indices |
human and non-human TIGR
and GenBank EST sequences assembled into tentative consensus sequences |
UTRdB |
non-redundant 3' and 5' UTR
sequences of eukaryotic mRNAs, enriched with annotations about functional
elements and repeats |
Mapping
|
Whitehead
Mouse RH dB |
T31 radiation hybrid (RH) data for 20
mouse chromosomes |
Jackson
Laboratory T31 Mouse RH dB |
T31 RH data for 20 mouse
chromosomes from various sources, incl. the WICGR mouse RH dB, the UK Mouse
Genome Centre, and Genoscope - CNS, mapped together into a single comprehensive
map |
RefSeq |
reference sequence standards
for chromosomes, mRNAs, and proteins for the functional annotation of genome
data |
Ensembl |
human genome dataset containing
confirmed and predicted genes, exons, transcripts, and contigs |
Protein sequence
|
NCBI-nr |
non-redundant GenBank CDS
translations+PDB+SwissProt+PIR+PRF |
SwissProt |
annotated protein database
with minimal redundancy; annotation incl. GO terms and functional sites |
TrEMBL |
translations of all CDS
present in EMBL that are not yet integrated into SWISS-PROT |
TIGR's nr-aa |
non-redundant amino acid
sequence database prepared at TIGR using data from EGAD, SwissProt, PDB
and GenPept |
Gene cluster
|
HomoloGene |
curated orthologs among mouse,
rat, human, and zebrafish, plus calculated orthologs from sequence
comparisons between all UniGene clusters for each pair of organisms |
Pfam |
semi-automatic protein family
database containing multiple protein alignments and profile-HMMs of these
families |
TIGRFAM |
a curated protein family
database containing multiple protein alignments and profile HMMs of these
families |
InterPro |
integrated view of other
domain and functional site databases (PROSITE, PRINTS, ProDom and Pfam) |
UTRsite |
nucleotide sequence patterns
of UTRs where a functional role has been shown experimentally |
Pathway
|
KEGG |
metabolic and regulatory
pathway maps |
Disease
|
LocusLink |
annotated sequence and descriptive
information about genetic loci |
RefSeq |
reference sequence standards
for chromosomes, mRNAs, and proteins for the functional annotation of genome
data |
OMIM |
catalog of human genes and
genetic disorders |
Literature
|
PubMed |
abstracts and bibliographic
information of journal articles and books |
Gene Ontology
|
swp2go |
gene ontology index for
mapping of SwissProt keywords to GO terms |
egad2go |
gene ontology index for
mapping of EGAD cellular roles to GO terms |
Software name |
Description |
Functional Annotation
|
FANTOM+ |
web-based system for human
curation of sequences |
Database searching
|
NCBI-BLAST |
Basic Local Alignment Search
Tool, which includes a set of similarity search programs (BLASTN, BLASTP,
BLASTX, TBLASTN, TBLASTX) |
RepeatMasker |
screens DNA sequences against
a library of repetitive elements, as well as for low complexity regions;
it returns a masked query sequence ready for database searches |
FASTA |
package that compares
a sequence to another sequence or to a sequence database using the FASTA
algorithm. The FASTY program in particular was used frequently at the FANTOM
meeting; it compares a DNA sequence to a protein sequence database using
the FASTA algorithm, translating the DNA sequence in three forward (or
reverse) frames and allowing frameshifts |
FLAST (in-house) |
DDS-based
program that compares a query sequence pairwise with a cDNA sequence
database |
Wise2 |
package for comparing
DNA and protein sequences. At the meeting, estwise from the Wise2 package
was used frequently because it can compare a protein sequence against an
EST/cDNA sequence, with the option of using a protein profile HMM |
HMMER |
profile hidden Markov models
for biological sequence analysis; searches a sequence database with a profile
HMM or builds a hidden Markov model from a sequence alignment |
Patsearch |
finds functional elements
in nucleotide and protein sequences and assesses their statistical significance |
Gene structure; Open Reading
Frame
|
GenScan |
determines the most likely
gene structure (exon/intron) under a probabilistic model of the gene structural
and compositional properties of the genomic DNA for a given organism |
ORF
Finder |
finds all open reading frames
of a selected minimum size in a sequence |
DECODER (in-house) |
extracts open reading frames
from sequences and corrects frame-shifts |
Multiple sequence alignment
|
CLUSTALW |
progressive multiple sequence
alignment through sequence weighting, position-specific gap penalties and
weight matrix choice |
Cluster analysis
|
Maximum density subgraph
(in-house) |
generates a linkage graph
whose vertices are sequences and whose edges are pairwise similarities; it then
finds subgraphs whose vertices are connected with a fraction 'p' of
the other vertices until all sequences are covered and the maximum density
(sum of similarities / number of nodes) is found |
Assembly
|
Phred |
reads DNA sequencer trace
data, calls bases, and assigns quality values to the bases |
Phrap |
assembles shotgun DNA sequence
data into contig sequences |
Consed |
edits sequence assemblies
created by Phrap, allowing reassembly of the same data set |
CAP3 |
assembles sequences using
base quality values when computing overlaps between reads, constructing
multiple sequence alignments of reads, and generating consensus sequences |
Others
|
bioSCOUT |
commercial software package
for enhanced sequence analysis |
experimental programs |
extraction and assignment
of GO terms |
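
As an illustration of the ORF Finder entry in the table above ("finds all open reading frames of a selected minimum size in a sequence"), the following is a minimal Python sketch. It is not the ORF Finder or DECODER implementation. It assumes the standard genetic code, ATG as the only start codon, and a minimum size counted in nucleotides including the stop codon; the names find_orfs and reverse_complement are hypothetical.

```python
# Minimal ORF scan (illustrative sketch, not the actual ORF Finder program).
# Assumptions: standard genetic code, ATG as the only start codon,
# min_len measured in nucleotides and including the stop codon.
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def find_orfs(seq, min_len=30):
    """Return (strand, frame, start, end, orf) for each ATG..stop ORF.

    Coordinates are 0-based, end-exclusive, and relative to the strand
    being scanned (the reverse strand is scanned as its reverse complement).
    """
    seq = seq.upper()
    results = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # position of the first ATG since the last stop
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == START:
                    start = i
                elif start is not None and codon in STOPS:
                    end = i + 3
                    if end - start >= min_len:
                        results.append((strand, frame, start, end, s[start:end]))
                    start = None
    return results
```

Calling find_orfs(sequence, min_len=300) would keep only ORFs of at least 100 codons; the threshold is a free parameter here, as in the table description.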