Supplementary Table 7

A. Databases that were used for the functional annotation

Database or repository	Description
Nucleotide sequence
Mouse Gene Index (in-house)	redundant phase I clone sequencing data
nr-nt (in-house)	non-redundant database built from Genbank, EMBL, DDBJ, and their cumulative daily-updated nucleotide sequences
tigr-mgi	Nucleotide sequences from TIGR Mouse Gene Index
MGI	integrated view of gene characterization, nomenclature, genetic markers, mapping, gene homologies, expression, phenotype and other biological data
est_mouse	mouse EST sequences
UniGene	clusters of ESTs and full-length mRNA sequences; each cluster represents a unique known or putative human gene
TIGR Gene Indices	human and non-human TIGR and GenBank EST sequences assembled to tentative consensus sequences
UTRdB	non-redundant 3' and 5' UTR sequences of eukaryotic mRNAs, enriched with annotations about functional elements and repeats
Mapping
Whitehead Mouse RH dB	T31 radiation hybrid (RH) data of 20 mouse chromosomes
Jackson Laboratory T31 Mouse RH dB	T31 RH data of 20 mouse chromosomes from various sources incl. WICGR mouse RH dB, The UK Mouse Genome Centre, Genoscope - CNS mapped together into a single comprehensive map
Refseq	reference sequence standards for chromosomes, mRNAs, and proteins for the functional annotation of genome data
Ensembl	human genome dataset containing confirmed and predicted genes, exons, transcripts, and contigs
Protein sequence
NCBI-nr	non-redundant GenBank CDS translations+PDB+SwissProt+PIR+PRF
SwissProt	annotated protein database with minimum redundancy; annotation incl. GO terms and functional sites
TrEMBL	translations of all CDS present in the EMBL, which are not yet integrated into SWISS-PROT
TIGR's nr-aa	non-redundant amino acid sequence database prepared at TIGR using data from EGAD, SwissProt, PDB and GenPept
Gene cluster
HomoloGene	curated orthologs among mouse, rat, human and zebrafish; calculated orthologs from sequence comparisons between all UniGene clusters for each pair of organisms
Pfam	semi-automatic protein family database containing multiple protein alignments and profile-HMMs of these families
TIGRFAM	a curated protein family database containing multiple protein alignments and profile HMMs of these families
InterPro	integrated view of other domain and functional site databases (PROSITE, PRINTS, ProDom and Pfam)
UTRsite	nucleotide sequence patterns of UTRs where a functional role has been shown experimentally
Pathway
KEGG	metabolic and regulatory pathway maps
Disease
LocusLink	annotated sequence and descriptive information about genetic loci
Refseq	reference sequence standards for chromosomes, mRNAs, and proteins for the functional annotation of genome data
OMIM	catalog of human genes and genetic disorders
Literature
PubMed	abstracts and bibliographic information of journal articles and books
Gene Ontology
swp2go	gene ontology index for mapping of SwissProt keywords to GO terms
egad2go	gene ontology index for mapping of EGAD cellular roles to GO terms

B. Software that was used during full-length sequencing and the functional annotation

Software name	Description
Functional Annotation
FANTOM+	web-based system for human curation of sequences
Database searching
NCBI-BLAST	Basic Local Alignment Search Tool, which includes a set of similarity search programs (BLASTN, BLASTP, BLASTX, TBLASTN, TBLASTX)
RepeatMasker	screens DNA sequences against a library of repetitive elements, as well as for low complexity regions; it returns a masked query sequence ready for database searches
FASTA	package that compares a sequence to another sequence or to a sequence database using the FASTA algorithm. The FASTY program in particular was used frequently in the FANTOM meeting. (FASTY compares a DNA sequence to a protein sequence database using the FASTA algorithm; it translates the DNA sequence in three forward (or reverse) frames and allows frameshifts)
FLAST (in house)	DDS-based program that compares a query sequence pairwise with a cDNA sequence database
Wise2	Wise2 is a package for comparing DNA and protein sequences. In the meeting, estwise in the Wise2 package was frequently used because it can compare a protein sequence against an EST/cDNA sequence with the option of using a protein profile HMM
HMMER	profile hidden Markov models for biological sequence analysis; searches a sequence database with a profile HMM or builds a hidden Markov model from a sequence alignment
Patsearch	finds functional elements in nucleotide and protein sequences and assesses their statistical significance
Gene structure; Open Reading Frame
GenScan	determines the most likely gene structure (exon/intron) under a probabilistic model of the gene structural and compositional properties of the genomic DNA for a given organism
ORF Finder	finds all open reading frames of a selected minimum size in a sequence
DECODER (in house)	extracts open reading frames from sequences and corrects frame-shifts
Multiple sequence alignment
CLUSTALW	progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice
Cluster analysis
Maximum density subgraph (in house)	generates a linkage graph whose vertices are sequences and edges are pairwise similarities; it then finds subgraphs whose vertices are connected with a fraction 'p' of the other vertices until all sequences are covered and the maximum density (sum of similarities / number of nodes) is found
Assembly
Phred	reads DNA sequencer trace data, calls bases, and assigns quality values to the bases
Phrap	assembles shotgun DNA sequence data to a contig sequence
Consed	edits sequence assemblies created by Phrap so that the same data set can be reassembled
CAP3	assembles sequences using base quality values in the computation of overlaps between reads, the construction of multiple sequence alignments of reads, and the generation of consensus sequences
Others
bioSCOUT	commercial software package for enhanced sequence analysis
experimental programs	extraction and assignment of GO terms
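
Note on the density measure listed under "Cluster analysis": the table defines density as the sum of similarities divided by the number of nodes. The following is only a minimal sketch of that ratio, not the in-house program. The function name subgraph_density, the sequence labels and the similarity values are hypothetical and chosen for illustration. The search over subgraphs and the fraction 'p' criterion are not shown.

```python
from itertools import combinations


def subgraph_density(nodes, similarity):
    """Return (sum of pairwise similarities among nodes) / (number of nodes).

    `similarity` maps an unordered pair, stored as a frozenset of two
    sequence names, to a similarity score. Missing pairs count as 0.
    """
    nodes = list(nodes)
    if not nodes:
        return 0.0
    total = sum(
        similarity.get(frozenset(pair), 0.0)
        for pair in combinations(nodes, 2)
    )
    return total / len(nodes)


# Hypothetical toy data: three mutually similar sequences and one outlier.
similarity = {
    frozenset(("seqA", "seqB")): 0.9,
    frozenset(("seqA", "seqC")): 0.8,
    frozenset(("seqB", "seqC")): 0.7,
    frozenset(("seqC", "seqD")): 0.1,
}

dense = subgraph_density(["seqA", "seqB", "seqC"], similarity)
loose = subgraph_density(["seqA", "seqB", "seqC", "seqD"], similarity)
print(dense > loose)
```

In this toy example, adding the weakly connected sequence lowers the density, which is the property a maximum-density search would exploit when deciding cluster membership.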
