Materials and Method

Phase I

All cDNAs were cloned from C57BL/6J mouse mRNA. About 160 different tissues at different stages were used; the list is reported elsewhere¹ (P. Carninci et al., submitted). The CTAB method was used for RNA extraction. Our original system for constructing full-length cDNA libraries combines several technologies: the Cap trapper method²,³, thermoactivation of reverse transcriptase by trehalose⁴, the normalization and subtraction method⁵, the oligolinker method (SSLMP, Biotechniques) and a new vector (manuscript in preparation). Full-length cDNA clones derived from various tissues were picked and arrayed by Q-bot (GENETIX LIMITED) onto 384-well (16 x 24) format plates, which were incubated at 30 °C.

The strategy of using end sequencing to build a non-redundant set of full-length clones is described elsewhere (H. Konno et al., Genome Research, in press, and P. Carninci et al., submitted). The clones were clustered based on their 3'-end sequences.

Phase II

Rearray of clones

A single clone derived from the library of the best quality was selected as the representative of each cluster. The representative clones were picked by Q-bot (GENETIX LIMITED) and arrayed onto 384-well format plates. To make the rearray plates, the E. coli was cultivated at 30 °C for 18-24 hrs in 50 µl of LB medium (100 µg/ml of ampicillin / 50 µg/ml of kanamycin for PS/DH10B, or 100 µg/ml of ampicillin / 25 µg/ml of streptomycin for ZAP/SOLR, as vector/host system).

Plasmid extraction and insert sizing

Each clone was cultivated in 1.3 ml of HT medium with 100 µg/ml of ampicillin at 37 °C for 21 hrs, and the plasmid DNA was purified using QIAprep 96 Turbo (QIAGEN). To check the size of the cDNA insert, 1/30 of the plasmid DNA was digested with PvuII and subjected to 1% agarose gel electrophoresis.

Sequencing

Three types of sequencers were used for the full-sequencing analysis. Depending on the insert size, the cDNAs were classified into two categories (cDNAs shorter than 2.5 kb and cDNAs longer than 2.5 kb). The short clones of the first category were sequenced from both ends using Licor DNA4200 (long read sequencer) with the Thermosequenase Primer Cycle Sequencing Kit (Amersham Pharmacia Biotech). For forward and reverse end sequencing we used the primer sets CACGACGTTGTAAAACGAC/GGATAACAATTTCACACAGG for ZAP and CACGACGTTGTAAAACGAC/GGATAACAATTTCACACAGG for PS. The remaining gaps were filled by primer walking, using ABI Prism377 and/or ABI Prism3700 (Applied Biosystems Inc.) with the BigDye terminator kit and the Cycle Sequencing FS ready Reaction Kit (Applied Biosystems Inc.).

The long clones of the second category were sequenced by a shotgun strategy on Shimadzu RISA 384 with the DYEnamic ET terminator cycle sequencing kit (Amersham Pharmacia Biotech). To make a shotgun library, 48 PCR-amplified DNA fragments from 48 independent representative clones, whose identities were confirmed by end sequencing, were pooled and concatenated, then sheared using the Double Stroke Shearing Device (Fiore Inc.) as described elsewhere (M. Yoshino et al., in preparation). The ends of the DNA fragments were blunted with T4 DNA polymerase. These DNA fragments were cloned into pUC18 and transformed into DH10B. Shotgun sequencing was carried out to 12- to 15-fold redundancy. The remaining gaps were filled by primer walking as described above.
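The redundancy figure quoted for the shotgun step can be read as total sequenced bases divided by the total length of the pooled inserts. The sketch below illustrates that arithmetic only; the function name and the read and insert lengths in the usage note are hypothetical and are not taken from the study.

```python
def fold_redundancy(read_lengths, insert_lengths):
    """Return sequenced bases divided by pooled insert bases.

    Assumption: redundancy is taken as raw read bases over the summed
    length of the pooled inserts, without trimming or quality filtering.
    """
    total_insert = sum(insert_lengths)
    if total_insert <= 0:
        raise ValueError("pooled insert length must be positive")
    return sum(read_lengths) / total_insert
```

For instance, a hypothetical pool of 48 inserts of 3,000 bp each, sequenced with 3,600 reads of 500 bp, gives `fold_redundancy([500] * 3600, [3000] * 48)`, which falls inside the 12- to 15-fold range.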

Assembling and gap-closing

All electropherograms from the four sequencers (RISA, Licor DNA4200, ABI Prism377 and ABI Prism3700) were base-called by Phred⁶,⁷. The electropherograms of the Licor DNA4200 were adapted for Phred by BaseImagIR version 3.1. The electropherograms of the RISA were also adapted to Phred (J. Adachi et al., in preparation).

Editing of the sequence data consisted of three steps. The first step is a computer-assisted assembly of the raw sequence data. The second step is gap-closing using public EST databases, such as the GenBank mouse EST database. Where gaps remained after the second step, we proceeded to the final step of primer walking and resequencing.

In the first step, the Phrap assembler and Consed⁸ were used for assembling and for editing the sequence (including primer design), respectively. Prior to assembly, the data were processed with the cross-match program of the Phrap package to mask vector sequences. Phrap requires a base quality indicator, the Phred score, for every base of each cDNA in the assembling step. Since Phred scores are not given in public EST databases, we assigned a putative Phred score to each base of each EST from a public database. Usually, sequence quality is poorer in the regions close to the primer and to the end than in the middle region. We assumed a Phred score of 20 for the middle region, which covers 3/5 of the entire sequence, and a score of 0 at the first and last base. In the two flanking regions, each covering 1/5 of the entire sequence starting from the first or the last nucleotide, we assigned Phred scores from 0 to 20 to each nucleotide in proportion to its distance from the respective end. Since EST sequences in public databases also contain the ambiguous base 'N', we gave the putative score 5 to the eleven nucleotides from position -5 to position +5 counted from each ambiguous base 'N'. As a result, gaps in 4,622 (22%) out of 21,076 sequences were closed by using public EST data.
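The putative score assignment described above can be sketched as follows. This is an illustration under stated assumptions, not the code used in the study: the function name is hypothetical, flank lengths are rounded down to whole bases, ramp values are rounded to integers, and where the window around an 'N' overlaps a flank the lower of the two scores is kept. The text does not settle any of these details.

```python
def putative_phred_scores(seq, max_score=20, n_score=5, n_window=5):
    """Assign a putative Phred score to every base of an EST.

    Middle 3/5 of the read: max_score. Each flanking 1/5: a linear ramp
    from 0 at the terminal base up to max_score. Bases within n_window
    positions of an 'N' (eleven bases by default): at most n_score.
    """
    n = len(seq)
    flank = max(1, n // 5)  # assumption: flank length rounded down, at least 1
    scores = []
    for i in range(n):
        d = min(i, n - 1 - i)  # distance to the nearest end of the read
        if d < flank:
            scores.append(round(max_score * d / flank))
        else:
            scores.append(max_score)
    for i, base in enumerate(seq):
        if base.upper() == "N":
            lo = max(0, i - n_window)
            hi = min(n, i + n_window + 1)
            for j in range(lo, hi):
                # assumption: never raise a score that the ramp set lower
                scores[j] = min(scores[j], n_score)
    return scores
```

For a 50-base read without ambiguous bases, this gives 0 at both terminal bases, a ramp over the first and last ten bases, and 20 over the thirty bases in between.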

The primer walking procedure was employed to close the remaining gaps. The primer sequences were designed with Consed. The additional sequence data produced by primer walking were joined to the original gap-containing sequence data until the gaps were closed. ABI Prism377 and/or ABI Prism3700 sequencers were used for primer walking.

Identification check

After the entire sequence of each clone had been determined, we checked that the sequence matched the clone. Because of handling mistakes, the 3'- and 5'-end sequences produced in Phase II sometimes differ from the original Phase I 3'- and 5'-end sequences. Some of the errors, for example the reversed placement of a 384-well format plate, could be clearly corrected by checking the whole procedure of Phase II. The Phase II sequences whose corresponding clones could not be identified were eliminated from the subsequent analyses.
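As an illustration of how a reversed plate could be corrected, the sketch below maps each well of a 16 x 24 plate to the position it occupies when the plate is turned by 180 degrees. It assumes that "reversed placement" means such a rotation and that wells are labelled with rows A to P and columns 1 to 24. The function name is hypothetical and the text does not describe the correction procedure in this detail.

```python
import string

ROWS = string.ascii_uppercase[:16]  # rows A..P of a 384-well plate
N_COLS = 24                         # columns 1..24


def rotate_well_180(well):
    """Return the label a well takes when the plate is rotated by 180 degrees."""
    row = ROWS.index(well[0].upper())  # raises ValueError for rows outside A..P
    col = int(well[1:])
    if not 1 <= col <= N_COLS:
        raise ValueError(f"column out of range: {well}")
    return f"{ROWS[len(ROWS) - 1 - row]}{N_COLS + 1 - col}"
```

Applying the mapping twice returns the original label, so the same function both describes the error and undoes it. A Phase II sequence from a suspect plate can then be compared with the Phase I end sequences of the rotated well instead of the nominal one.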

References

1. Carninci, P. & Hayashizaki, Y. High-efficiency full-length cDNA cloning. Methods Enzymol 303, 19-44 (1999).

2. Carninci, P. et al. High-efficiency full-length cDNA cloning by biotinylated CAP trapper. Genomics 37, 327-36 (1996).

3. Carninci, P. et al. High efficiency selection of full-length cDNA by improved biotinylated cap trapper. DNA Res 4, 61-6 (1997).

4. Carninci, P. et al. Thermostabilization and thermoactivation of thermolabile enzymes by trehalose and its application for the synthesis of full length cDNA. Proc Natl Acad Sci U S A 95, 520-4 (1998).

5. Carninci, P. et al. Normalization and Subtraction of Cap-Trapper-Selected cDNAs to Prepare Full-Length cDNA Libraries for Rapid Discovery of New Genes. Genome Res 10, 1617-1630 (2000).

6. Ewing, B., Hillier, L., Wendl, M. C. & Green, P. Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res 8, 175-85 (1998).

7. Ewing, B. & Green, P. Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res 8, 186-94 (1998).

8. Gordon, D., Abajian, C. & Green, P. Consed: a graphical tool for sequence finishing. Genome Res 8, 195-202 (1998).