Supplementary Table 5

A. Motifs identified by maximum density subgraph analysis
 
SeqID or accession or name | Position | RIKEN Definition/SPTR protein name | Motif accession | Result of HMMER searches
7531 | 414-453 | unclassifiable | MD00113 | EWFKLVLEKN KLMRYESELL IMAQELELED HQSRLEQKLR
8146 | 372-411 | DNA segment, EST 573322 (MGI:1349561) | MD00113 | DWFKLIHEKH LLVRRESELI YVFKQQNLEQ RQADVEFELR
Q9UFF7 | 552-591 | Hypothetical 71.5 kDa protein (fragment) (Homo sapiens) | MD00113 | QLLQLVDKKN SLVAEEAELM ITVQELNLEE KQWQLDQELR
Q9UH43 | 206-245 | DJ1014D13.3 (novel protein) (fragment) (Homo sapiens) | MD00113 | DWFKLIHEKH LLVRRESELI YVFKQQNLEQ RQADVEYELR
Q9V7X1 | 847-886 | CG15609 protein (Drosophila melanogaster) | MD00113 | QWFTLVNKKN ALLRRQMQLN ILEQEKDLER KYTMLNQELR
Q9VU34 | 827-866 | CG11259 protein (Drosophila melanogaster) | MD00113 | QLFELVNEKN ELFRRQAELM YLRRQHRLEQ EQADIEHEIR
8472 | 2-28 | unclassifiable transcript | MD00121 | WFLGFELRTF RRTVGVLTRR AIYLSSP
9875 | 366-391 | Homeobox domain containing protein (InterPro IPR001356) | MD00121 | WLLGFELLTF GRAVGCSYPL SH-LTSP
12294 | 2-27 | unclassifiable | MD00121 | WLLGFELLTF GRAVGCSYPL SH-LTTP
12790 | 2-27 | dihydrolipoamide dehydrogenase (MGI:107450) | MD00121 | WLLGFELWTF GRAVGCSYPL SH-LTSP
14033 | 2-27 | unclassifiable transcript | MD00121 | WLLGFELRTF RRAVRCSYPL SH-LTSP
15271 | 2-27 | unclassifiable transcript | MD00121 | WLLGFELRTF GRAVGCSYLL SH-LTSP
22108 | 2-25 | unclassifiable | MD00121 | WLLGFELRTF RRAVGALNHR AF---SP
2213 | 147-166 | related to COSMID C49H3 (SPTR Q18720)* | MD00129 | FIPALKRWRQ VDLCEFKASL
7180 | 3-22 | unclassifiable transcript | MD00129 | LIPALNRQRQ ADLCKFEDSQ
8388 | 117-136 | hypothetical protein | MD00129 | LVPAPRRQRW AYLCEFKASL
8484 | 16-35 | receptor-like tyrosine kinase (MGI:101766) | MD00129 | LIPALGRQRQ AVLCEFQTIL
14092 | 3-22 | unclassifiable transcript | MD00129 | LVPVLGRHRQ ADPCEFKASL
19181 | 17-36 | unclassifiable transcript | MD00129 | LIPALRRQRQ VDLCEFLANL
19684 | 52-71 | unclassifiable transcript | MD00129 | LIPALRRQGQ AGLCEMKASL
6649 | 174-203 | mutS homolog 3 (E. coli) (MGI:109519) | MD00137 | HAFSRPARKN AANRNLLWQK LYCLHLQEPE
14274 | 29-58 | Protamine P1 containing protein (Pfam PF00260) | MD00137 | YTRSRPARKN TANQNLLRQS FIAYIFRSKK
17278 | 19-48 | Protamine P1 containing protein (Pfam PF00260) | MD00137 | HAFSRPARKN ATNRNLLRQK LYCLHLQEPE
23112 | 28-57 | hypothetical protein | MD00137 | HTFTRPARKD ATNRNLLRQK LYCLHLQEPE
17174 | 7-35 | unclassifiable | MD00139 | DVELSAPSPE PCLPACHHVS RHDENGLNL
17937 | 46-74 | unclassifiable transcript | MD00139 | EVVLSATSPA PCLPACHHAS LPGNNGLNL
21368 | 32-56 | unclassifiable transcript | MD00139 | DVELSAP-PT PCLPGC---Y LVDGNGLNL
Q9WU72 | 137-163 | B-cell activating factor (Mus musculus) | MD00139 | DVDLSAP-PA PCLPGCRH-S QHDDNGMNL
7516 | 491-527 | hypothetical protein | MD00148 | YNPVCG-RDE TQYFSPCFAG CKATKKLRKE K----TYYNC SC
15434 | 485-521 | hypothetical protein | MD00148 | YTSICG-RDE KEYFSPCFAG CKATKVSQTE K----TYYNC SC
18937 | 268-304 | homolog to putative organic anion transporter | MD00148 | YSSVCG-RDE TEYFSPCFAG CLASKHLDYE K----TFYNC SC
m_oatp1 | 450-486 | organic anion transporting polypeptide 1 | MD00148 | WDPVCG-DNG LAYMSACLAG CE-KSVGTGT N---MVFQNC SC
m_oatp2-1 | 450-486 | liver organic anion transporting polypeptide 2 | MD00148 | WDPVCG-DNG LSYMSACLAG CE-KSVGTGT N---MVFQNC SC
m_oatp2-c | 490-526 | cochlea organic anion transporter 2 | MD00148 | WEPMCG-DNG ITYVSACLAG CQ-SSSRSGK N---IIFSNC TC
m_LST-1 | 384-422 | liver specific transporter-1 | MD00148 | WEPVCG-ENG VTYISPCLAG CK-SFRGDKK LM-NIEFYDC SC
r_oatp1 | 450-486 | organic anion transporting polypeptide 1 | MD00148 | WDPVCG-DNG VAYMSACLAG CK-KFVGTGT N---MVFQDC SC
r_oatp2 | 449-485 | sodium-independent organic anion transporter 2 | MD00148 | WDPVCG-DNG LAYMSACLAG CE-KSVGTGT N---MVFQNC SC
r_oatp3 | 450-486 | sodium-independent organic anion transporter 3 | MD00148 | WDPVCG-DNG LAYMSACLAG CK-KSVGTGT N---MVFQNC SC
r_oatp5 | 450-486 | organic anion transporting polypeptide 5 | MD00148 | WDPVCG-DNG LAYITPCLAG CE-KSVGSGI N---MVLQDC SC
r_OAT-K1 | 450-486 | sodium-independent organic anion transporter K1 | MD00148 | WDPVCG-DNG LAYMSACLAG CE-KSVGTGT N---MVFHNC SC
r_OAT-K2 | 278-314 | sodium-independent organic anion transporter K2 | MD00148 | WDPVCG-DNG LAYMSACLAG CE-KSVGTGT N---MVFHNC SC
r_LST-1 | 432-470 | liver-specific organic anion transporter | MD00148 | WEPICG-ENG VTYISPCLAG CK-SFRGDKK PN-NTEFYDC SC
r_PGT | 455-494 | prostaglandin transporter (PGT) | MD00148 | FHPVCG-DNG VEYVSPCHAG CSSTNTSSEA SK-EPIYLNC SC
h_OATP | 450-486 | sodium-independent organic anion transporter | MD00148 | WDPVCG-NNG LSYLSACLAG CE-TSIGTGI N---MVFQNC SC
h_OATP8 | 470-506 | organic anion transporter 8 (OATP8) | MD00148 | WEPVCG-NNG ITYLSPCLAG CK-SSSGIKK H---TVFYNC SC
h_OATP-B | 500-541 | organic anion transporter OATP-B | MD00148 | FNPVCDPSTR VEYITPCHAG CSSWVVQDAL DNSQVFYTNC SC
h_OATP-E | 515-553 | organic anion transporter OATP-E | MD00148 | YSPVCG-SDG LMYFSLCHAG CP-AATETNV DG-QKVYRDC SC
h_OATP-F | 487-523 | organic anion transporting polypeptide 14 | MD00148 | WEPMCG-ENG ITYVSACLAG CQ-TSNRSGK N---IIFYNC TC
h_LST-1 | 470-506 | liver-specific organic anion transporter | MD00148 | WEPVCG-NNG ITYISPCLAG CK-SSSGNKK P---IVFYNC SC
h_PGT | 455-494 | prostaglandin transporter (PGT) | MD00148 | FHPVCG-DNG IEYLSPCHAG CSNINMSSAT SK-QLIYLNC SC
MD00113 is a leucine-zipper-like motif. MD00148 represents a motif shared among the OATP family sequences. No SPTR hits were found for MD00121, MD00129 and MD00137. Accessions for the known OATP sequences are given after each OATP gene name (h: human, m: mouse, r: rat): m_oatp1: GenBank AB031813; m_oatp2-1: AB031814; m_oatp2-c: AY007379; m_LST-1: AB037202; r_oatp1: L19031; r_oatp2: U88036; r_oatp3: AF041105; r_oatp5: AF053317; r_OAT-K1: D79981; r_OAT-K2: AB012662; r_LST-1: r_PGT: SwissProt Q00910; h_OATP: U21943; h_OATP8: AJ251506; h_OATP-B: AB020687; h_OATP-E: AB031051; h_OATP-F: AF260704; h_LST-1: AF060500; h_PGT: U70867.
* SPTR entry Q18720 was available at the time of sequence analysis. The entry has since been deleted from SwissProt/TrEMBL.
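
As a reading aid for Table A, the following minimal Python sketch checks that the residue span in the Position column agrees with the number of residues in the aligned motif segment. It assumes that positions are 1-based and inclusive and that "-" marks an alignment gap; neither convention is stated in the table. The function names (span_length, ungapped_length, mismatches) are illustrative only, and the four rows are copied from the table above.

```python
# Sketch: compare the Position span with the ungapped length of the aligned segment.
# Assumptions (not stated in the table): positions are 1-based inclusive, "-" is a gap.

def span_length(position):
    """Number of residues covered by a 'start-end' position string."""
    start, end = (int(part) for part in position.split("-"))
    return end - start + 1

def ungapped_length(aligned):
    """Count residues in an aligned segment, ignoring spaces and gap characters."""
    return sum(1 for ch in aligned if ch.isalpha())

# A few rows copied from Table A: (SeqID or name, position, aligned segment).
ROWS = [
    ("7531", "414-453", "EWFKLVLEKN KLMRYESELL IMAQELELED HQSRLEQKLR"),
    ("22108", "2-25", "WLLGFELRTF RRAVGALNHR AF---SP"),
    ("21368", "32-56", "DVELSAP-PT PCLPGC---Y LVDGNGLNL"),
    ("h_OATP-B", "500-541", "FNPVCDPSTR VEYITPCHAG CSSWVVQDAL DNSQVFYTNC SC"),
]

def mismatches(rows):
    """Return the IDs whose position span differs from the ungapped segment length."""
    return [seq_id for seq_id, pos, aln in rows
            if span_length(pos) != ungapped_length(aln)]

if __name__ == "__main__":
    print(mismatches(ROWS))
```

An empty list means every listed row is internally consistent under the stated assumptions; any ID that is returned would point to a transcription problem in that row.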

B. UTR Functional Elements
 
Functional Element | Hypothetical protein | Unclassifiable transcript | Unclassifiable | Determined CDS (1) | Total
5'UTR or 3'UTR Located Elements
Iron responsive element | 5 | 4 | 3 | 18 | 30
5'UTR Located Elements
Ribosomal mRNA TOP element | 0 | 0 | 0 | 5 | 5
3'UTR Located Elements
Histone 3'UTR stem-loop structure | 0 | 0 | 2 | 8 | 10
Selenocysteine insertion sequence | 4 | 16 | 5 | 34 | 59
Erythroid 15-lipoxygenase differentiation control element | 1 | 6 | 6 | 13 | 26
AUUUA-containing class II AU rich element | 1 | 8 | 7 | 5 | 21
GLUT1 RNA stability determinant | 2 | 2 | 1 | 2 | 7
Vimentin 3'UTR element | 0 | 0 | 0 | 1 | 1
(1) RIKEN clones defined by significant similarity to known nucleotide sequences, protein sequences or protein domains.

Number of annotated clones for each UTRsite pattern, classified according to FANTOM's definitions: hypothetical protein, unclassifiable transcript, unclassifiable, and all other categories (reported as "determined CDS"). Matching clones were checked manually to compare the location of the pattern in the sequence with the location of the predicted CDS. A UTRsite functional pattern was annotated only in clones where it occurred in the correct position and fulfilled the pattern-specific features described in the UTRsite database.
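
The counts in Table B can be cross-checked with a short Python sketch such as the one below, which sums the four category counts for each element and compares the result with the reported Total column. The counts are transcribed from the table; the names UTR_COUNTS, check_totals and category_totals are illustrative and do not come from the original analysis.

```python
# Sketch: verify that each row of Table B sums to its reported total.
# Column order follows the table header.
CATEGORIES = (
    "hypothetical protein",
    "unclassifiable transcript",
    "unclassifiable",
    "determined CDS",
)

# element name -> ((counts per category), reported total), transcribed from Table B
UTR_COUNTS = {
    "Iron responsive element": ((5, 4, 3, 18), 30),
    "Ribosomal mRNA TOP element": ((0, 0, 0, 5), 5),
    "Histone 3'UTR stem-loop structure": ((0, 0, 2, 8), 10),
    "Selenocysteine insertion sequence": ((4, 16, 5, 34), 59),
    "Erythroid 15-lipoxygenase differentiation control element": ((1, 6, 6, 13), 26),
    "AUUUA-containing class II AU rich element": ((1, 8, 7, 5), 21),
    "GLUT1 RNA stability determinant": ((2, 2, 1, 2), 7),
    "Vimentin 3'UTR element": ((0, 0, 0, 1), 1),
}

def check_totals(table):
    """Return the elements whose category counts do not sum to the reported total."""
    return [name for name, (counts, total) in table.items() if sum(counts) != total]

def category_totals(table):
    """Sum the counts of all elements within each clone category."""
    return {
        category: sum(counts[i] for counts, _ in table.values())
        for i, category in enumerate(CATEGORIES)
    }

if __name__ == "__main__":
    print(check_totals(UTR_COUNTS))
    print(category_totals(UTR_COUNTS))
```

Note that the per-category sums count pattern matches, not distinct clones, since a single clone could in principle carry more than one UTRsite pattern.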