
J Mol Evol (2002) 54. Springer-Verlag New York Inc.

The Cephalopod Loligo bleekeri Mitochondrial Genome: Multiplied Noncoding Regions and Transposition of tRNA Genes

Kozo Tomita,1,* Shin-ichi Yokobori,2 Tairo Oshima,2 Takuya Ueda,3 Kimitsuna Watanabe1,3

1 Department of Chemistry and Biotechnology, Graduate School of Engineering, University of Tokyo, Hongo, Bunkyo-ku, Tokyo, Japan
2 Department of Molecular Biology, School of Life Science, Tokyo University of Pharmacy and Life Science, Horinouchi, Hachioji, Tokyo, Japan
3 Department of Integrated Biosciences, Graduate School of Frontier Sciences, University of Tokyo, Hongo, Bunkyo-ku, Tokyo, Japan

Received: 9 May 2001 / Accepted: 3 October 2001

Abstract. We previously reported the sequence of a 9240-bp fragment of mitochondrial (mt) DNA of the cephalopod Loligo bleekeri [J. Sasuga et al. (1999) J. Mol. Evol. 48]. To clarify further the characteristics of Loligo mtDNA, we have sequenced an 8148-bp fragment to reveal the complete mt genome sequence. Loligo mtDNA is 17,211 bp long and possesses a standard set of metazoan mt genes. Its gene arrangement is not identical to any other metazoan mt gene arrangement reported so far. Three of the 19 noncoding regions longer than 10 bp are 515, 507, and 509 bp long, and their sequences are nearly identical, suggesting that multiplication of these noncoding regions occurred in an ancestral Loligo mt genome. Comparison of the gene arrangements of the Loligo, Katharina tunicata, and Littorina saxatilis mt genomes revealed that 17 tRNA genes of the Loligo mt genome are adjacent to noncoding regions. A majority (15 tRNA genes) of their counterparts are found in two tRNA gene clusters of the Katharina mt genome. Therefore, these 17 tRNA genes may have spread over the Loligo mt genome, and this may have been coupled with the multiplication of the noncoding regions.
Maximum likelihood analysis of mt protein genes supports the clade Mollusca + Annelida + Brachiopoda but fails to infer the relationships among Katharina, Loligo, and three gastropod species.

Key words: Cephalopod, Loligo bleekeri, Mitochondrial DNA, Molluscan phylogeny, Gene rearrangement, Noncoding region, tRNA

*Present address: Department of Biochemistry, University of Washington, Seattle, WA, USA
Correspondence to: Shin-ichi Yokobori; ls.toyaku.ac.jp

Introduction

Typical metazoan mitochondrial (mt) genomes are circular, are kb in size, and encode 13 protein, 2 rRNA [small and large subunit rRNAs (rrnS and rrnL)], and 22 tRNA (trnA, trnC, etc.) genes but no introns (Wolstenholme 1992; Boore 1999). The 13 polypeptides are involved in ATP synthesis coupled with electron transfer during O2 consumption [ATP synthetase subunits 6 and 8 (atp6 and atp8), cytochrome oxidase subunits I–III (cox1–cox3), apocytochrome b (cob), and NADH dehydrogenase subunits 1–6 and 4L (nad1–6 and nad4L)]. Although the mt gene order is well conserved in several phyla (e.g., Arthropoda and Vertebrata), large variations in mt genome structure have been found within and between several specific groups of Mollusca. Complete mtDNA sequences have been reported for various mollusks: a polyplacophoran [Katharina tunicata (Boore and Brown 1994)] and three gastropods, Cepaea nemoralis [Pulmonata (Terrett et al. 1996; Yamazaki et al. 1997)], Albinaria coerulea [Pulmonata

Fig. 1. Amplification strategy of the Loligo mtDNA segment sequenced in this study (upper panel) and complete gene organization of the Loligo mtDNA (lower panel). The relative position of each PCR fragment is shown by a horizontal bar. Sequences of the PCR primers are listed in the text. The circular genome is shown in a linear form. Genes encoded on the opposite strand are in gray boxes. tRNA genes are shown using one-letter abbreviations above the panel, or below the panel in the case of those encoded on the opposite strand. trnL(uaa), trnL(uag), trnS(uga), and trnS(gcu) are designated L1, L2, S1, and S2, respectively. The noncoding regions (>10 bp) are designated NC1, etc., and shown in black boxes.

(Hatzoglou et al. 1995)], and Pupa strigosa [Opisthobranchia (Kurabayashi and Ueshima 2000)]. In addition, the complete mtDNA gene arrangements of the bivalve Mytilus edulis (Hoffmann et al. 1992; see also Beagley et al. 1999) and a third pulmonate, Euhadra herklotsi (Yamazaki et al. 1997), have been reported. The size of molluscan mtDNA ranges from 14 kb [pulmonate gastropods (e.g., Terrett et al. 1996)] to approximately 34 kb [the scallop Placopecten magellanicus (e.g., La Roche et al. 1990)]. The gene arrangement of Mytilus mtDNA is notably different from that of other known metazoan mtDNAs in that atp8 is absent and an additional tRNAMet gene is present (Hoffmann et al. 1992). Katharina, Mytilus, and opisthobranch/pulmonate gastropods exhibit marked differences in their mt gene arrangements (see Kurabayashi and Ueshima 2000), and the pulmonate mt gene orders differ even among species belonging to the same family (Yamazaki et al. 1997; Kurabayashi and Ueshima 2000). For this reason, comparison of mt gene organization has become a popular means of inferring metazoan phylogeny (see Boore 1999).

Although Cephalopoda is one of the major classes of Mollusca, no complete cephalopod mt genome has been sequenced so far. We previously reported a 9240-bp mtDNA fragment of the squid Loligo bleekeri (Sasuga et al. 1999) and found that the Loligo mt genome has a gene arrangement different from that of other mollusks such as Katharina. To characterize the cephalopod mt genome, we have sequenced the remaining region of the Loligo mt genome. The Loligo mt genome carries several long noncoding regions, which appear to be related to differences in mt gene arrangement between Loligo and other mollusks.

Materials and Methods

DNA Preparation, PCR, Cloning, and Sequencing

Total DNA was prepared from livers of Loligo by the conventional phenol-extraction method (Sambrook et al. 1989). The region of Loligo mtDNA that had not been determined previously was amplified by PCR (Saiki et al. 1988) as seven fragments, using the following primers designed according to the partial sequence of Loligo mtDNA (Sasuga et al. 1999; unpublished results):

fragment A, 5′-gggaattcTAAATTATTCACATAATTCTGCC-3′ and 5′-gggaagcttgGATCCTTGGTTTCATTCAT-3′;
fragment B, 5′-gggaattcAAATATACAATCATAGCAAGTC-3′ and 5′-gggaagcttgTATATCTTTATTTGATTATGGTT-3′;
fragment C, 5′-gggaattcccgtaaaggaccttcac-3′ and 5′-gggaagctTGGGGAATCTGAACTTGTATCT-3′;
fragment D, 5′-TTCTTCGATCCTTTCGTA-3′ and 5′-TTTATCAAAAACATCTCTCTTTG-3′;
fragment E, 5′-gggctgcagacaaactaataaccaatacCCTTA-3′ and 5′-gggctgcagcagaccggcgtgagccagGTTG-3′;
fragment F, 5′-gggggatccttatgctaccttAGTACAGTTAA-3′ and 5′-gggctgcagggttgtaggaataTATAATAATAGATG-3′;
fragment a, 5′-gggaattcttaacTATTCTCTTAATTGGCCT-3′ and 5′-gggctgcagggtgttttTAGTACGCCCCT-3′.

Underlined letters indicate restriction enzyme sites introduced for the convenience of ligation with the cloning vectors; lowercase letters denote additional 5′ sequences inserted to ensure efficient digestion by the restriction enzymes. The relative locations of the PCR fragments are shown in Fig. 1 (upper panel). PCR was carried out as described by Saiki et al. (1988) in 50 µL of a solution containing 10 mM Tris-Cl, pH 8.4 (at 25°C), 2 mM MgCl2, 400 µM dNTPs, 25 pmol of each PCR primer, 2.5 U of Taq DNA polymerase, and 150 ng of total Loligo DNA. The mixtures were subjected to 30 cycles of PCR (one cycle: 94, 50–55, and 72°C for 1, 1, and 1.5 min, respectively). PCR-amplified fragments, purified on a QIAgen spin column (Qiagen) according to the manufacturer's protocol, were digested with restriction endonuclease and then ligated to pUC18. Escherichia coli JM109 was transformed with the recombinant plasmids. DNA was sequenced using the dideoxy-termination method with Sequenase version 2.0 (Amersham). Synthetic oligonucleotide primers based on the newly obtained sequence were used for sequence extension. More than three independent clones were analyzed for each DNA fragment.

Data Analysis

The nucleotide sequence of Loligo mtDNA was analyzed using the GENETYX software package (Software Development Co. Ltd., Tokyo). tRNA genes were identified by the formation of cloverleaf secondary structures. Clustal X (Thompson et al. 1997) was used to align amino acid sequences inferred from the Loligo mtDNA sequence with the counterparts of various metazoans. The complete sequence of Loligo mtDNA is available through the DDBJ/EMBL/GenBank DNA databases under accession number AB.

Phylogenetic Analyses Based on Primary Sequences

Amino acid sequences of mt protein genes were subjected to maximum likelihood (ML) analysis. Each protein gene was extracted from the following complete nucleotide sequences retrieved from the GenBank database: Metridium senile (accession number AF000023), Homo sapiens (J01415), Eumeces egregius (AB016606), Cyprinus carpio (X61010), Petromyzon marinus (U11880), Branchiostoma lanceolatum (Y16474), Balanoglossus carnosus (AF051097), Asterina pectinifera (D16387), Florometra serratissima (AF049132), Drosophila yakuba (X03240), D. melanogaster (U37541), Ceratitis capitata (AJ242872), Anopheles gambiae (L20934), A. quadrimaculatus (L04272), Locusta migratoria (X80245), Artemia franciscana (X69067), Daphnia pulex (AF117817), Ixodes hexagonus (AF081828), Rhipicephalus sanguineus (AF081829), Lumbricus terrestris (U24570), Platynereis dumerilii (AF178678), Katharina tunicata (U09810), Albinaria coerulea (X83390), and Terebratulina retusa (AJ245743). A sequence fragment of the gastropod Littorina saxatilis (LSA132137) was also retrieved. Together with the counterparts of the Loligo mt genome, all protein genes of interest were extracted and translated to amino acid sequences. Each protein gene was aligned using Clustal X (Thompson et al. 1997). After the alignments were slightly modified by hand, regions where the alignment was not satisfactory were removed. Two data sets were prepared.
One was a combination of atp6, atp8, cox1, cox2, cob, nad1, and nad6 data (1052 sites in total); the other was a combination of all genes (2301 sites in total). The alignments used for the phylogenetic analyses are available from S.Y. on request. The phylogenetic analyses were carried out by the ML method using PROTML in MOLPHY 2.3b (Adachi and Hasegawa 1996). First, the ML distances between all pairs of taxa were estimated by PROTML using the distance (D) option with the mtREV-F model. Then, a neighbor-joining (NJ) tree was reconstructed by NJDIST in MOLPHY. The NJ trees were used as the start topologies for the local rearrangement

search (R) option of ML trees by PROTML with the mtREV-F model as the substitution model. To compare the substitution rates among molluscan sequences, the 1052-site data prepared for ML analysis were analyzed with RRTree (Robinson-Rechavi and Huchon 2000).

Strand-Specific Bias

AT skew and GC skew (Perna and Kocher 1995) were calculated at each codon position of protein genes from the Loligo (this study; Sasuga et al. 1999), Katharina (Boore and Brown 1994), Littorina (partial) (Wilding et al. 1999), Pupa (Kurabayashi and Ueshima 2000), Lumbricus (Boore and Brown 1995), and Terebratulina (Stechmann and Schlegel 1999) mt genomes. The genes atp6, atp8, cox3, and nad3 (atp6 and atp8 only for Littorina) were encoded by the same strand in each of the mt genomes listed above. Similarly, cox1, cox2, and nad2 [cox1 (partial) and nad2 for Littorina] were encoded by the same strand, and nad1, nad4L, nad4, nad6, and cob [nad1, nad6, and cob (partial) for Littorina] were also encoded by the same strand. Therefore, we categorized the protein genes into three groups: A, B, and C. For the first and second positions, all the codons except those used for initiation and termination were included in the analyses. For the third position, only fourfold-degenerate codon boxes were used. The data set sizes (number of codons) were as follows: for Loligo, 664, 1083, and 1972 for groups A, B, and C, respectively; for Katharina, 657, 1076, and 1968; for Littorina, 281, 598, and 690; for Pupa, 647, 1030, and 1909; for Lumbricus, 656, 1072, and 1990; and for Terebratulina, 660, 1067, and …. The same data set used to analyze the AT skew and GC skew was used to calculate the frequencies of amino acids specified by GT-rich codons (Phe TTY, Leu TTR, Val GTN, Cys TGY, Trp TGR, and Gly GGN) vs those specified by AC-rich codons (Pro CCN, Thr ACN, His CAY, Gln CAR, Asn AAY, and Lys AAR).

Results and Discussion

Sequence and Gene Content of the Loligo mt Genome

Table 1. Amino acid identities (%) of Loligo mt protein genes to those of Katharina, Littorina, Pupa, Terebratulina, and Lumbricus a
Columns: Gene, Katharina, Littorina, Pupa, Terebratulina, Lumbricus.
a Data are from the following sources: Katharina (Boore and Brown 1994), Littorina (Wilding et al. 1999), Pupa (Kurabayashi and Ueshima 2000), Terebratulina (Stechmann and Schlegel 1999), and Lumbricus (Boore and Brown 1995). The highest identity for each protein gene is indicated by boldface numbers, and the lowest by italic numbers.

We newly determined the sequence of an 8148-bp fragment of Loligo mtDNA. This sequence, together with the fragment determined previously (9240 bp; Sasuga et al. 1999), with which it overlaps by 177 bp, provided the complete sequence of Loligo mtDNA, which has a total of 17,211 bp. The nucleotide composition of the complete Loligo mt genome is 38.8% A, 19.4% C, 9.2% G, and 32.5% T (A+T 71.3%, G+T 41.7%, AT skew 0.089, and GC skew −0.358) in the sense strand on which a majority of protein genes are encoded. In the Katharina mt genome, the nucleotide composition of the major coding strand is 31.4% A, 11.9% C, 18.6% G, and 38.1% T (A+T 69.5%, G+T 56.7%, AT skew −0.095, and GC skew 0.199). It was thus confirmed that the ratios of A and T and of G and C in the Loligo and Katharina mt genomes are inverted, as pointed out previously (Sasuga et al. 1999). In the newly sequenced fragment, six protein genes (cox3, nad3, cob, nad6, nad1, and the 5′ half of nad2), two rRNA genes, and eight tRNA genes [trnQ, trnI, trnK, trnP, trnS(uga), trnS(gcu), trnW, and trnV] were identified. Together with our previous results (Sasuga et al. 1999), Loligo mtDNA is concluded to encode a standard set of metazoan mt genes: 13 protein, 2 rRNA, and 22 tRNA genes. The complete gene organization of Loligo mtDNA is presented in Fig. 1 (lower panel).
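
The composition figures quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming the usual Perna and Kocher (1995) definitions, AT skew = (A − T)/(A + T) and GC skew = (G − C)/(G + C); the function names are illustrative and are not taken from the paper. Because the inputs are rounded percentages, the results can differ from the reported skews in the third decimal place. With C exceeding G on the Loligo sense strand, the GC skew comes out negative.

```python
# Base composition (%) of the Loligo sense strand, as quoted in the text.
loligo = {"A": 38.8, "C": 19.4, "G": 9.2, "T": 32.5}


def at_skew(comp):
    """AT skew = (A - T) / (A + T); assumed Perna and Kocher (1995) definition."""
    return (comp["A"] - comp["T"]) / (comp["A"] + comp["T"])


def gc_skew(comp):
    """GC skew = (G - C) / (G + C); assumed Perna and Kocher (1995) definition."""
    return (comp["G"] - comp["C"]) / (comp["G"] + comp["C"])


def assembled_length(fragment_a, fragment_b, total_overlap):
    """Genome length from two fragments that together cover the circle,
    given the total number of base pairs they share (hypothetical helper)."""
    return fragment_a + fragment_b - total_overlap


if __name__ == "__main__":
    print("A+T %:", round(loligo["A"] + loligo["T"], 1))
    print("G+T %:", round(loligo["G"] + loligo["T"], 1))
    print("AT skew:", round(at_skew(loligo), 3))
    print("GC skew:", round(gc_skew(loligo), 3))
    # 9240-bp and 8148-bp fragments sharing 177 bp in total.
    print("Genome length (bp):", assembled_length(9240, 8148, 177))
```

The length check reproduces the 17,211-bp total from the two fragment sizes and the 177-bp overlap stated in the text.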

Protein Genes

The genes cox3, cob, nad2, and nad6 start with an ATG codon, nad3 with an ATA codon, and nad1 with an ATT codon. All these protein genes have complete termination codons (TAA or TAG). The sizes of the newly identified Loligo mt protein genes are very similar to those of their counterparts in the Katharina, Littorina, and Pupa mt genomes (data not shown). As shown in Table 1, four of the five Littorina mt protein genes (atp6, atp8, cox2, and nad1) show the highest similarity at the amino acid sequence level to their Loligo counterparts. For nad6, the Katharina gene is more similar to its Loligo counterpart than the Littorina gene is. With regard to the remaining eight mt protein genes, six Katharina genes and two Lumbricus genes show the highest similarity to their Loligo counterparts. Although Littorina and Pupa both belong to Gastropoda, 8 of the 13 Pupa mt protein genes show the lowest similarity to their Loligo counterparts. This suggests that the Pupa mt protein genes might have evolved more rapidly than those of the other species in Table 1 (Loligo, Katharina, Lumbricus, and Terebratulina). The possible evolutionary patterns of these mt protein genes are discussed in more detail later.

rRNA Genes

Loligo mt rrnL and rrnS appear to be 1334 and 978 bp long, respectively, which is similar to their Katharina counterparts (Boore and Brown 1994) but longer than

Fig. 2. Cloverleaf structures of eight Loligo mt tRNA genes found in the region sequenced in this study (see Fig. 1).

those of pulmonate gastropods (e.g., Yamazaki et al. 1997).

tRNA Genes

The gene trnS(uga) as well as trnS(gcu) in the Loligo mt genome can be formed into a cloverleaf structure similar to trnS(gcu) of other metazoan mitochondria (Fig. 2). Likewise, mt trnS(uga) of Katharina (Boore and Brown 1994), the pulmonates Cepaea and Euhadra (Yamazaki et al. 1997), and the bivalves Mytilus edulis and M. californianus (Beagley et al. 1999) have been reported to lack a D stem, as is the case in nematode mt trnS(uga) (Okimoto et al. 1992). However, the pulmonate Albinaria has a D stem in its mt trnS(uga) (Hatzoglou et al. 1995); lack of the D stem in this gene is not a common feature among molluscan mt genomes. Loligo trnS(uga) has a GG sequence in the D loop and a TTCGA sequence in the T loop; interaction between these two conserved sequences may stabilize the tertiary structure of the tRNA, as in the cases of typical tRNAs (Dirheimer et al. 1995). Only 2 bp can be formed in the D stems of trnQ and trnK (Fig. 2). Three of the eight tRNA genes [trnS(uga), trnS(gcu), and trnQ] have GG-conserved nucleotides in the D loop. The T stems consist of 5 bp in seven of the tRNA genes; only trnI has a 4-bp T stem. Unlike several tRNAs in Cepaea and Euhadra mtDNAs (Yamazaki et al. 1997), none of the Loligo tRNA genes have lost their T stem. Metazoan mitochondria use the modified wobble rule, as summarized by Yokobori et al. (2001). While most anticodon sequences of the 22 species of Loligo mt tRNA genes are sufficient for reading most of the codons according to the mt wobble rule, there are two exceptions related to tRNAs involved in the translation of nonuniversal genetic codes, namely, the AUA codon specifying Met and the AGR codon specifying Ser (see Sasuga et al. 1999).
We have found that the C at the anticodon wobble position of Loligo mt tRNAMet is modified to 5-formylcytidine (f5C) (Tomita et al. 1997). This f5C nucleotide modification has also been observed in bovine (Moriya et al. 1994), Ascaris (Watanabe et al. 1994), and D. melanogaster (Tomita et al. 1999) mt tRNAsMet, in which the AUA codon is read as Met. Thus, it is most likely that a modification from C to f5C is involved in the recognition of the AUA codon in Loligo mitochondria. We have found that guanosine at the anticodon wobble position of Loligo mt tRNASer(GCU) is modified to 7-methylguanosine (m7G) (Tomita et al. 1998). Modification to m7G also occurs at the first anticodon position of mt tRNASer(GCU) of the starfish Asterias amurensis (Matsuyama et al. 1998). In both cases, all of the AGN codons specify Ser, and it may be that all of the AGN codons are recognized by a single tRNA species [tRNASer(GCU)]. On the other hand, tRNASer(GCU) decoding only AGY codons has been reported to carry an unmodified G at the first anticodon position in several metazoans, including bovine (Ueda et al. 1985) and urochordate Halocynthia roretzi mitochondria (Kondow et al. 1999). Thus, it

is most likely that in Loligo mitochondria, the modification from G to m7G is responsible for decoding AGR in addition to AGY codons.

Noncoding Region

The Loligo mt genome has 19 noncoding regions (NC) (Figs. 1 and 3) longer than 10 bp, 3 of which are longer than 500 bp, namely, NC4 (515 bp), located between trnQ and trnI; NC8 (507 bp), between trnW and trnK; and NC16 (509 bp), between trnG and trnA (Fig. 1). These three long NCs have nearly identical sequences (Fig. 3A; the pairwise similarities of NC4/NC8, NC4/NC16, and NC8/NC16 are 96.5, 97.8, and 95.7%, respectively). Therefore, it is likely that these three NCs originate from a single noncoding region. The differences among the sequences occur mainly at their 5′ and 3′ ends (Fig. 3A). For instance, NC4 contains (AT)10 and NC8 contains (AT)8 (three nucleotides overlap with trnK) at its 3′ end (Fig. 3A), whereas NC16 does not contain any (AT)n sequence. In addition to NC4 and NC8, the noncoding regions between trnQ and trnA (NC10) and between trnL(uag) and cox3 (NC19) contain (AT)n sequences, (AT)14 in NC10 and (AT)12 in NC19 (two nucleotides overlapping cox3 are included), as reported previously (Sasuga et al. 1999) (Fig. 3B). When the secondary structures of NC4/8/16 were predicted for both strands, several stem-and-loop structures could be formed at the 5′ and 3′ regions of the strand shown in Fig. 3A (data not shown). In the case of the opposite strand also, several large stem-and-loop structures could be formed. In NC4/8/16, the sequence 5′-ATATAACCATCCACACTCACCCTCCATAAAC-3′ occurs twice (boxed in Fig. 3A). However, neither copy was part of any of the stem-and-loop structures predicted for the strand shown in Fig. 3A, and the two copies were at different positions in the stem-and-loop structures predicted for the opposite strand (data not shown). When a BLAST search (BLASTN) was performed for NC4/8/16 against the mitochondrial database in NCBI (Altschul et al. 1997), there were three regions that matched other mt sequences.
Sequences similar to the central region (5′-ATAAACAAATAAATAAATACATAATA-3′; underlined in Fig. 3A) were found in the noncoding regions of various mt genomes, such as that of Saccharomyces cerevisiae (Foury et al. 1998) (data not shown). For the 5′ and 3′ regions, several hits were obtained, most of which were noncoding regions. However, no molluscan mtDNA sequences hit the NC4 sequence. From these results, it is difficult to assess the significance of the similarities between the NC4 sequence and other mtDNA sequences. It may be that the noncoding sequences determined and the predicted possible secondary structures play some role(s) in the early stages of the replication and transcription process. However, further experiments are needed to test this speculation.

Phylogenetic Analyses Based on Primary Sequences

When combined data on the inferred amino acid sequences of the 13 mt protein genes are used for the ML analysis (Adachi and Hasegawa 1996), Loligo forms a group with Katharina (Fig. 4A). However, the local bootstrap probability (LBP) support for the group is only 62%. In addition, the position of gastropods (Albinaria and Pupa) is far from that of Loligo/Katharina in the ML tree. Quartet puzzling (QP) analysis (Strimmer and von Haeseler 1996), which permits rate variation among sites, did not resolve the relationship among the Loligo/Katharina group, gastropods (Albinaria and Pupa), annelids (Lumbricus and Platynereis), and a brachiopod (Terebratulina) (data not shown). When the gastropod Littorina (1052 residues) is included in the ML analysis, Littorina and Katharina form a group (Fig. 4B) with an LBP of 83%. In the ML tree and the QP tree (data not shown), the relationship among Katharina/Littorina, Loligo, and Terebratulina is not resolved. In addition, the gastropods (Littorina, Albinaria, and Pupa) do not form a monophyletic group in these trees.
These results are essentially the same as those of Stechmann and Schlegel (1999), in which cephalopod data are not included. Thus, the monophyly of Mollusca (Polyplacophora, Gastropoda, and Cephalopoda in these analyses) as well as the phylogenetic position of Cephalopoda within Mollusca could not be resolved, as was the case with 18S rRNA analyses (e.g., Winnepenninckx et al. 1996), although the monophyly of Mollusca is widely accepted from morphological studies (e.g., Nielsen 1995; Willmer 1990). A close relationship among Annelida, Brachiopoda, and Mollusca has been suggested by 18S rRNA analysis (e.g., Cohen 2000). A close relationship between Annelida and Mollusca has also been suggested by analysis of partial elongation factor 1α (EF-1α) sequences (Kojima et al. 1993), as well as 18S rRNA analyses (e.g., Aguinaldo et al. 1997). In addition, an Annelida–Mollusca grouping rather than an Annelida–Arthropoda grouping has been suggested by various analyses, based not only on molecular phylogenetics but also on morphological studies (e.g., Eernisse et al. 1992). Why is the monophyly of Mollusca not supported in the above analyses, even though the monophyletic origin of Mollusca is widely accepted (e.g., Nielsen 1995; Willmer 1990)? As noted earlier, either Katharina or Littorina protein genes exhibit the highest similarity to their Loligo counterparts except for cox3 and nad3; in these two cases, the Lumbricus genes show the highest similarity. Eight of the 13 Pupa mt protein genes have

Fig. 3. Comparison of the long noncoding sequences in Loligo mtDNA. A NC4, NC8, and NC16. Each location is shown in Fig. 1. Positions where the three noncoding sequences have the same nucleotide are denoted by asterisks. Repeated regions are boxed. The sequence 5′-ATAAACAAATAAATAAATACATAATA-3′ mentioned in the text is underlined. B Noncoding sequences other than NC4, NC8, and NC16. For positions of noncoding regions, see Fig. 1.
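
The kind of comparison shown in Fig. 3A, with asterisks under columns where all three sequences agree, together with pairwise similarity values, can be sketched as follows. This is only an illustration under stated assumptions: the three short sequences are hypothetical toy strings, not the real NC4, NC8, and NC16 sequences, they are assumed to be already aligned, and the identity measure (matching columns divided by compared columns) may differ from the measure the authors used for their reported percentages.

```python
from itertools import combinations

# Toy, pre-aligned sequences (hypothetical; NOT the real noncoding regions).
aligned = {
    "NC4":  "ATATAACCATCCACAC",
    "NC8":  "ATATAACCGTCCACAC",
    "NC16": "ATATTACCATCCACAC",
}


def conservation_line(seqs):
    """Return '*' for columns where every sequence has the same base, else ' '."""
    return "".join(
        "*" if len(set(col)) == 1 and "-" not in col else " "
        for col in zip(*seqs)
    )


def percent_identity(a, b):
    """Percent of compared columns with identical bases (gap-gap columns skipped)."""
    pairs = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    matches = sum(1 for x, y in pairs if x == y and x != "-")
    return 100.0 * matches / len(pairs)


if __name__ == "__main__":
    for name, seq in aligned.items():
        print(f"{name:5s} {seq}")
    print(f"{'':5s} {conservation_line(list(aligned.values()))}")
    for (n1, s1), (n2, s2) in combinations(aligned.items(), 2):
        print(f"{n1}/{n2}: {percent_identity(s1, s2):.2f}%")
```

On real data the sequences would first need to be aligned (for example with Clustal X, as used elsewhere in the paper) before a column-wise comparison like this makes sense.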

Fig. 4. ML trees of protostomes based on the inferred amino acid sequences of mt protein genes. A Without Littorina. B With Littorina. See Materials and Methods for details.

the lowest identity among the species compared in Table 1. Such heterogeneity in the rate of evolution among molluscan mt genome sequences might have led to the construction of an incorrect tree; the Albinaria and Pupa sequences might have evolved more rapidly than the Katharina, Loligo, and Littorina sequences (Fig. 4). To address this issue, the aligned sequence data used for the ML analysis presented in Fig. 4B were used for a relative rate test (Robinson-Rechavi and Huchon 2000). When D. yakuba is treated as the outgroup, there are no significant differences in the substitution rates among the molluscan species (data not shown). However, D. yakuba may be too distant an outgroup for this analysis. If the monophyly of Mollusca is accepted, although the ML trees in Fig. 4 do not support this, Annelida might be a much better outgroup for Mollusca than D. yakuba. When the annelid species (Lumbricus and Platynereis) are used as the outgroup, Albinaria shows a significantly higher substitution rate than Katharina (p << 0.01, where the null hypothesis is that the two species compared show the same substitution rate), Loligo (p << 0.01), and Littorina (p = 0.010). In addition, Pupa also exhibits a significantly higher substitution rate than Katharina (p = 0.043) and Littorina (p = 0.014), although the difference in substitution rates between Pupa and Loligo is not significant (p = 0.232). Thus, the substitution rates of the molluscan species compared here vary, the rates of Pupa and Albinaria being higher than those of the other mollusks. This could affect the shape of the recovered phylogenetic trees. Another factor, related to the tempo of evolutionary change, which could have led to the construction of an

incorrect tree is the difference in nucleotide composition among molluscan and other protostome mt genes. The second position of a codon is the most conserved of the three codon positions, since changing the nucleotide at the second position changes the property of the encoded amino acid. Although AT and GC skews are in most cases negative at the second codon position, the GC skew of Katharina in groups A (consisting of atp6, atp8, cox3, and nad3) and B (consisting of cox1, cox2, and nad2) and the GC skew of Loligo in group C (consisting of nad1, nad4, nad6, and nad4L) are positive (Fig. 5A). In groups A and B of Katharina, the AT skews at the first and third codon positions are negative and the GC skews are positive, as at the second codon position. For the same gene sets (groups A and B), the AT skew is very small and the GC skew is negative at the first and third positions of Loligo codons. On the other hand, in group C the AT skew is very small and the GC skew is negative at the first and third positions of Katharina codons, whereas in Loligo the AT skews at the first and third codon positions are negative and the GC skews are positive, as at the second codon position. Thus, it can be concluded that the AT and GC skews are inverted, which means that the bias is inverted, between the same genes of Loligo and Katharina. The different direction of bias found in Loligo and Katharina mt protein genes affects the amino acid composition of the resultant polypeptides (Fig. 5B). The frequencies of GT amino acids (those encoded by GGN, GTN, TGN, and TTN codons) and AC amino acids (those encoded by AAN, ACN, CAN, and CCN codons) differ markedly between groups A/B and group C in both the Loligo and Katharina mt genomes.
Furthermore, the bias of GT-amino acid richness/shortness and that of AC-amino acid richness/shortness are opposite if they are compared between the counterparts of the Loligo and Katharina mt genomes. Thus, nucleotide usage in Katharina and Loligo mt protein genes is governed not only by the gene type but also by another constraint. This is most likely to be strand-specific directional mutation pressure (Asakawa et al. 1991) operating on the genes. All three codon positions appear to have been affected by this directional mutation pressure, which would have given rise to the different amino acid compositions of the Loligo and Katharina mt protein genes. All the mt protein genes in Littorina, Terebratulina, and Lumbricus are encoded by a single strand; hence the AC/GT bias of these genomes is likely to change the nucleotide and amino acid compositions in the same direction for all the genes (Figs. 5A and B). On the other hand, the AC/GT bias is not an apparent constraint on amino acid usage in the case of the Pupa mt protein genes (Figs. 5A and B). The branching orders of vertebrate species in the ML trees presented in Fig. 4 differ from the widely accepted view. An unusual branching order of vertebrate species in phylogenetic analysis using mt sequences has been noted when lampreys and nonvertebrate metazoan species, such as echinoderms, are included (e.g., Takezaki and Gojobori 1999). This might also be affected, in part, by differences in amino acid composition between vertebrates and other species (see Takezaki and Gojobori 1999), as postulated for molluscan species (see above).

Evolution of Loligo mt Gene Arrangement

If only protein and rRNA genes are considered, only one inversion event would explain the difference in the mt gene order between Katharina and the brachiopod Terebratulina, which appears as a species very closely related to mollusks in the phylogenetic tree based on mt protein gene sequences, as discussed above (Fig. 6).
Similarly, the difference in protein/rRNA mt gene arrangement between Katharina and Littorina (reported region) is explained by one inversion (Fig. 6). On the other hand, the difference in the arrangement between the Katharina and the Loligo mt genomes is a more complex issue, since five gene blocks are recognized and are arranged in different orders in the Loligo and Katharina mt genomes (Fig. 6). Two transpositions and one inversion are necessary to explain the difference in gene organization between the Loligo and the Littorina mt genomes (Fig. 6). On the other hand, rearrangement of five gene blocks and two inversions are necessary to explain the difference in gene organization between the Loligo and the Terebratulina mt genomes (Fig. 6). These findings suggest that the Loligo mt genome has a highly scrambled gene arrangement compared with those of the Katharina, Littorina, and Terebratulina mt genomes. tRNA genes are known to transpose more frequently than protein and rRNA genes in metazoan mt genomes (e.g., Pääbo et al. 1991). In addition, tRNA genes are often found at the end of the deleted/duplicated region, suggesting that they may be hot spots for gene rearrangement events (e.g., Stanton et al. 1994). When tRNA genes are also included in a comparison of the gene arrangements, the locations of 7 of the 22 tRNA genes in the Loligo mt genome can be directly compared with the locations of their counterparts in Katharina (Boore and Brown 1994) and Littorina (Wilding et al. 1999) (Fig. 7A). (Genes encoded by the opposite strand are underlined.) The order rrnS–trnV–rrnL is shared by these mt genomes as well as by various nonmolluscan mt genomes (i.e., most vertebrate and arthropod mt genomes, as well as annelid and Terebratulina mt genomes). The Loligo mt genome shares the orders trnS(gcu)–nad2, trnT–nad4L, nad5–trnF, and cob–trnS(uga) with the Katharina mt genome (regions containing these genes are not known for Littorina); these orders are also found in

Fig. 5. A Comparison of AT skew and GC skew (Perna and Kocher 1995) among three codon positions, among three groups, and among the Loligo (this study; Sasuga et al. 1999), Katharina (Boore and Brown 1994), Littorina [partial (Wilding et al. 1999)], Pupa (Kurabayashi and Ueshima 2000), Lumbricus (Boore and Brown 1995), and Terebratulina (Stechmann and Schlegel 1999) mt genomes. The mt protein genes are divided into three groups (A, B, and C). Black and white bars indicate AT skews and GC skews, respectively. For details, see Materials and Methods. B Comparison of frequencies of GT amino acids and AC amino acids among Loligo, Katharina, Littorina (partial), Pupa, Lumbricus, and Terebratulina. The same data set as that for A was used for analysis. Black and white bars indicate the percentages of the GT amino acids and the AC amino acids, respectively.
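
The skew statistics plotted in Fig. 5A follow the usual Perna and Kocher (1995) definitions, AT skew = (A - T)/(A + T) and GC skew = (G - C)/(G + C). A minimal Python sketch of the calculation at each codon position is given here for illustration only. The function names and the toy sequence are hypothetical and are not taken from this study, and the division of genes into groups A, B, and C described in Materials and Methods is not reproduced.

```python
from collections import Counter


def skew(x: int, y: int) -> float:
    """Return (x - y) / (x + y), or 0.0 when both counts are zero."""
    total = x + y
    return (x - y) / total if total else 0.0


def codon_position_skews(cds: str) -> dict:
    """Compute AT skew and GC skew at codon positions 1, 2, and 3.

    `cds` is assumed to be an in-frame coding sequence read on its
    sense strand; trailing bases that do not fill a codon are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    result = {}
    for pos in range(3):
        # Every third base starting at this codon position.
        counts = Counter(cds[pos:usable:3])
        result[pos + 1] = {
            "AT": skew(counts["A"], counts["T"]),
            "GC": skew(counts["G"], counts["C"]),
        }
    return result


# Hypothetical toy sequence (codons ATG GCT TTA GGA), not real data.
toy_cds = "ATGGCTTTAGGA"
toy_skews = codon_position_skews(toy_cds)
```

Because the skew is computed on the sense strand of each gene, genes encoded on opposite strands of a genome under strand-specific mutation pressure would be expected to show skews of opposite sign, which is the pattern discussed in the text for Loligo and Katharina.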

Fig. 7. A Comparison of gene arrangement among the Loligo, Katharina (Boore and Brown 1994), and Littorina (Wilding et al. 1999) mt genomes. The conserved gene orders (at least, the order of the gene pair) between two genomes are indicated by bars. 4L, nad4L. Genes encoded on the opposite strand are shown in gray boxes. B Comparison of the partial orders of noncoding regions and flanking tRNA genes in the Loligo mt genome and those of the MCYWQGE and KARNI tRNA gene clusters (see text), trnL(uaa), trnL(uag), and trnH in the Katharina mt genome (Boore and Brown 1994). Lengths of noncoding regions (NC4, etc.) are shown.
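
The kind of comparison summarized in Fig. 7A, finding gene pairs whose order is conserved between two genomes, can be sketched as follows. This is a hypothetical illustration in Python and not the procedure used in this study. It assumes that each genome is given as a circular list of gene names and that a leading "-" marks a gene on the opposite strand. The two toy gene orders are invented for the example and do not represent any real genome.

```python
def flip(gene: str) -> str:
    """Return the same gene as read from the opposite strand."""
    return gene[1:] if gene.startswith("-") else "-" + gene


def canonical(a: str, b: str) -> tuple:
    """Strand-independent form of the adjacent pair a -> b.

    Reading a -> b from the other strand gives flip(b) -> flip(a),
    so the smaller of the two tuples is used as the key.
    """
    return min((a, b), (flip(b), flip(a)))


def adjacent_pairs(order: list) -> set:
    """All adjacent gene pairs of a circular signed gene order."""
    n = len(order)
    return {canonical(order[i], order[(i + 1) % n]) for i in range(n)}


def shared_pairs(order1: list, order2: list) -> set:
    """Adjacent gene pairs conserved between two gene orders."""
    return adjacent_pairs(order1) & adjacent_pairs(order2)


# Invented toy gene orders; genome_y carries the rrns-trnv-rrnl block
# inverted relative to genome_x.
genome_x = ["cox1", "rrns", "trnv", "rrnl", "nad1"]
genome_y = ["nad1", "-rrnl", "-trnv", "-rrns", "cox1"]
conserved = shared_pairs(genome_x, genome_y)
```

In this toy case only the two pairs inside the inverted block (rrns-trnv and trnv-rrnl) are reported as conserved, because an inversion preserves adjacencies within the block but breaks those at its boundaries.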

most arthropod mt genomes [trnS(gcu)-nad2, trnT-nad4L, nad5-trnF, and cob-trnS(uga)], annelid mt genomes [trnS(gcu)-nad2 and nad5-trnF], and the Terebratulina mt genome [trnS(gcu)-nad2 and nad5-trnF]. Therefore, these orders might be ancestral features of molluscan mtDNA. The Loligo and Littorina mt genomes share the positions of two tRNA genes, trnD-atp8 and trnP-nad6, which are not shared by the Katharina mt genome (Fig. 7A). The order trnD-atp8 is also found in the Lumbricus (Boore and Brown 1995) and Terebratulina (Stechmann and Schlegel 1999) mt genomes as well as in most arthropod mt genomes. This suggests that the order trnD-atp8 might be the ancestral gene order for molluscan mt genomes. On the other hand, the direction of trnP in the Katharina mt genome (trnP-nad6) is inverted in relation to that in the Loligo and Littorina mt genomes (trnP-nad6) (Fig. 7A). The arthropod (such as Drosophila) mt genome has the order trnP-nad6 (Clary and Wolstenholme 1985), which is similar to the case for the Katharina mt genome. Therefore, the order trnP-nad6 found in the Loligo and Littorina mt genomes may be a synapomorphic trait and may be derived from the order trnP-nad6 found in the Katharina and Drosophila mt genomes. It is notable that the order trnP-nad6 is found in the mt genome of the opisthobranch gastropod Pupa (Kurabayashi and Ueshima 2000), whose gene arrangement is very different from those of other molluscan mt genomes.

tRNA Genes Flanking Noncoding Regions in the Loligo mt Genome

Fifteen of the 22 tRNA genes in the Loligo mt genome are positioned differently from their counterparts in the Katharina/Littorina mt genomes. Twelve of these 15 Loligo mt tRNA genes are found in two tRNA gene clusters in the Katharina mt genome (Fig. 7B).
One of these is the cluster between rrnS and cox3 (trnM-trnC-trnY-trnW-trnQ-trnG-trnE) (hereafter referred to as the MCYWQGE cluster); the other is between cox3 and nad3 (trnK-trnA-trnR-trnN-trnI) (referred to as the KARNI cluster) (Boore and Brown 1994). A short (141-bp) noncoding region is found between trnE and cox3 in the Katharina mt genome, containing an AT stretch (34 repeats of TA dinucleotides) (Boore and Brown 1994). As shown in Fig. 7B, one end of each of the noncoding regions NC4, NC8, NC10, and NC12 is flanked by a tRNA gene found in the MCYWQGE cluster, and the other end is flanked by a tRNA gene found in the KARNI cluster. Furthermore, the relative directions of the tRNA gene originating from the MCYWQGE cluster and that originating from the KARNI cluster are the same as the original relative directions of the MCYWQGE and the KARNI clusters. Because they are highly similar, NC4, NC8, and NC16 are likely to have originated from a single noncoding region. Hence, multiplication of the noncoding regions would correlate with the transposition of the tRNA genes flanking them (discussed later).

The genes trnL(uaa) and trnL(uag), located between nad1 and rrnL in the Katharina and Littorina mt genomes, are replaced by the block NC3-trnQ-NC4-trnI-NC5 in the Loligo mt genome (Figs. 7A and B). trnL(uaa) is located downstream of trnG (Figs. 7A and B), and trnL(uag) flanks NC19 together with trnH (Figs. 7A and B), as reported previously (Sasuga et al. 1999).

As noted above, the Katharina mt gene arrangement might retain more ancestral features than the Loligo mt gene arrangement (see Fig. 6). Let us consider that the order trnM-trnC-trnY-trnW-trnQ-trnG-trnE-NC-cox3-trnK-trnA-trnR-trnN-trnI in the Katharina mt genome was rearranged to trnM-trnC-trnY-trnW-trnQ-trnG-trnE-NC-trnK-trnA-trnR-trnN-trnI-cox3 in the ancestral Loligo mt genome.
If this were the case, the noncoding region might be the ancestor of NC4, NC8, and NC16 of the Loligo mt genome as well as of other shorter noncoding regions (NC10, NC12, and probably NC19). Sequential multiplication of the noncoding region with the flanking tRNA gene clusters might have occurred in the ancestral Loligo mt genome. After or during multiplication of the noncoding regions with the flanking tRNA genes, some tRNA genes might have been lost from each copy. The tRNA-Leu genes might also have been transposed together with a noncoding region and its flanking tRNA genes, after insertion of the ancestral NC4 and its flanking tRNA genes near the tRNA-Leu genes between nad1 and rrnL.

tRNA genes have been considered as hot spots located at the ends of duplicated fragments in various mt genomes (e.g., Stanton et al. 1994). In addition, frequent gene rearrangement around noncoding regions that contain the origins and/or regulatory elements for replication and/or transcription has been found in various mt genomes (e.g., Zevering et al. 1991). Among the six long noncoding regions with flanking tRNA genes (NC4, -8, -10, -12, -16, and -19), four (NC8, -12, -16, and -19) are located at the junctions of blocks of protein and rRNA genes conserved between the Loligo and the Katharina mt genomes. This suggests that the rearrangement of gene blocks shown in Fig. 6 might somehow correlate with the spread of noncoding regions.

NC4 with flanking trnQ and trnI is located downstream of rrnL. In vertebrate mt genomes, trnL(uaa) located downstream of rrnL is known to contain an element for the termination of transcription, so that more rRNAs than mRNAs are transcribed (Attardi 1985). Therefore, the region trnQ-NC4-trnI may play the same role.
Alternatively, if both NC4 and NC8 contain initiation points for transcription, and those in NC4 are controlled differently from those in NC8, the amounts of rRNA transcripts could be controlled relative to those of mRNAs (giving an excess of rRNAs). Because of the high degrees of similarity among NC4, NC8, and NC16, some stages in the multiplication of the
