Abstract

The unicellular green alga Chlamydomonas reinhardtii is a particularly important model organism for the study of photosynthesis since this alga can grow heterotrophically, and mutants in photosynthesis are therefore conditional rather than lethal. The recently developed tools for genomic analyses of this organism have allowed us to identify most of the genes required for chlorophyll and carotenoid biosynthesis and to examine their phylogenetic relationships with homologous genes from vascular plants, other algae, and cyanobacteria. Comparative genome analyses revealed some intriguing features associated with pigment biosynthesis in C. reinhardtii; in some cases, there are additional conserved domains in the algal and plant but not the cyanobacterial proteins that may directly influence their activity, assembly, or regulation. For some steps in the chlorophyll biosynthetic pathway, we found multiple gene copies encoding putative isozymes. Phylogenetic studies, together with a theoretical evaluation of gene expression through analysis of expressed sequence tag data and the codon bias of each gene, enabled us to generate hypotheses concerning the function and regulation of the individual genes, and to propose targets for future research. We have also used quantitative polymerase chain reaction to examine the effect of low fluence light on the accumulation of mRNAs encoding key proteins of the biosynthetic pathways and examined differential expression of those genes encoding isozymes that function in the pathways. This work is directing us toward the exploration of the role of specific photoreceptors in the biosynthesis of pigments and the coordination of pigment biosynthesis with the synthesis of proteins of the photosynthetic apparatus.

Over the past several decades, the unicellular green alga Chlamydomonas reinhardtii has been an outstanding system for dissecting the function of various proteins involved in photosynthesis (Grossman, 2000; Harris, 2001; Rochaix, 2002). The ability of this alga to grow heterotrophically in the dark by metabolizing exogenous acetate has made it relatively easy to isolate a broad range of C. reinhardtii mutants impaired in photosynthetic function (Levine, 1969). Mutants defective for photosynthesis are readily analyzed at the genetic level as this organism has a relatively simple and short life cycle (Quarmby, 1994). Furthermore, a variety of physiological, biochemical, genetic, and molecular tools have been applied to studies of C. reinhardtii, making it an ideal model system for elucidating biological processes (for review, see Grossman, 2000; Harris, 2001; Rochaix, 2002; Grossman et al., 2004).

Research on light utilization in plants has focused on the involvement of pigments in both photosynthetic processes and the sensing and control of cellular processes through environmental light signals. Chlorophyll (Chl) and carotenoids are ubiquitous among photosynthetic organisms and play important roles in the function of the photosynthetic apparatus, the management of excitation energy, and the integration of photosynthetic function and biogenesis of the photosynthetic membranes with the regulation of other cellular processes. Both Chl and carotenoid molecules bind to proteins integral to the photosynthetic machinery, where they absorb light energy to generate chemical bond energy (in the form of sugars) and also function in efficiently managing the use of excitation energy. Carotenoids also participate in redox reactions (Tracewell et al., 2001; Frank and Brudvig, 2004), the protection of organisms from photodamage by quenching singlet oxygen and triplet Chl species (Siefermann-Harms, 1987; Frank and Cogdell, 1993; Yamamoto and Bassi, 1996; Formaggio et al., 2001; Baroli et al., 2003), and the dissipation of excess absorbed light energy via interactions with singlet excited Chl molecules (Demmig-Adams, 1990; Demmig-Adams et al., 1996; Yamamoto and Bassi, 1996; Niyogi, 1999; Baroli and Niyogi, 2000; Pogson and Rissler, 2000; Ma et al., 2003). Carotenoids may even help stabilize membrane structure (Havaux and Niyogi, 1999).
Interestingly, intermediates in the Chl biosynthetic pathway may serve as signaling molecules that communicate the status of the pathway to the transcriptional machinery in the nucleus of the cell, thereby regulating levels of proteins that require Chl for their function (such as light-harvesting Chl-binding proteins; Johanningmeier and Howell, 1984; Johanningmeier, 1988; Kropat et al., 1997; Strand et al., 2003), and it appears that the biosynthesis of Chl is intimately linked to the presence and/or synthesis of the light-harvesting complex (LHC) polypeptides (Xu et al., 2001). It is likely that Chl and carotenoid biosynthesis are precisely controlled to meet the demands of growing cells under a range of light conditions, and because intermediates in the former pathway are unstable and photoreactive, the accumulation of some intermediates in Chl biosynthesis can elicit the formation of damaging, reactive oxygen species. Although the synthesis of both Chl and carotenoids occurs within chloroplasts, in vascular plants all of the enzymes of the pathway are encoded by nuclear genes and are synthesized in the cytoplasm of the cell as precursor polypeptides with amino-terminal extensions (transit peptides) that enable them to pass through the double membrane of the chloroplast envelope and to their site of function within the organelle.

Chl is a cyclic tetrapyrrole coordinated by a central Mg2+ ion. The synthesis of Chl in plants and algae proceeds along the C5 pathway, in which the first dedicated precursor of the pathway, 5-aminolevulinic acid (ALA), is synthesized from a Glu molecule (Fig. 1). Two molecules of ALA are then condensed to form porphobilinogen, and four porphobilinogen molecules are joined to form the first linear tetrapyrrole of the pathway, hydroxymethylbilane. The hydroxymethylbilane is then cyclized, followed by decarboxylation and oxidation reactions to form protoporphyrin IX. Mg2+ is inserted into the protoporphyrin IX molecule, and the resulting Mg2+ protoporphyrin IX molecule is methylated, followed by a cyclization reaction that forms the cyclopentanone ring and sequential reduction steps to form chlorophyllide a. The reduction of protochlorophyllide to chlorophyllide can be catalyzed by two different enzymes, the nucleus-encoded, strictly light-dependent protochlorophyllide oxidoreductase (LPOR), common to all photosynthetic eukaryotes and cyanobacteria, or a light-independent (dark-active) enzyme complex (DPOR) that is not present in angiosperms. The latter is composed of three subunits (ChlB, ChlL, and ChlN) that are encoded by the plastid genome. Phytylation of chlorophyllide a yields Chl a, while oxidation of chlorophyllide a could yield chlorophyllide b, followed by phytylation to form Chl b. This pathway and its regulation have been reviewed recently (Reinbothe et al., 1996; Suzuki et al., 1997; Beale, 1999; Vavilin and Vermaas, 2002; Cornah et al., 2003; Grossman et al., 2004).

The carotenoids are isoprenoids that belong to the tetraterpenoid group. Their basic structure is a C40 backbone containing a network of conjugated double bonds that form an extended π-electron system; this accounts for the ability of these molecules to absorb in both the UV and visible regions of the light spectrum. Carotenoids that consist exclusively of hydrogen and carbon atoms are collectively termed carotenes. However, most naturally occurring carotenoids are oxygenated at one or more positions, placing them into the xanthophyll subgroup, which has been associated with managing the utilization of light energy in plants and algae (Demmig-Adams, 1990; Niyogi, 1999).

The biosynthesis of carotenoids (Fig. 2) starts with the formation of isopentenyl diphosphate, the general precursor of all isoprenoids. In vascular plants and green algae, isopentenyl diphosphate used for carotenogenesis is synthesized exclusively in the plastid by the recently discovered methylerythritol phosphate (MEP) pathway (Lichtenthaler, 1999; Rodriguez-Concepcion and Boronat, 2002; Rohmer, 2003). The first carotenoid, phytoene, results from the bonding of two C20 molecules, each derived from the condensation of four C5-isoprenoid units, to build the symmetrical C40 backbone. This is followed by extension of the π-electron system through sequential desaturation steps and cyclization of the ends of the molecule to generate carotenes. Finally, the introduction of oxygen groups onto the molecule generates xanthophylls. Details of carotenoid biosynthesis have been the subject of several recent reviews (Cunningham and Gantt, 1998; Hirschberg, 2001; Grossman et al., 2004).

Whole-genome information is being generated for a number of photosynthetic eukaryotes (Arabidopsis Genome Initiative, 2001; Yu et al., 2002; Goff et al., 2002; Armbrust et al., 2004; Matsuzaki et al., 2004), and a nearly completed genome sequence (http://genome.jgi-psf.org/chlre2) as well as a wealth of cDNA information are available for C. reinhardtii. In this article, we exploit this genomic information to define the different genes encoding enzymes involved in Chl and carotenoid biosynthesis in C. reinhardtii, focusing on the relationship of the predicted protein sequences of this alga to those of Arabidopsis (Arabidopsis thaliana) and Synechocystis PCC 6803. The analyses are specifically restricted to nuclear genes that encode proteins with enzymatic activity in the biosynthetic pathways leading to the formation of Chl and carotenoids. We have learned about the structure of these genes and aspects of protein function based on comparisons of the deduced amino acid sequences, analyzed the encoded proteins for the presence of organellar targeting presequences, and identified different potential isozymes associated with specific reactions in the biosynthetic pathways. In addition, we have examined codon usage of the different genes, accumulation of the mRNAs derived from the different isogenes, and the influence of light on their expression levels. These analyses have enabled us to generate hypotheses concerning the function and regulation of proteins involved in the biosynthesis of both Chl and carotenoids.

RESULTS AND DISCUSSION

General Comparison of Chl and Carotenoid Biosynthetic Genes from C. reinhardtii with Similar Genes from Arabidopsis and Synechocystis PCC 6803

The genes predicted to encode most of the polypeptides known to be directly involved in the biosynthesis of Chl and carotenoids in vascular plants were identified in the current version (assembly v2.0) of the C. reinhardtii genome and GenBank expressed sequence tag (EST) entries as of August 2004. Features of these genes have been compiled and are summarized in Table I (Chl genes) and Table II (carotenoid genes). These tables are intended to provide readers with a summary of the information with respect to genes encoding the enzymes of the Chl and carotenoid biosynthetic pathways and to serve as a resource to use for more in-depth analyses/experimentation. As indicated in the tables, some genes still contain gaps and/or have only partial cDNA coverage. Also, a number of gene models predicted from analysis of the genomic DNA sequence are incorrect, partly a consequence of sequence gaps, but also caused by noncanonical intron borders; often the correct mature transcript sequence can be inferred from available cDNA information. All gene models that we recognized as flawed are italicized in Tables I and II. Specific information on incorrect model prediction is included in the manual annotation of the respective gene models on the Joint Genome Institute (JGI) genome browser (http://genome.jgi-psf.org/chlre2/chlre2.home.html). Furthermore, we have performed additional cDNA sequencing for some of these genes to clarify or add needed sequence information (Tables I and II; see also “Materials and Methods”).

C. reinhardtii genes encoding putative Chl biosynthetic enzymes (see Fig. 1 for full names of gene products) were analyzed for completeness and cDNA coverage, results being indicated by the following abbreviations: N, not available; P, partial; C, complete; C, complete cDNA sequences which were generated for this publication (see “Materials and Methods” for accession numbers); n.h., no homolog; n.i., not yet identified; and n.p., not present. Gene models and sequence lengths recognized as incomplete or erroneous are italicized, as are those data that are biased by this circumstance and therefore are preliminary; gene models and additional information can be found at http://genome.jgi-psf.org/chlre2/chlre2.home.html (use model number as search term under “Advanced Search”). For comparative analyses, homologous protein sequences of C. reinhardtii, Arabidopsis, and Synechocystis PCC 6803 were aligned with ClustalW, and the alignments corrected where necessary. The length of each putative presequence was determined as the N-terminal extension of the C. reinhardtii protein relative to its cyanobacterial homolog. Then, nonconserved ends at the N terminus (presequences) and C terminus of proteins were clipped to yield alignments containing only the putative functional cores. From these truncated alignments, the percentages of positions with identical (ident.) or identical plus similar (simil.) amino acids were calculated using the BioEdit software. Targeting prediction was done with the software tools TargetP (TarP), iPSORT (iPS), and Predotar (Pred), results being indicated by the following abbreviations: M, mitochondrial; P, plastid; and n, no targeting signal predicted.

Data for putative carotenoid biosynthetic genes (see Fig. 2 for full names of gene products) were compiled, analyzed, and presented as described for the Chl biosynthetic genes in the legend of Table I. (Note that abbreviations are the same as in Table I).

Alignments of the predicted amino acid sequences from the homologous Chl and carotenoid biosynthesis genes from C. reinhardtii, the vascular plant Arabidopsis, and the cyanobacterium Synechocystis PCC 6803 were constructed and compared with respect to the lengths of the encoded proteins, their degree of conservation (expressed as percent identity/similarity), and the number of shared introns for the eukaryotic sequences. The presence of putative targeting presequences and additional conserved domains exclusively present in eukaryotic homologs were also investigated, with results of the analyses summarized in Tables I and II.
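The percent identity and similarity values referred to here can be illustrated with a minimal sketch. It assumes two already aligned and clipped sequences of equal length, counts every column of the clipped alignment in the denominator, and uses a simple set of amino acid similarity groups. The function names and the grouping are hypothetical choices for illustration and do not reproduce the scoring scheme used by BioEdit.

```python
# Illustrative similarity groups (an assumption, not the BioEdit scheme).
SIMILAR_GROUPS = [
    set("ILVM"), set("FWY"), set("KRH"),
    set("DE"), set("NQ"), set("ST"), set("AG"),
]


def are_similar(a, b):
    """Return True if residues a and b fall in the same illustrative group."""
    return any(a in group and b in group for group in SIMILAR_GROUPS)


def percent_identity_similarity(seq_a, seq_b):
    """Return (identity %, identity-plus-similarity %) for two aligned sequences.

    Gap characters ('-') never count as identical or similar, but their
    columns still count toward the total number of positions.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    if not seq_a:
        raise ValueError("alignment is empty")
    identical = 0
    similar = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue
        if a == b:
            identical += 1
        elif are_similar(a, b):
            similar += 1
    total = len(seq_a)
    return 100.0 * identical / total, 100.0 * (identical + similar) / total


if __name__ == "__main__":
    # Toy alignment, not taken from Tables I or II.
    ident, simil = percent_identity_similarity("MKLV-A", "MRIVGA")
    print(f"identity {ident:.1f}%, similarity {simil:.1f}%")
```

Whether gap columns belong in the denominator is a design choice; excluding them would raise both percentages for alignments with many internal gaps.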

The predicted Chl and carotenoid biosynthesis genes from C. reinhardtii and Arabidopsis are consistently larger than those of Synechocystis PCC 6803, suggesting that the eukaryotic polypeptides may contain organellar-targeting presequences and/or additional domains within the mature proteins. This was further examined by aligning each of the predicted proteins from C. reinhardtii with homologous sequences from several vascular plants and cyanobacteria (alignments not shown); these alignments confirmed that the sequences from C. reinhardtii and vascular plants contain N-terminal extensions, usually between 30 and 90 amino acids, relative to the homologous cyanobacterial sequences. The sizes of the N-terminal extensions on the C. reinhardtii polypeptides are presented in Tables I and II. In some cases, an additional conserved N-terminal domain, probably part of the mature polypeptide, was present on the C. reinhardtii and Arabidopsis proteins, relative to the cyanobacterial homolog. This additional sequence probably evolved after the origin of plastids. For predicted proteins containing an additional conserved N-terminal domain that appears to be present in the mature protein, the presequence sizes specified in Tables I and II are marked with asterisks. The potential significance of these domains is discussed in more detail below.
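As a rough illustration of how the size of an N-terminal extension can be read from a pairwise alignment, the sketch below counts the residues of the eukaryotic protein that precede the first aligned residue of the cyanobacterial homolog. The function name and the toy sequences are hypothetical; the values in Tables I and II come from manually corrected alignments, which this sketch does not attempt to reproduce.

```python
def n_terminal_extension(aligned_query, aligned_reference):
    """Count query residues located before the first reference residue.

    Both arguments are rows of the same alignment, with '-' marking gaps.
    """
    if len(aligned_query) != len(aligned_reference):
        raise ValueError("aligned sequences must have equal length")
    # Column index of the first residue in the reference (cyanobacterial) row.
    first_ref = next(
        (i for i, c in enumerate(aligned_reference) if c != "-"), None
    )
    if first_ref is None:
        raise ValueError("reference row contains only gaps")
    # Residues (not gaps) of the query that lie upstream of that column.
    return sum(1 for c in aligned_query[:first_ref] if c != "-")


if __name__ == "__main__":
    # Toy rows, not real sequences.
    print(n_terminal_extension("MASTLLRKVDE", "-----MRKVDE"))
```

An extension measured this way includes both the transit peptide and any additional conserved N-terminal domain, so it gives an upper bound on the presequence length.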

In Tables I and II, presequence lengths, as inferred from the amino acid sequence alignments, are also compared to results from cleavage site prediction by ChloroP (Emanuelsson et al., 1999). In general, ChloroP predicts shorter presequences, and only in the case of glutamyl-tRNA synthetase (GTS), ALA dehydratase (ALAD), uroporphyrinogen III decarboxylase 2 (UROD2), 1-hydroxy-2-methyl-2-(E)-butenyl-4-diphosphate synthase (HDS), and geranylgeranyl reductase (GGR) do the predicted cleavage sites fall within conserved regions of the aligned proteins. We are aware of only one enzyme, coproporphyrinogen III oxidase 1 (CPX1), for which the cleavage site has been determined directly by sequencing the N terminus of the mature protein (Quinn et al., 1999); in this case, the experimental data and ChloroP cleavage site prediction are congruent. However, in other cases, incorrect predictions by ChloroP are likely to occur, and only N-terminal sequencing of mature proteins will provide valid data on the lengths and cleavage sites for presequences.

We also analyzed the C. reinhardtii deduced protein sequences with the targeting prediction tools TargetP (Nielsen et al., 1997; Emanuelsson et al., 2000), ChloroP (Emanuelsson et al., 1999), Predotar (Small et al., 2004), and iPSORT (Bannai et al., 2002). Often these tools, developed primarily for use with vascular plant sequences, are not able to differentiate between C. reinhardtii mitochondrial and plastidic targeting signals since chloroplast transit peptides in this alga share features with both mitochondrial and plastid presequences of vascular plants (Franzen et al., 1990). However, in spite of the shortcomings of these programs, organellar targeting was predicted for nearly all of the proteins analyzed by at least two of the three software tools (Tables I and II). Carotenogenic ζ-carotene desaturase (ZDS; step 32; steps are marked in Figs. 1 and 2 and are listed in tables) was the sole protein for which only a single algorithm predicts its localization to an organelle.
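The "at least two of three tools" criterion can be expressed as a simple tally. The sketch below uses the abbreviations from the table legends (M, mitochondrial; P, plastid; n, no targeting signal predicted). The function name is hypothetical, and the second example is an invented single-tool case, not the actual per-tool result for ZDS.

```python
def organellar_consensus(predictions, min_tools=2):
    """Return True if at least min_tools tools predict organellar targeting.

    predictions maps a tool name to 'M' (mitochondrial), 'P' (plastid),
    or 'n' (no targeting signal predicted). Mitochondrial and plastid calls
    are pooled because the tools often cannot tell them apart for this alga.
    """
    organellar_calls = sum(
        1 for call in predictions.values() if call in ("M", "P")
    )
    return organellar_calls >= min_tools


if __name__ == "__main__":
    # HDS as described in the text: TargetP and iPSORT plastid, Predotar mitochondrial.
    hds = {"TargetP": "P", "iPSORT": "P", "Predotar": "M"}
    # Hypothetical protein supported by a single tool only.
    single = {"TargetP": "n", "iPSORT": "P", "Predotar": "n"}
    print(organellar_consensus(hds), organellar_consensus(single))
```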

The MEP-pathway enzyme HDS (step 26) appears to have an exceptionally short leader sequence, as deduced from both cDNA and genomic information. The open reading frame (ORF) contains an eight-amino acid sequence that precedes the first conserved motif (YCES). However, both TargetP and iPSORT predicted targeting of this polypeptide to the chloroplast, while Predotar suggested mitochondrial localization (Table II). Based on both TargetP and ChloroP, HDS has a putative organellar-targeting presequence of 22 amino acids, although the latter algorithm did not confirm that the presequence was involved in chloroplast localization. Since the conservation within the HDS polypeptide begins at amino acid nine and the transit peptide is predicted to be represented by the first 22 amino acids, it is conceivable that the HDS targeting sequence is not cleaved from the protein upon import into the chloroplast. This has recently been shown to be the case for CP29, a Chl-binding light-harvesting protein, of C. reinhardtii (Turkina et al., 2004).

Some Proteins Exhibit Significantly Less Conservation

Most proteins in Tables I and II are highly conserved among C. reinhardtii, Arabidopsis, and Synechocystis PCC 6803, sharing more than 60% pairwise amino acid identity and approximately 80% amino acid similarity. In many cases, the close phylogenetic relationship between C. reinhardtii and Arabidopsis genes is supported by the presence of one or more conserved intron positions. However, the deduced sequences for some Chl and carotenoid biosynthetic enzymes exhibit a significantly lower level of conservation. A lack of conservation is striking for the uroporphyrinogen III synthase (UROS; step 6); this protein is poorly conserved among all three of the organisms examined in this analysis.

In bacteria, UROS is the product of hemD. A number of (cyano)bacterial species, including Synechocystis PCC 6803, contain a hemD-like gene predicted to encode a hybrid protein representing a fusion of uroporphyrinogen III methyltransferase (UPM) with UROS (Panek and O'Brian, 2002). The occurrence of UROS as a domain of a fusion protein in Synechocystis PCC 6803 could in part explain the low similarity between the cyanobacterial UROS domain of the fusion protein and the UROS of C. reinhardtii. UPM catalyzes the first committed step in the biosynthesis of siroheme, which involves the methylation of the product formed by the UROS reaction (Fig. 1). Therefore, fusion of UPM with UROS in the (cyano)bacterial proteins probably has a role in regulating allocation of pathway intermediates for siroheme and Chl formation. Arabidopsis (At5g40850; Leustek et al., 1997) and C. reinhardtii (C_940006) both contain homologs of UPM; it is important to examine the expression of these proteins and their potential for interactions with UROS.

Several proteins that are part of the Chl biosynthetic pathway of C. reinhardtii are represented by multiple genes coding for putative isozymes. Some of these predicted proteins, specifically the isoforms of UROD3 (step 7), CPX2 (step 8), and a putative H-subunit of the magnesium (Mg)-chelatase (CHLH2; step 10c), appear to have diverged significantly from their counterparts in Arabidopsis and Synechocystis PCC 6803. As a first approximation, this can be explained by a relaxed pressure to conserve genes that are represented by multiple copies in the genome, allowing for the evolution of enzymes with altered function(s) or expression patterns. A more detailed analysis of the potential isozymes is presented below.

With respect to the carotenoid biosynthetic pathway, the sequence of the enzyme isopentenyl diphosphate:dimethylallyl diphosphate isomerase (IDI; step 28) is not highly conserved. The C. reinhardtii and Arabidopsis enzymes are 43% identical and 62% similar at the amino acid sequence level. Low conservation for this protein was previously noted by Cunningham and Gantt (2000); they concluded that IDI from green algae and vascular plants were likely to have separate origins. The cyanobacterial (type II) IDI is completely unrelated to the eukaryotic enzyme (Steinbacher et al., 2003). Furthermore, the late carotenogenic enzymes in C. reinhardtii and Arabidopsis, including the lycopene cyclases (LCYB, step 34; LCYE, step 35), the carotenoid hydroxylases (CHYB, step 36; CHYE, step 37), and zeaxanthin epoxidase (ZEP; step 38), are significantly less conserved than the other enzymes in the pathway. These proteins have no cyanobacterial homologs, with the exception of the lycopene cyclases in the genus Prochlorococcus.

Several deduced proteins that function in Chl and carotenoid biosynthesis in C. reinhardtii and Arabidopsis have strong similarity to each other but low levels of conservation relative to their cyanobacterial homologs. The Chl genes in this group encode the plastidic GTS (step 1), the porphobilinogen deaminase (PBGD; step 5), and the CPX (step 8). The carotenoid genes in this group encode most enzymes of the MEP pathway (steps 21–27), namely, deoxy-xylulose-5-phosphate synthase (DXS), 4-diphosphocytidyl-2-methyl-erythritol synthase (CMS), 4-diphosphocytidyl-2-methyl-erythritol kinase (CMK), 2-methyl-erythritol-2,4-cyclodiphosphate synthase (MCS), and HDS. For all of these genes, identity/similarity between Synechocystis PCC 6803 and C. reinhardtii is about 20% lower than between C. reinhardtii and Arabidopsis (Tables I and II). The low similarity between MEP enzymes of vascular plants and cyanobacteria was noted previously (Lange et al., 2000); a similar degree of difference in these proteins between C. reinhardtii and Synechocystis PCC 6803 is noted here. Lange et al. (2000) suggested the most likely explanation for this observation to be a lateral transfer of these genes from other eubacteria into cyanobacteria subsequent to the primary endosymbiosis that led to evolution of extant plastids.

The nuclear genes encoding proteins involved in Chl and heme biosynthesis in C. reinhardtii may have originated from either an ancestral chloroplast or an ancestral mitochondrion. Indeed, the CPX proteins from vascular plants and C. reinhardtii are more similar to human and yeast CPX than to any of the cyanobacterial homologs. CrCPX1 has 55% (73%) and CrCPX2 has 43% (64%) identity (similarity) to CPX of human, and only 42% (57%) and 35% (55%) identity (similarity) with Synechocystis PCC 6803 CPX, respectively (see Table I and below). Similarly, the green algal and vascular plant PBGD are most closely related to the homologous enzyme from α-proteobacteria, which are considered to be among the closest known eubacterial relatives of mitochondria (Gray et al., 1999). The similarity between the plant enzyme and the mitochondrial PBGD from animals, however, is much lower, while a comparison of the latter with eubacterial sequences revealed highest similarity to the enzyme from cyanobacteria. To resolve this apparent paradox, more detailed studies will be necessary to elucidate the phylogenetic relationship of these enzymes from a larger set of taxa.

Conserved Domains in C. reinhardtii and Vascular Plant Proteins That Are Absent in the Cyanobacterial Homologs

Since both Chl and carotenoids are synthesized in plastids, N-terminal plastid targeting signals are expected to be associated with all nucleus-encoded proteins that participate in the synthesis of these pigments. The targeting signals generally display little or no conservation at the primary sequence level (von Heijne et al., 1989). Several enzymes involved in pigment biosynthesis in vascular plants were observed to have conserved domains absent from their cyanobacterial counterparts. These conserved domains are generally composed of 20 to 40 amino acids, and are mostly located at the N terminus of the protein between the targeting signal and the first common domain to the Arabidopsis/C. reinhardtii and cyanobacterial homologs. These additional sequences are part of the mature protein but are probably not essential for catalytic activity since they are absent in the cyanobacterial enzymes. Removal of the N-terminal extensions from phytoene synthase (PSY; Misawa et al., 1994), lycopene cyclases LCYB and LCYE (Hugueney et al., 1995; Cunningham et al., 1996), and the chlorophyllide a oxygenase (CAO; Nagata et al., 2004) did not result in a loss of enzymatic activity. Hence, these N-terminal extensions may have a regulatory function, either through interactions with metabolites or other polypeptides. This is supported to some extent by the finding that expression of tomato (Lycopersicon esculentum) PSY lacking the conserved N-terminal motif in Escherichia coli resulted in higher phytoene production than the expression of tomato PSY for which just the targeting sequence had been removed (Misawa et al., 1994).

In the following analyses, we focus on domains of Chl and carotenoid biosynthesis enzymes conserved between vascular plants and C. reinhardtii but not present in the bacterial homologs. Identification of domains conserved only within the green algal lineage would require genomic/cDNA sequence information from additional green algal genera. In the Chl biosynthetic pathway, the three early enzymes glutamyl-tRNA reductase (GTR), Glu-1-semialdehyde aminotransferase (GSA), and ALAD (steps 2–4), as well as the Mg-protoporphyrin IX methyltransferase PPMT (step 11), possess an N-terminal conserved domain of 15 to 20 amino acids present in both C. reinhardtii and vascular plant enzymes (Supplemental Fig. 1). While GTR, ALAD, and PPMT probably acquired this sequence after the establishment of plastids within host cells, in the case of GSA the conserved domain is also present in some of the cyanobacterial homologs (e.g. species from the genus Prochlorococcus). In other cyanobacteria, including Synechocystis PCC 6803, a remnant of the sequence still appears to be present contiguous to the N terminus of the ORF in the genome, but it appears to be no longer part of the ORF (see Supplemental Fig. 1).

In the carotenoid biosynthetic pathway, the enzymes 1-deoxy-d-xylulose-5-phosphate reductoisomerase (DXR; step 22), phytoene desaturase (PDS; step 31), and ZDS (step 32) share conserved N-terminal extensions of up to 40 amino acids (Supplemental Fig. 2). Interestingly, the extensions associated with LCYB and LCYE display only a very low level of conservation between vascular plants and green algae but are well conserved within each of the two clades (data not shown). However, our alignments include sequences from only three green algae for LCYB (C. reinhardtii, Volvox carteri, and Haematococcus pluvialis) and two for LCYE (C. reinhardtii and V. carteri), with all of these sequences from the order Volvocales (data not shown). Therefore, additional analyses of conserved domains associated with the two cyclases would benefit from a broader taxon sampling. As hypothesized by Grossman et al. (2004), the putatively conserved domains of LCYB and LCYE might interact with LHC apoproteins leading to altered enzyme activity (see also below). We suggest that the conserved domains on these polypeptides in vascular plants and algae represent targets for regulatory processes and exciting areas for future research.

CAO (step 15) from vascular plants contains a particularly large N-terminal extension. The first CAO gene sequenced was from C. reinhardtii (Tanaka et al., 1998); the identification of this sequence facilitated the subsequent identification of the homologous gene from Arabidopsis (Espineda et al., 1999; Rüdiger et al., 1999). Partial sequences of CAO genes from the prochlorophytes Prochloron didemni and Prochlorothrix hollandica were also obtained (Tomitani et al., 1999). Biochemical studies on Arabidopsis CAO revealed that, at least in vitro, only chlorophyllide a can be used as a substrate for catalyzing the formation of chlorophyllide b (Oster et al., 2000). A comparison of full-length CAO sequences from various organisms (Nagata et al., 2004) has demonstrated that the mature Arabidopsis and Oryza sativa enzymes contain an N-terminal extension with a highly conserved A-domain of approximately 130 amino acids and a less conserved B-domain of 30 amino acids; both domains are absent from the prochlorophyte and C. reinhardtii CAO. Although this N-terminal extension was shown to be dispensable for the catalytic activity of the Arabidopsis enzyme, it was hypothesized to play a role in regulation of enzyme activity (Nagata et al., 2004). An interesting observation made by Vermaas and coworkers (Xu et al., 2001) was that the activity of CAO from Arabidopsis expressed in Synechocystis PCC 6803 could be strongly enhanced by coexpressing it with an apoprotein of the LHCII from pea (Pisum sativum). Although no stably assembled LHCII was detectable in the cyanobacterial transformants and the newly formed Chl b accumulated mainly in the core complexes of both photosystems, the results suggest that an interaction between CAO and apo-LHCII may modulate the activity of the enzyme (Xu et al., 2001). Nagata et al. (2004) have suggested that the A-domain of CAO is critical for this interaction to occur.

In the C. reinhardtii genome, a number of small EST sequences are located upstream and in close proximity to the CAO gene. A model predicted for CAO by GreenGenie (genie.8.14) suggests that the C. reinhardtii ORF, originally predicted by Tanaka et al. (1998), might extend into these small EST sequences, generating a coding region with an additional 600 bp at the 5′ end of the gene and containing one additional intron. Another gene prediction tool, GENSCAN (http://genes.mit.edu/GENSCAN.html), also suggests the presence of an alternative start codon, extending the CAO N terminus by 182 amino acids. We confirmed these predictions by sequencing a long C. reinhardtii cDNA clone (AV626430) and used the deduced sequence to search sequence reads from the V. carteri whole-genome shotgun (WGS) library. We were able to assemble the complete V. carteri CAO coding region from these reads. Figure 3 shows an alignment of the deduced CAO proteins from C. reinhardtii, V. carteri, Dunaliella salina, Arabidopsis, O. sativa, and P. hollandica; the D. salina sequence is likely to be incomplete. The N-terminal regions of CAO from green algae and vascular plants exhibit significant similarities. In addition, secondary structure predictions for the putative N-terminal extensions of CAO indicate an extended α-helix that aligns between the algal and vascular plant proteins. Finally, examination of the presequence for a putative plastid targeting sequence also corroborates the presence of an N-terminal extension on the C. reinhardtii CAO. The original C. reinhardtii (BAA33964) and D. salina (BAA82481) CAO sequences deposited in GenBank were not predicted by TargetP, iPSORT, or Predotar to have a presequence that routes the protein to the plastid, while such a presequence was predicted by all three of these algorithms for the extended forms of CAO from C. reinhardtii and V. carteri.
However, recent biochemical evidence from immunological analyses of the putative CrCAO polypeptide using an AtCAO antibody suggests that mature CAO from C. reinhardtii has an approximate molecular mass of 51 kD (Eggink et al., 2004). This would correspond to the theoretical mass of 51.4 kD calculated from the CAO sequence (BAA33964) determined by Tanaka et al. (1998). The predicted molecular mass of CrCAO with the conserved extension, after removal of the predicted 29-amino acid presequence (predicted by ChloroP), would result in a 69-kD mature protein. Therefore, it is critical to sequence the N-terminal part of the mature, chloroplast-localized protein.
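The theoretical masses compared above follow from the amino acid composition of the deduced polypeptide. The sketch below shows one way such a value could be computed, assuming standard average residue masses; the function name, the `presequence_length` parameter, and the toy peptide in the usage note are illustrative and are not taken from the CAO entries.

```python
# Average residue masses in daltons (free amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153  # one water is added back for the free N and C termini


def protein_mass_kd(sequence, presequence_length=0):
    """Return the average mass (kD) of a protein after removing a presequence.

    presequence_length is the number of N-terminal residues assumed to be
    cleaved off (hypothetical parameter, e.g. a predicted transit peptide).
    """
    mature = sequence.upper()[presequence_length:]
    if not mature:
        raise ValueError("no residues left after removing the presequence")
    return (sum(RESIDUE_MASS[aa] for aa in mature) + WATER) / 1000.0
```

For example, `protein_mass_kd("MAG", presequence_length=1)` gives the mass of the dipeptide AG. Applied to a full deduced sequence, the same call with and without the presequence would give the two values that are compared with the size estimated from immunoblots.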

If the additional conserved domain at the N terminus of the C. reinhardtii CAO is confirmed to be part of the mature polypeptide, it will be important to establish whether or not it interacts with the C. reinhardtii LHC polypeptides and the specificity of these interactions, if they occur. It is possible that the low degree of conservation between extensions of the green algal and vascular plant CAO can be explained by corresponding differences in the green algal and vascular plant LHC polypeptides (Elrad and Grossman, 2004), and a need for the two proteins to coevolve.

HDS (step 26), which catalyzes the penultimate step in formation of active isoprene by the MEP pathway, is another potential target for additional research that would help elucidate regulatory processes involved in pigment biosynthesis. The C. reinhardtii HDS gene model (Table II) predicts the presence of an extended insertion of about 260 amino acids with significant sequence similarity to the analogous domain from HDS of Arabidopsis (Querol et al., 2002). This domain is located in the central part of the protein. While this domain is absent from bacterial HDS, the Arabidopsis HDS can complement a HDS-null mutant of E. coli (Querol et al., 2002). The presence of this additional domain in the C. reinhardtii HDS was confirmed by sequencing a putative HDS cDNA (AV626792). An alignment of the HDS sequences from C. reinhardtii, Arabidopsis, and Synechocystis PCC 6803 is available as Supplemental Figure 3. The significance of the additional domain(s) in the plant and algal enzyme is not known, but it is interesting to note that the amino acid identity of this domain from C. reinhardtii and Arabidopsis is only 49%, while the remainders of the proteins are 74% identical (data not shown). It is reasonable to speculate that the inserted eukaryotic domain of HDS may be involved in regulation and that the details of this regulation may be somewhat different in C. reinhardtii and Arabidopsis.

Two Carotenogenic Genes of Vascular Plants Have No Identified Homologs in C. reinhardtii, while the Presence of Another Carotenogenic Gene on the C. reinhardtii Genome Was Unexpected

For most genes known to be directly involved in the biosynthesis of Chl and carotenoids in vascular plants, we were able to identify homologs in the current version of the C. reinhardtii genome. However, we were unable to identify C. reinhardtii genes encoding the plant enzymes violaxanthin deepoxidase (VDE; step 39) and neoxanthin synthase (NSY; step 40). Since the current version of the C. reinhardtii genome is only about 90% complete, the missing genes might still be discovered in the fractions of the genome that have not yet been sequenced. However, we do not regard this as very likely, for reasons explained below.

NSY has only been identified in two species of the family Solanaceae, potato (Solanum tuberosum) and tomato (Al-Babili et al., 2000; Bouvier et al., 2000). In the complete genome of Arabidopsis, no gene homologous to NSY was detected (Hirschberg, 2001). Interestingly, NSY from tomato and potato turned out to be paralogous to the two lycopene cyclases (LCYB and LCYE), common to all vascular plants, and the closely related capsanthin-capsorubin synthase (CCS) from bell pepper (Capsicum annuum). Furthermore, both NSY (Ronen et al., 2000) and CCS (Hugueney et al., 1995) were shown to possess lycopene-cyclase activity. Therefore, it is conceivable that in plants lacking a separate NSY, one of the two lycopene cyclases (most likely LCYB based on its similarity to NSY) might be responsible for the formation of neoxanthin from violaxanthin, possibly triggered by interactions with the neoxanthin-binding proteins of LHCII (Grossman et al., 2004). Alternatively, an enzyme unrelated to NSY of the Solanaceae might be responsible for neoxanthin formation in Arabidopsis and C. reinhardtii.

VDE catalyzes the deepoxidation of violaxanthin as part of the photoprotective xanthophyll cycle (Yamamoto et al., 1999). Genes encoding VDE have been sequenced from several vascular plants (Bugos et al., 1998). Neither the current version of the C. reinhardtii genome nor WGS reads available from the closely related alga V. carteri contain any sequences with significant similarity to VDE from vascular plants. These results suggest that green algae may use a deepoxidating enzyme with characteristics different from those of the plant enzyme. This suggestion is supported by the observation that dithiothreitol, a potent inhibitor of vascular plant VDE, does not prevent violaxanthin deepoxidation in high light-exposed cultures of C. reinhardtii (K. Niyogi, personal communication). Identification of the deepoxidase from C. reinhardtii by map-based cloning is currently in progress (Anwaruzzaman et al., 2004).

Surprisingly, we detected a gene coding for a putative β-carotene ketolase (BKT; step 42), based on similarity to BKT from the green alga H. pluvialis. BKT introduces a keto-group at C(4) of β-ionone rings and, in conjunction with β-carotene hydroxylase (CHYB), catalyzes the formation of the ketocarotenoid astaxanthin (Lotan and Hirschberg, 1995; Breitenbach et al., 1996). Interestingly, the genes coding for CHYB and BKT are contiguous on the C. reinhardtii genome, with BKT located on the same strand and just upstream of CHYB. To the best of our knowledge, astaxanthin has not been detected in C. reinhardtii, and our attempts to detect it (using HPLC) in both nutrient-replete and nutrient-limited cultures have been unsuccessful (data not shown). The putative BKT gene in C. reinhardtii appears to be expressed since it is represented by a cDNA clone (1024014H04) in the EST database. Since the BKT gene in the current version of the C. reinhardtii genome database contains two large gaps, we sequenced the corresponding cDNA. As the alignment in Figure 4 demonstrates, the central part of the C. reinhardtii and H. pluvialis homologs is highly conserved at the amino acid level (70% identity and 82% similarity). However, BKT from C. reinhardtii is predicted to have a C-terminal extension of about 115 amino acids, which is absent from any ketolase previously characterized. It will be interesting to examine the functional significance of this amino acid extension, which might relate to the absence of astaxanthin in C. reinhardtii.

Figure 4. Protein alignment of BKT from C. reinhardtii with the homologous enzymes from the green alga H. pluvialis and the ketolases (CrtO) from cyanobacteria and proteobacteria. At conserved sites, black boxes indicate identical amino acids, while gray boxes denote similar amino acids (conservative exchanges according to PAM250 matrix). Putative α-helical transmembrane domains predicted by TMHMM in the BKT from C. reinhardtii are indicated by bars above the sequence (see also Cunningham and Gantt, 1998). Note the large C-terminal extension of the protein from C. reinhardtii not present in the other proteins. The cDNA sequence from C. reinhardtii coding for BKT has been deposited in GenBank under the accession number AY860820. Other species and sequence accessions are as follows: H. pluvialis 1, CAA60478, and 2, BAA08300; Nostoc PCC 7120, BAB74888; Synechococcus WH 8102, ZP_00115639; Brevundimonas aurantiaca, AAN86030; and Paracoccus MBIC1143, BAA09591.

C. reinhardtii and Arabidopsis Differ Significantly in the Number of Putative Isozymes Involved in the Biosynthesis of Chl and Carotenoids

In Arabidopsis, there are often multiple genes coding for putative isozymes that function at a number of different steps in the pathway for Chl synthesis (Lange and Ghassemian, 2003). By contrast, most reactions in the analogous pathway in C. reinhardtii are catalyzed by unique gene products, with the exception of UROD, CPX, CHLI, CHLH, and CHL27 (Table I). The reactions of the carotenogenic pathway in C. reinhardtii are all catalyzed by unique gene products (with the caveat that there are still some gaps in the genome sequence). In Arabidopsis, there do appear to be isogenes for DXS, IDI, geranylgeranyl diphosphate synthase (GGPS; step 29), and CHYB (Table II).

The increased number of isozymes associated with pigment biosynthesis in vascular plants relative to C. reinhardtii or cyanobacteria may be related to increased regulatory demands and perhaps also to different local environments (e.g. in cells of different tissue types). As an example for the Chl biosynthesis pathway, the expression of GTR1 (HEMA1) in Arabidopsis was highest in green tissue and under stringent light control, while GTR2 was mainly expressed in roots and flowers in a light-independent manner (McCormac et al., 2001; Ujwal et al., 2002). Organ-specific expression of two GTR isogenes from barley (Hordeum vulgare; Bougri and Grimm, 1996) has also been reported. With respect to the carotenoid pathway, DXS was shown to be encoded by two different genes in the legume Medicago truncatula. DXS1 was expressed in a variety of developing tissues, with the exception of the roots. DXS2 expression was strongly stimulated in roots upon colonization with mycorrhizal fungi (Walter et al., 2002). There are both organ- and organellar-specific isoforms of GGPS in Arabidopsis (Zhu et al., 1997; Okada et al., 2000). There appear to be 12 different GGPS isogenes in Arabidopsis (Lange and Ghassemian, 2003), although expression of only five has been demonstrated (Okada et al., 2000). There is one GGPS and three other ORFs on the C. reinhardtii genome that encode related, prenyl transferase-like proteins. Alignment of these sequences with GGPS and related prenyl transferases from Arabidopsis and other vascular plants (data not shown) has tentatively enabled us to assign the products of the other genes the functions geranyl pyrophosphate synthase (gene model C_490103), farnesyl pyrophosphate synthase (C_120115), and solanesyl pyrophosphate synthase (C_1690011).

Putative Isozymes That Function in Chl Biosynthesis in C. reinhardtii and Their Relationship to Enzymes from Other Photosynthetic Organisms

The C. reinhardtii isogenes involved in Chl biosynthesis are UROD (step 7), CPX (step 8), two subunit genes of the Mg-chelatase (step 10), CHLI and CHLH, and the recently identified CHL27 (step 12). The CHL27 protein appears to be involved in catalyzing the formation of the cyclopentanone ring of Chl (Moseley et al., 2000; Tottey et al., 2003). As a first step toward a detailed characterization of potential isozymes associated with pigment biosynthesis in C. reinhardtii, we searched for homologs in the genomes of Arabidopsis and O. sativa, in vascular plant EST databases, in the genomes of the red alga Cyanidioschyzon merolae, the diatom Thalassiosira pseudonana, as well as in the current cyanobacterial databases.

In C. reinhardtii, UROD is the first enzyme in the Chl biosynthetic pathway encoded by multiple genes (step 7), three in this case. A comparison among the predicted UROD proteins of C. reinhardtii is presented in Supplemental Figure 4. The encoded proteins have 43% to 55% identity and 67% to 76% similarity among themselves, and expression of all three of the UROD isogenes is supported by EST sequence data. Vascular plants in general appear to contain at least two different UROD genes; two isogenes were identified in Arabidopsis, potato, tobacco (Nicotiana tabacum), and barley. For O. sativa and Zea mays, cDNA data suggest the occurrence of three isogenes (but see below). The genome of C. merolae also contains two UROD isogenes, while the T. pseudonana genome harbors three isogenes. The cyanobacterial genomes (eight complete and five partial) each contain a single UROD gene.
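Pairwise identity values such as those quoted above are derived from aligned sequences. The following is a minimal sketch of that calculation, assuming two rows of an existing alignment with `-` as the gap character and assuming that columns containing a gap are excluded from the comparison; the source does not state which denominator was used, so this convention is an assumption, as is the function name.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identical residues over the gap-free columns of an alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # assumption: gapped columns are not counted
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * identical / compared
```

Similarity values additionally require a substitution matrix to decide which non-identical pairs count as conservative exchanges, which is omitted here for brevity.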

The phylogenetic relationship among UROD isoforms from different organisms is depicted in Figure 5. The cyanobacterial enzymes cluster at the base of the neighbor-joining tree, while the eukaryotic enzymes fall into three groups, with the CrUROD1 from C. reinhardtii located at the base of a cluster also containing UROD1 from vascular plants. Similarly, CrUROD2 from C. reinhardtii and vascular plants form a second cluster. Both clusters have high bootstrap support. The third UROD cluster is divided into two subclusters containing the red algal and diatom isozymes, and while the phylogenetic position of CrUROD3 from C. reinhardtii is less well resolved, it appears to fall into a subcluster with one each of the red algal and the diatom isoforms. The other UROD from C. merolae and the two remaining isoenzymes from T. pseudonana (TpUROD3 was assembled from unplaced WGS reads) comprise the other subcluster. A very similar branching pattern, with similar bootstrap values, resulted from a maximum-likelihood analysis employing 100 bootstrap replicates (data not shown).

Surprisingly, OsUROD3 (from O. sativa) is most closely related to CrUROD2. However, the OsUROD3 sequence is supported by a single cDNA entry (AK110601), and we failed to retrieve any additional EST or genomic sequence data for this putative gene from the O. sativa databases. Hence, the single cDNA may represent a contamination of the cDNA library with an unidentified green alga. This is corroborated by comparative analyses of GC content and codon usage of the O. sativa UROD genes. While OsUROD1 and OsUROD2 have a GC content of 50% and 54% and an effective number of codons (ENC) used of 55.6 and 57.9, respectively, the OsUROD3 sequence has a strong bias both with respect to GC (64%) and ENC (36.2) values, which are more similar to the values expected for ORFs of green algae like C. reinhardtii or V. carteri (see Table III and below).
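The GC comparison used above is straightforward to reproduce for any ORF. A minimal sketch, assuming the coding sequence is available as a plain nucleotide string and that ambiguous bases are ignored (the function name is illustrative):

```python
def gc_content(orf):
    """Return the percentage of G and C among the unambiguous bases of an ORF."""
    counted = [base for base in orf.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no unambiguous bases")
    gc = sum(1 for base in counted if base in "GC")
    return 100.0 * gc / len(counted)
```

A sequence whose value lies far from those of the other isogenes in the same genome, as reported here for OsUROD3, is a candidate for closer inspection.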

Table III. Comparison of EST data and ENC values for ORFs in the C. reinhardtii genome encoding putative proteins involved in biosynthesis of chlorophylls (1–16) or carotenoids (21–43)

C. reinhardtii EST clones deposited in GenBank (http://www.ncbi.nlm.nih.gov) were grouped into three categories: (1) unstressed (clones of projects 874, 894, and 1,024 from Shrager et al., 2003; clones whose accession begins with “AV” from Asamizu et al., 1999, 2000); (2) stressed (projects 963, 1,031, and 3,510 [=1,115] from Shrager et al., 2003; projects 832 and 833); (3) other (projects 1,030, 3,511 (=1,112), and 925 from Shrager et al., 2003; clones whose accession begins with “BP” from Asamizu et al., 2004). EST clones with more than one sequence entry in GenBank (5′- and 3′-reads) were counted only once. The detailed frequencies of EST clones in each project are available as Supplemental Tables I and II. The ENC used by each ORF was calculated according to Wright (1990). For UROD, CPX, CHLI, and CHLH, the respective isogene that is represented by the majority of EST clones in cDNA libraries generated from unstressed cells and at the same time has the lowest ENC value is in bold.

The UROD reaction is positioned at a branch point of tetrapyrrole biosynthesis, competing with UMP for the substrate uroporphyrinogen III. A comparison of intron positions among the C. reinhardtii and Arabidopsis isogenes (Table I; Supplemental Fig. 4) reveals that some intron positions are conserved between the different UROD isogenes within a given organism. These findings suggest that a gene duplication occurred after the endosymbiotic event that presaged the evolution of the chloroplast in eukaryotic plant cells. Furthermore, since all plant and algal species that we examined contain at least two different genes encoding putative UROD isozymes, it is possible that the UROD isozymes fulfill different roles in the cell; they may no longer be functionally equivalent. Therefore, it will be useful to characterize the expression characteristics and localization of the putative UROD isozymes.

The product of the reaction catalyzed by UROD, coproporphyrinogen III, is oxidized by CPX (step 8), which is encoded by two different genes in C. reinhardtii. EST sequences are available for both CPX genes (Tables I and III). The full-length sequence of CPX1 was previously reported, and its gene product was purified and shown to be localized in the plastid (Hill and Merchant, 1995; Quinn et al., 1999). The deduced CPX protein sequences are compared in Supplemental Figure 5. The genomes of Arabidopsis and T. pseudonana also contain two potential CPX genes, while the C. merolae genome has a single CPX gene. The two CPX isogenes from Arabidopsis are very similar at the nucleotide level, which probably reflects a recent gene duplication. However, CPX2 from Arabidopsis does not appear to encode a functional product since the ORF contains a frame shift (Santana et al., 2002). A single copy of the CPX gene was detected in other vascular plant genomes.

As noted earlier, CPX from vascular plants and C. reinhardtii is most similar to the mitochondrial enzyme from animals and fungi. The two CPX genes from C. reinhardtii have no intron positions in common, and the deduced amino acid sequences of the isozymes differ significantly (Supplemental Fig. 5); the amino acid sequences of the two CPX proteins from T. pseudonana are also very different. The CPX1 isozymes of the two algae cluster with the single CPX proteins from vascular plants and C. merolae and the CPX homolog from the prasinophyte Ostreococcus tauri. By contrast, C. reinhardtii and T. pseudonana CPX2 isoforms group in a separate cluster, positioned between the cyanobacterial and animal/fungal CPX clusters (Fig. 6). The same branching pattern could be reproduced with high bootstrap support (n = 100) by a maximum-likelihood analysis of the data set (data not shown). As there is a close relationship between algal CPX2 and mitochondrial CPX, it will be important to establish the subcellular location(s) of CPX2 in C. reinhardtii. Cyanobacteria of the genus Nostoc also have two CPX genes. However, these isoforms cluster within the cyanobacterial branch of the tree, suggesting that they are the result of a recent, local gene duplication that is restricted to a subgroup of the cyanobacteria.

Interestingly, vascular plants generally seem to contain two protoporphyrinogen IX oxidase (PPX) isozymes (step 9), which catalyze the step in Chl and heme biosynthesis immediately following CPX. In tobacco, one of these isozymes has been shown to be plastid specific, while the other was localized to mitochondria (Lermontova et al., 1997). The three algal genomes that we examined each contain a single PPX gene, encoding a protein that is most similar to the plastid-specific PPX from tobacco and the PPX1 (At4g01690) from Arabidopsis (Table I).

Mg-chelatase (step 10) is situated at another important branch point in the tetrapyrrole biosynthetic pathway, catalyzing the committed step leading to Chl formation. This reaction has been recognized as an important target for regulation and has been the focus of several studies (e.g. see Walker and Willows, 1997; Beale, 1999). The active Mg-chelatase complex is a multimer composed of three different subunits termed CHLD, CHLH, and CHLI in eukaryotes and cyanobacteria, and bchD, bchH, and bchI in photosynthetic bacteria (they are required for bacteriochlorophyll synthesis). All subunits of the complex are highly conserved among photosynthetic organisms, with CHLI showing the highest sequence identity (Beale, 1999). In addition, the N-terminal half of CHLD exhibits significant similarity to the smaller CHLI subunit.

The C. reinhardtii genome contains two copies each of the CHLI and CHLH genes (Table I). Both CHLI isogenes are expressed since there are several EST sequences in the C. reinhardtii database for each (Tables I and III). In addition, there is a full-length cDNA sequence for CHLI1 (Lake and Willows, 2003). The core domains of CHLI1 and CHLI2 from C. reinhardtii show 62% identity and 82% similarity. CHLI2 has a C-terminal extension of approximately 40 amino acids that is absent from the CHLI enzymes from other organisms; it may modify the function of this chelatase subunit (see Supplemental Fig. 6 and discussion below). Using the CHLI1 and CHLI2 sequences from C. reinhardtii, we were able to assemble the respective genes from the WGS reads for V. carteri, and included the deduced protein sequences in our analyses.

In four other green algae, Chlorella vulgaris (Trebouxiophyceae), Mesostigma viride, Nephroselmis olivacea (both Prasinophyceae), and Chaetosphaeridium globosum (Charophyceae), unique genes encoding CHLI are located on the chloroplast genome; the plastome of C. reinhardtii does not harbor a CHLI gene. Moreover, the CHLI gene of four red algae, a euglenophyte, a cryptophyte, a diatom, and a raphidophyte, is also encoded by the plastid genome.

Among the available sequences from vascular plants, we only identified two CHLI isoforms for Arabidopsis. These isozymes are 88% identical and 97% similar in the core region of the protein, suggesting that the two genes are the consequence of a recent duplication.

In phylogenetic analyses of CHLI from algae, vascular plants and cyanobacteria applying neighbor-joining (Fig. 7A) or maximum-likelihood methods (data not shown), CHLI1 of C. reinhardtii grouped with the CHLI proteins from other Chl b-containing organisms, i.e. the other green algae, Euglena gracilis, and vascular plants. Interestingly, CHLI2 from C. reinhardtii and V. carteri were well separated from plant and cyanobacterial CHLI proteins, making them somewhat unusual. As mentioned above, major regions of the CHLI proteins are highly conserved; therefore, only a limited number of informative amino acid positions are available for generating a phylogeny. We neglected to correct our analyses for substitution-rate heterogeneity, which explains the exceptionally short branch lengths of the neighbor-joining tree presented in Figure 7A. However, while the tree does not reflect true evolutionary distances, the branching pattern should not be affected.

Since C. reinhardtii and V. carteri were the only organisms for which two putative isoforms of CHLI were identified and the only algae in which the CHLI genes were located on the nuclear genome, the isogenes may have originated from a recent gene duplication, possibly at the base of the order Volvocales. This is supported by the observation that the two CHLI genes from C. reinhardtii share an intron position but have no intron sites in common with vascular plant CHLI genes. In opposition to this hypothesis, the remote position of CHLI2 in the phylogenetic groupings suggests that it may not be the result of a recent gene duplication. However, a gene duplication following the transfer of plastome-encoded CHLI to the nucleus may have relaxed the selective pressure on the isogenes (as a consequence of more than one gene copy), allowing for rapid divergence. As a consequence, the function of CHLI2 may be significantly different from that of CHLI1. This possibility is congruent with the finding that a number of highly conserved amino acids in all CHLI enzymes from a variety of distantly related photosynthetic organisms (i.e. all other organisms in the tree; Fig. 7B) are not conserved in the CHLI2 protein.

In summary, the CHLI isozymes from C. reinhardtii and V. carteri appear exceptional in two ways: (1) They are the only algal CHLI proteins known so far that are nucleus encoded, and (2) the second CHLI isoform present in these algae, CHLI2, has diverged to the extent that it may have significant differences in its activities relative to the highly conserved CHLI1.

In the case of the two potential CHLH proteins encoded on the C. reinhardtii genome, the sequence of CHLH1 is supported by considerable EST data, and a full-length sequence of the CHLH1 gene has been reported (Chekounova et al., 2001). The CHLH1 gene encodes a predicted protein of 1,399 amino acids with over 65% identity to CHLH from Arabidopsis and Synechocystis PCC 6803 (Table I). A mutant of C. reinhardtii lacking CHLH1 is Chl deficient (Chekounova et al., 2001). For CHLH2, neither cDNA data nor a full-length genomic sequence is available. Database searches have revealed that the genomes of C. merolae, T. pseudonana, and several (but not all) cyanobacteria contain two distinct ORFs encoding potential CHLH isozymes that can be clustered into two groups (Fig. 8). By contrast, the vascular plants Arabidopsis and O. sativa contain a single CHLH gene. From alignments of the available CHLH protein sequences, we concluded that the C. reinhardtii CHLH2 gene model probably only includes the C-terminal half of the protein. However, we recovered genome fragments among the unplaced genomic reads from the C. reinhardtii genome database containing additional putative exons from CHLH2; the predicted polypeptide sequence encoded by these exons and the assembled C-terminal sequence of CHLH2 are aligned with CHLH1 in Supplemental Figure 7. (Cyano)bacteria have a protein designated CobN, which has significant sequence similarity to CHLH. This protein is a subunit of cobaltochelatase, an enzyme involved in cyanocobalamin synthesis. Cyanocobalamin does not appear to be synthesized by vascular plants and algae, and no enzymes in these organisms appear to rely on a cobalamin cofactor (Martens et al., 2002; Ravanel et al., 2004). The distribution pattern of CHLH2 among taxa suggests that the CHLH2 gene probably was lost during evolution of vascular plants; it might be the more ancient isoform of the Mg-chelatase H-subunit, somewhat more closely related to CobN than is CHLH1. This, however, needs to be substantiated by more detailed analyses with broader taxon sampling.

Although a function for CHLH2 has not been definitely established, it is likely to have a physiological function since it has been preserved in cyanobacteria, red algae, and diatoms. The protein may be expressed under specific conditions that require some modification of the Mg-chelatase activity/properties. If CHLH2 still functions in the association of Mg2+ with protoporphyrin IX, then known chlH1 mutants of C. reinhardtii should be rescued by the introduction of CHLH2 expressed from a functional promoter. Such a system could be used to study function and regulation of CHLH2.

Finally, there are two isogenes coding for the Mg-protoporphyrin-IX monomethylester cyclase (CHL27; step 12) of C. reinhardtii, CHL27A and CHL27B, which have been studied extensively (Moseley et al., 2000, 2002; the latter reference contains an alignment of the two isozymes). The Synechocystis PCC 6803 genome also contains two ORFs (sll1214 and sll1874) with strong similarity to CHL27 from C. reinhardtii and vascular plants (Table I). Nostoc PCC7120 contains three genes encoding putative CHL27 homologs. The cyanobacterial isogenes are most similar to each other (Supplemental Fig. 8) and probably arose as the result of recent local gene duplications. A similar situation is observed with respect to the CHL27A and CHL27B of C. reinhardtii. For the vascular plants Arabidopsis, tomato, and Z. mays, we were only able to retrieve, from both genomic and EST databases, single sequences encoding CHL27. For O. sativa, there are EST sequences that suggest the occurrence of two different isozymes. However, while one of the isoforms groups with the vascular plant enzyme, the other is most closely related to the two proteins from C. reinhardtii and may represent contamination of the cDNA library. In red algae, CHL27 homologs are generally encoded on the plastid genome, as deduced from the completely sequenced plastomes of C. merolae, Porphyra purpurea, Cyanidium caldarium, and Gracilaria tenuistipitata. We could not detect sequences on the T. pseudonana genome encoding CHL27. Furthermore, there are no proteins with similarity to CHL27 encoded on the plastome of the diatom Odontella sinensis. These findings raise the possibility that the enzyme(s) catalyzing the formation of the cyclopentanone ring of the Chl molecule in diatoms is unrelated to CHL27.

Differential Expression of Genes Involved in Chl and Carotenoid Biosynthesis as Evaluated from EST Data and the ENC Used by the Genes

We wanted to learn more about expression levels of genes involved in the biosynthesis of Chl and carotenoids in C. reinhardtii. As a first step/approximation toward this goal, we analyzed the distribution of clones in the C. reinhardtii EST database for each enzyme of the pathway and calculated the ENC (Wright, 1990) in each of the ORFs. The results are compiled in Table III. Although most of the cDNA libraries from which sequences were generated by random sequencing had been normalized, highly expressed genes still appear to be represented by greater clone frequency than genes with low expression levels; this is suggested by the large numbers of EST sequences for nucleus-encoded proteins involved in photosynthesis (e.g. the small subunits of ribulose-bisphosphate carboxylase, subunits of the two photosystems, apoproteins of the LHCs). Therefore, there is still some validity in using EST frequency and distributions as a first approximation of potential transcript levels under the conditions used to grow the cells for library construction.

The ENC used by an ORF to encode the 20 different amino acids from which a protein is synthesized can theoretically vary between 20 (a single codon used for each amino acid) and 61 (all synonymous codons used for each amino acid; stop codons excluded; Wright, 1990). The evolution of codon bias within ORFs may be most consistently linked to the rate of translation of specific transcripts and the need to optimize protein production under specific environmental/developmental conditions. An ORF for an abundant protein may comprise a highly biased string of codons that pair with the most abundantly represented set of tRNAs in the cell (Ikemura, 1985). As a consequence, a low ENC value is mainly selected for at the level of translation. We are aware that other factors might contribute to codon bias to some extent, such as the effect of specific codons on RNA secondary structure and genomic positional effects that might reflect the occurrence of chromosomal isochores. However, a detailed assessment of such effects will only be feasible after assembly of all of the linkage groups comprising the C. reinhardtii genome.
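The bounds of 20 and 61 can be illustrated with a compact implementation of the ENC statistic. The sketch below follows the general form of Wright's estimator, Nc = 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6, where F is the mean codon homozygosity of the amino acids in each degeneracy class. It assumes the standard genetic code, skips amino acids observed fewer than twice or with non-positive homozygosity, and raises an error when a degeneracy class has no usable data. These handling rules and all names are simplifying assumptions, not a description of the software used for Table III.

```python
from collections import Counter, defaultdict

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
# Number of synonymous codons for each amino acid (stop codons excluded).
DEGENERACY = Counter(aa for aa in CODON_TABLE.values() if aa != "*")


def effective_number_of_codons(orf):
    """Estimate the ENC of an in-frame ORF given as a nucleotide string."""
    seq = orf.upper().replace("U", "T")
    counts = defaultdict(Counter)
    for i in range(0, len(seq) - len(seq) % 3, 3):
        aa = CODON_TABLE.get(seq[i:i + 3])
        if aa is not None and aa != "*":
            counts[aa][seq[i:i + 3]] += 1

    # Codon homozygosity F per amino acid, grouped by degeneracy class.
    f_by_class = defaultdict(list)
    for aa, codon_counts in counts.items():
        n = sum(codon_counts.values())
        if DEGENERACY[aa] == 1 or n < 2:
            continue
        f = (n * sum((c / n) ** 2 for c in codon_counts.values()) - 1) / (n - 1)
        if f > 0:
            f_by_class[DEGENERACY[aa]].append(f)

    nc = 2.0  # Met and Trp each contribute exactly one codon
    for degeneracy, n_amino_acids in ((2, 9), (3, 1), (4, 5), (6, 3)):
        if not f_by_class[degeneracy]:
            raise ValueError(f"no usable data for {degeneracy}-fold amino acids")
        mean_f = sum(f_by_class[degeneracy]) / len(f_by_class[degeneracy])
        nc += n_amino_acids / mean_f
    return min(nc, 61.0)  # estimates above 61 are capped at the upper limit
```

An ORF that uses a single codon for every amino acid returns 20, and one that uses all synonymous codons equally returns 61, matching the theoretical range given above. Short ORFs give noisy estimates because the homozygosity terms are based on few observations.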

The C. reinhardtii genome has a high GC bias and an average ENC value, based on 663 C. reinhardtii ORFs registered in the codon usage database (http://www.kazusa.or.jp/codon/), of 32.5. In some highly expressed genes, however, the ENC value is close to its lower limit of 20. For example, the RBCS1 gene, which encodes a small-subunit isozyme of ribulose-bisphosphate carboxylase, and the gene encoding glyceraldehyde-3-phosphate dehydrogenase have ENC values of 22.7 and 24.1, respectively.

Genes encoding putative isozymes of the Chl biosynthetic pathway of C. reinhardtii exhibit significant differences with respect to both EST frequencies and ENC values (Table III). In the cases of the UROD, CPX, CHLI, and CHLH isogenes of C. reinhardtii, those that are represented by the majority of EST clones in cDNA libraries generated from cells grown under favorable conditions (unstressed) also have especially low ENC values. The protein products of such isogenes (UROD1, CPX1, CHLI1, CHLH1; highlighted by bold letters in Table III) are probably more abundant than those of the isogenes with lower EST representation and higher ENC values, making it likely they represent the major isozymes involved in Chl and heme biosynthesis. For CPX1 (Quinn et al., 1999) and CHLH1 (Chekounova et al., 2001), this is congruent with previously reported data.

Other genes involved in Chl biosynthesis with remarkably low ENC values are GSA, CHL27B, and LPOR, while among carotenogenic genes only HDS has an ENC value below 26. To further substantiate that the low ENC values for these genes are related to translational selection, we analyzed the codon bias for each of these genes. The most highly selective codon usage was for Pro (CCC), Thr (ACC), Gly (GGC), Leu (CTG), and Arg (CGC). This finding agrees with the analyses of other highly expressed genes in C. reinhardtii, as reported by Naya et al. (2001).

On average, the ENC values of genes of the Chl biosynthetic pathway are considerably lower than those of carotenogenic genes (median of 28.0 compared to 32.4 when considering only the lowest ENC in the case of isogenes, and counting GGR as a Chl biosynthetic gene; P < 0.01; Mann-Whitney U test). This finding might reflect a more urgent need to regulate expression of genes involved in Chl biosynthesis since the intermediates of this pathway can be extremely toxic.
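The comparison of the two groups of ENC values can be illustrated with a short Python sketch of the Mann-Whitney U statistic. The ENC lists below are hypothetical placeholders and are not the values from Table III; the P value uses a normal approximation without tie or continuity correction, which is an assumption of this sketch and not necessarily the procedure used for the value reported above.

```python
import math
from statistics import median


def mann_whitney_u(x, y):
    """Return (U_x, U_y); each tied pair contributes 0.5 to U_x."""
    u_x = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    return u_x, len(x) * len(y) - u_x


def two_sided_p_normal(x, y):
    """Two-sided P value from the normal approximation (no tie correction)."""
    n1, n2 = len(x), len(y)
    u_x, _ = mann_whitney_u(x, y)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u_x - mean_u) / sd_u
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical ENC values for illustration only; NOT the Table III data.
chl_enc = [24.5, 26.1, 27.3, 28.0, 28.8, 29.5, 30.2]
car_enc = [29.9, 31.0, 32.4, 33.1, 34.6, 35.2]

print("medians:", median(chl_enc), median(car_enc))
print("U:", mann_whitney_u(chl_enc, car_enc))
print("approximate two-sided P:", two_sided_p_normal(chl_enc, car_enc))
```

For samples as small as these, an exact test would normally be preferred over the normal approximation.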

Some of the specific enzymes involved in Chl biosynthesis are encoded by either two or three distinct genes. We used quantitative PCR (qPCR) to compare expression of the different members of these multigene families in cells maintained in the dark with their expression following exposure of the cells to light (Fig. 9A). Furthermore, we included other potential regulatory targets from both the Chl and the carotenoid biosynthetic pathways in our analysis. Very low fluence white light (VLFL; 0.01 μmol photon m−2 s−1) was used for these experiments to eliminate the influence of changes in the redox state of the cell as a consequence of photosynthetic electron transport. While the transcripts for a number of the genes increased following exposure to VLFL, transcript levels were generally higher after 2 h than after 4 h of VLFL treatment.

Figure 9. Bar graphs showing changes in transcript levels for genes involved in Chl (A) and carotenoid (B) biosynthesis in C. reinhardtii strain CC124 following exposure to VLFL, as analyzed by qPCR. The responses, shown as the change (n-fold) relative to RNA from dark-grown cells, at the different times following exposure to VLFL are given as different bar fill patterns, as indicated on the graph. The graph shows relative expression levels of each gene normalized to the CBLP gene (the level of this transcript remains constant over the course of the experiment). The results show the mean and SD for triplicate qPCR results.

As shown in Figure 9A, the transcripts for both GSA and ALAD markedly increase (approximately 15-fold for GSA and 7-fold for ALAD) after a 2-h exposure to VLFL, supporting previous findings that indicated that these genes are under light control (Matters and Beale, 1994). Since the light intensity used in these experiments was too low to affect photosynthetic activity, it is likely that a specific photoreceptor(s) is involved in the light-dependent increase in transcript levels. There was a 4- to 6-fold increase in UROD1 transcript levels after exposure of C. reinhardtii to VLFL for 2 h. However, transcripts encoding UROD2 and UROD3 showed no significant increase following VLFL exposure. This observation further supports the preliminary conclusion drawn from the EST and ENC analyses that UROD1 is the major isozyme involved in the biosynthesis of Chl and heme. The amounts of UROD mRNA and protein were previously shown to increase following illumination of barley seedlings (Mock et al., 1995).

CPX is encoded by CPX1 and CPX2 isogenes in C. reinhardtii. Expression of CPX1 has been shown to be influenced by both copper (Cu) and oxygen levels (Quinn et al., 1999; Moseley et al., 2000). There are three different-sized mRNAs transcribed from the CPX1 gene; the shortest becomes very abundant in Cu-deprived cells, while levels of the other forms of the transcript are insensitive to Cu availability (Quinn et al., 1999). EST/ENC analyses suggest that CPX1 encodes the more abundant isoform in C. reinhardtii. However, based on qPCR, the expression levels of both CPX1 and CPX2 were very low, with little change following exposure of dark-grown cells to VLFL. In barley, CPX mRNA levels did not differ significantly between etiolated and greening leaves, but did change as the seedling developed (Kruse et al., 1995). Therefore, while the absorption of light by specific photoreceptors may not play a major role in controlling CPX expression, these genes appear to be controlled by other signals, including Cu availability, local oxygen levels, and perhaps also the redox state of the cell.

In the case of the subunits of the Mg-chelatase complex, the CHLH1 transcript of C. reinhardtii increases in VLFL (approximately 5-fold), while the level of the CHLH2 transcript is very low in both the dark and VLFL (no significant difference in the transcript level in dark and VLFL). Mutant lines of O. sativa in which the CHLH gene was disrupted are chlorotic (Jung et al., 2003), as are tobacco antisense lines (Papenbrock et al., 2000). It was previously shown that the level of the CHLH1 transcript of C. reinhardtii is influenced by light and that a chlH1 mutant in this alga leads to a chlorotic, high light-sensitive phenotype (Chekounova et al., 2001). Interestingly, CHLH antisense transgenic tobacco plants do not accumulate protoporphyrin IX and exhibit reduced levels of mRNA encoding a number of enzymes involved in the early steps of Chl biosynthesis (Papenbrock et al., 2000). These results suggest a mechanism in which metabolic intermediates in Chl biosynthesis can modulate expression of genes encoding enzymes that function in earlier steps in the pathway. By contrast, a chlH1 mutant of C. reinhardtii does not exhibit decreased levels of mRNAs encoding at least some enzymes that function early in Chl biosynthesis, although it does have a chlorotic phenotype. The level of ALAD mRNA remained high, and notably the levels of the transcripts for the CHLI and D subunits were comparable to those of wild-type cells (Chekounova et al., 2001). Hence, there are aspects of control of Chl biosynthesis in C. reinhardtii that appear to be different from those of vascular plants. However, in spite of the finding that a chlH1 mutant of C. reinhardtii was chlorotic (Chekounova et al., 2001), the expression of CHLH2 may be required for tuning the level of Chl synthesis under specific environmental conditions. Furthermore, transcripts encoded by both CHLI genes of C. reinhardtii were elevated following exposure to VLFL, although the increase in CHLI1 mRNA was higher than that of CHLI2. Light induction of CHLI has been previously observed in soybean (Nakayama et al., 1995).

Carotenoid biosynthesis has also been shown to be regulated by light in both Arabidopsis and C. reinhardtii (Bohne and Linden, 2002; Woitsch and Romer, 2003; Botella-Pavia et al., 2004; Liu et al., 2004). However, in many cases it is still not clear whether the light requirement reflects contributions to regulation from photoreceptors, redox conditions of the cell, or both. As shown in Figure 9B, we used qPCR to examine expression of many genes involved in the synthesis of carotenoids. The only transcript that clearly increases (of the genes tested) upon exposure of C. reinhardtii to VLFL is that of PDS, suggesting that control of this gene is at least in part a consequence of the action of a specific photoreceptor. Increases in HDS, PSY, and ZDS transcript levels were relatively small, although they do appear to be significant. In Arabidopsis, DXS, HDR (designated IDS in Table II and Fig. 2), and PSY transcripts increased during deetiolation of seedlings (Botella-Pavia et al., 2004). However, these experiments did not use the extremely low light intensities that were employed in the experiments presented in this article.

In C. reinhardtii, blue light elicits an increase in the level of GSA mRNA in cultures grown under conditions of light:dark synchronization (Matters and Beale, 1995; Herman et al., 1999). The signal transduction events involved in photoregulation of the GSA gene are proposed to include activation of a heterotrimeric G-protein and phospholipase C, an increase in cytosolic Ca2+ levels, and activation of calmodulin and calmodulin-kinase (Im et al., 1996; Im and Beale, 2000). Light regulation of GSA also requires that the cells have access to adequate carbon (acetate) and nitrogen sources in the growth medium (Im et al., 1996), suggesting that signals from other metabolic pathways influence expression of GSA and interact either directly or indirectly with light signals. Induction of GSA, ALAD, UROD1, CHLH1, CHLI1, and CHLI2 by VLFL suggests that these genes may have common regulatory mechanisms. One candidate for the photoreceptor involved in controlling genes encoding the enzymes responsible for Chl biosynthesis is PHOT1, a flavin-binding photoreceptor recently implicated in blue light induction of gametogenesis in C. reinhardtii (Huang et al., 2002). Preliminary studies using a PHOT1 RNAi strain suggest that VLFL-stimulated expression of GSA is under the control of PHOT1 (C.S. Im, C.F. Beck, and A.R. Grossman, unpublished data). While most transcripts encoding proteins involved in carotenoid biosynthesis (that we have tested) are not strongly regulated by VLFL, the PDS transcript shows a 3-fold increase after 2 h in VLFL, raising the possibility that PDS may also be under the control of PHOT1.

CONCLUDING REMARKS

The results discussed in this article demonstrate that genomic analyses of biosynthetic pathways in C. reinhardtii can reveal the occurrence of families of genes for a specific biosynthetic step in the pathway, phylogenetic relationships of the deduced protein sequences with those of other organisms, the sequences that target these proteins to the chloroplast, and the occurrence of specific conserved domains in plant and algal polypeptides not present in their cyanobacterial counterparts. The deduced amino acid sequences of proteins involved in Chl and carotenoid biosynthesis of C. reinhardtii also point to some intriguing differences among the algal, cyanobacterial, and vascular plant proteins with respect to both structure and regulation of activity (e.g. for the CAO gene). Furthermore, the genomic and cDNA information demonstrates that some of the enzymes involved in Chl biosynthesis are encoded by gene families in C. reinhardtii, and that differential regulation of specific members of these families may provide a mechanism by which the alga can acclimate to different light conditions, and perhaps to other environmental conditions. Additional studies using specific wavelengths of light as well as high-intensity light, coupled with microarray analyses, are beginning, and will likely provide us with information on the role of specific photoreceptors and redox levels in the control of pigment biosynthesis in C. reinhardtii.

MATERIALS AND METHODS

Strains and Culture Conditions

The Chlamydomonas reinhardtii wild-type strain (parental strain) CC124 was used for all experiments presented in this study. The cells were grown in Tris acetate phosphate medium at a moderate/low light intensity (45 μmol photon m−2 s−1) to a density of 5 × 10^5 cells mL−1, and then transferred to the dark for 24 h before exposure to light. Light treatments were performed with a white LED (RL5-W6030; 6000 mcd; Super Bright LEDs, St. Louis) at very low intensity (0.01 μmol photon m−2 s−1).

Gene Identification and Sequence Analysis

To identify specific genes in the C. reinhardtii genome, either the sequence information from GenBank for previously isolated and characterized genes, or the sequence information from other organisms (mainly Arabidopsis [Arabidopsis thaliana] and cyanobacteria) derived from cDNA or genomic DNA sequence information, was used to perform BLAST (Altschul et al., 1997) alignments against the C. reinhardtii genome sequence (http://genome.jgi-psf.org/chlre2) and EST database (http://www.chlamy.org/search.html). Candidate orthologous and paralogous predicted proteins were aligned with each other using ClustalW (Thompson et al., 1994). Alignments were manually improved and documented using BioEdit software (Hall, 1999).

Phylogenetic analyses were performed using the software packages Treecon (Van de Peer and De Wachter, 1994) and PHYLIP (Felsenstein, 1989). From the protein alignments, neighbor-joining trees based on Poisson-corrected distances were constructed in Treecon 1.3b, while maximum-likelihood analyses applying the JTT substitution matrix were performed with the ProML module of PHYLIP 3.61.
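The Poisson-corrected distance underlying the neighbor-joining trees is d = −ln(1 − p), where p is the proportion of aligned sites at which two protein sequences differ. The following Python sketch illustrates this calculation for one pair of aligned sequences. It is not Treecon code; the function name and the decision to ignore any column containing a gap are assumptions made for illustration.

```python
import math


def poisson_distance(seq_a, seq_b, gap="-"):
    """Poisson-corrected distance between two aligned protein sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    # Assumption of this sketch: columns with a gap in either sequence are ignored.
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    p = sum(a != b for a, b in pairs) / len(pairs)
    if p >= 1.0:
        return math.inf  # every site differs; the correction is undefined
    return -math.log(1.0 - p)


# Toy aligned fragments (hypothetical, not taken from the alignments in this study).
print(poisson_distance("MKVLA-GT", "MKILAAGS"))
```

The correction accounts for multiple substitutions at the same site, so d grows faster than p as sequences diverge.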

To determine EST frequencies of each gene, EST clones were identified by BLAST of C. reinhardtii EST entries in GenBank (as of August 2004) with putative full-length cDNA sequences of the respective genes. For calculation of ENC data, the codon usage frequencies of each ORF were analyzed with the program SPIN, part of the Staden package (Staden, 1996; http://staden.sourceforge.net/), and the results were imported into Microsoft EXCEL (Redmond, WA) and used to calculate ENC values according to Wright (1990) in a custom-made spreadsheet.

Sequencing of cDNA

For some genes, additional cDNA sequencing was performed. In the C. reinhardtii EST database, we identified full-length clones encoding putative CAO (AV626430), HDS (AV626792), LCYB (AV641959), and BKT (1024014H04) proteins, which were made available to us by the Kazusa DNA Research Institute (Chiba, Japan; Asamizu et al., 1999, 2000) and the Stanford Genome Technology Center (Stanford, CA; Shrager et al., 2003), respectively. In addition, we amplified a full-length sequence encoding CHYB from a recombinant λ ZAPII cDNA library using the Expand Long Template PCR system (Roche Applied Science, Indianapolis) as described elsewhere (Im and Grossman, 2002). Sequencing of cDNA and PCR fragments was performed on both strands using Big Dye sequencing reagents (Amersham Pharmacia Biotech, Piscataway, NJ); the fluorescent fragments were resolved on an ABI310 capillary sequenator. GenBank accessions for the full sequences are as follows: CAO, AY860816; HDS, AY860817; LCYB, AY860818; BKT, AY860820; and CHYB, AY860819.

RNA Isolation and RNA-Blot Analysis

Total RNA was isolated from cells using TRIZOL reagent (38% phenol, 0.8 M guanidine thiocyanate, 0.4 M ammonium thiocyanate, 0.1 M sodium acetate, pH 5, and 5% glycerol) containing 0.2 volumes of chloroform. The cells were lysed by suspension in the TRIZOL reagent, and nucleic acid in the aqueous layer was precipitated by adding 0.5 volumes of isopropanol and 0.5 volumes of 0.8 M sodium citrate/1.2 M NaCl. The RNA precipitate was allowed to form at 4°C for 4 h before it was pelleted by centrifugation at 10,000g for 30 min, washed with 70% ethanol, air-dried, and dissolved in sterile distilled water.

qPCR

Isolated total RNA was treated with RNase-free DNase I (Ambion, Austin, TX), followed by phenol:chloroform extraction. For cDNA synthesis, 1 μg of DNase I-treated total RNA was reverse transcribed and amplified using the Superscript II kit (Invitrogen, Carlsbad, CA), as described by the manufacturer. qPCR was performed using the DyNAmo Hot Star SYBR Green qPCR kit (MJ Research, Waltham, MA) and analyzed by the Opticon 2 real-time system (MJ Research). Cycling conditions included an initial incubation at 95°C for 10 s, followed by 40 cycles of 94°C for 10 s, 55°C for 15 s, and 72°C for 10 s. Each of the PCR assays was performed in triplicate. The relative expression ratio of each target gene was calculated using the 2^(−ΔΔCT) method (Livak and Schmittgen, 2001). The CBLP gene was used as the control gene, and primers were designed with Primer3 software (Rozen and Skaletsky, 2000; http://frodo.wi.mit.edu/cgi-bin/primer3/primer3_www.cgi) to distinguish the transcripts encoding the different isozymes. Primers were designed to have a melting temperature between 58°C and 60°C, and an optimal length of 20 nucleotides. The GC content was held between 20% and 80% with no 3′ GC clamp. The target amplicon for each sequence was designed to be between 150 and 200 nucleotides with an optimal melting temperature of 85°C.
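The relative quantification described above can be sketched in a few lines of Python. In the method of Livak and Schmittgen (2001), the fold change equals 2 raised to −ΔΔCT, where ΔCT is the target CT minus the reference (here CBLP) CT and ΔΔCT is the ΔCT of the treated sample minus that of the calibrator (here dark-grown cells). The CT values below are hypothetical, and averaging the dark CT values to form the calibrator is an assumption of this sketch. The method also assumes amplification efficiencies close to 100% for both target and reference.

```python
from statistics import mean, stdev


def fold_change(ct_target, ct_ref, ct_target_cal, ct_ref_cal):
    """Relative expression of a target gene versus the calibrator sample."""
    delta_ct_sample = ct_target - ct_ref            # normalize to the reference gene
    delta_ct_calibrator = ct_target_cal - ct_ref_cal
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** -delta_delta_ct


# Hypothetical triplicate CT values as (target gene, CBLP); NOT measured data.
dark = [(25.1, 18.0), (25.0, 18.1), (24.9, 17.9)]
vlfl_2h = [(21.2, 18.0), (21.0, 17.9), (21.3, 18.1)]

# Calibrator: mean CT of the dark-grown replicates (an assumption of this sketch).
dark_target = mean(t for t, _ in dark)
dark_ref = mean(r for _, r in dark)

folds = [fold_change(t, r, dark_target, dark_ref) for t, r in vlfl_2h]
print(f"fold change after 2 h VLFL: {mean(folds):.1f} +/- {stdev(folds):.1f}")
```

A CT difference of 3 cycles relative to the calibrator, with an unchanged reference CT, corresponds to an 8-fold change in transcript abundance.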

Sequence data from this article have been deposited with the EMBL/GenBank data libraries under accession numbers AY860816 to AY860820.

Acknowledgments

We thank Dan Rokhsar and Diego Martinez at JGI and members of the Chlamydomonas Genome Consortium for helping to develop the tools and infrastructure for securing and examining C. reinhardtii cDNA and genomic information, and for providing stimulating discussions and valuable insights. The supply of EST clones by the Kazusa DNA Research Institute (Chiba, Japan) and the Stanford Genome Technology Center (Stanford, CA) is gratefully acknowledged. We also are grateful to two anonymous reviewers for critically reading the manuscript and for further suggestions.

Footnotes

1 This work was supported by the Deutsche Forschungsgemeinschaft (grant no. LO840/1–1 to M.L.). A.R.G. would like to thank the National Science Foundation for supporting genomic research using Chlamydomonas reinhardtii (grant no. MCB 0235878). C.-S.I. was supported by the National Science Foundation (grant no. IBN 0084189 awarded to A.R.G.).