OPERONS AND GENE CLUSTERS IN PROKARYOTES AND EUKARYOTES

The first gene cluster for a secondary metabolic pathway was discovered in maize (Zea mays) over a decade ago (Frey et al., 1997) and was regarded as something of an oddity. However, clusters of genes for secondary metabolic pathways are now an emerging theme in plant biology, and are providing some provocative insights into plant genome plasticity and evolution. Gene clusters containing nonhomologous functionally related genes are common in bacterial genomes. Most are organized as operons, in which the different genes are expressed as a single polycistronic mRNA, allowing the tight coupling of transcription and translation (Zheng et al., 2002; Rocha, 2008; Koonin, 2009). Operons, in turn, may be clustered. For example, in actinomycetes, which produce a prolific array of pharmaceuticals and other high-value molecules, the genes for secondary metabolite production are clustered and typically encode several transcripts, some of which are bi- or polycistronic (Osbourn, 2010). True operons are rare in eukaryotes because transcription is uncoupled from translation and mRNAs are generally monocistronic; therefore, genes under common regulation are often dispersed throughout the genome and coordinately regulated in trans. Although there are clusters of functionally related genes in eukaryotes (Hurst et al., 2004; Sproul et al., 2005; Michalak, 2008), most of these consist of paralogs that have evolved by repeated tandem gene duplication and divergence (e.g. the globin and homeobox [Hox] loci in mammals; Osbourn and Field, 2009). In filamentous fungi, however, there are numerous exceptions to this rule, most notably gene clusters for secondary metabolic pathways. These include clusters for the synthesis of important pharmaceuticals such as the β-lactam antibiotics penicillin and cephalosporin, and for the production of toxins (e.g. aflatoxin and host-selective toxins associated with virulence on plants; Hoffmeister and Keller, 2007; Turgeon and Bushley, 2010). 
Other examples of clusters of nonhomologous functionally related genes in eukaryotes are the GAL and DAL gene clusters in the yeast Saccharomyces cerevisiae (which enable the utilization of Gal and allantoin, respectively; Hittinger et al., 2004; Wong and Wolfe, 2005) and the major histocompatibility complex in mammals (Horton et al., 2004). In general, gene clusters of this type appear to be required for growth or survival under certain environmental conditions and can therefore be regarded as adaptive gene clusters. In particular, they relate to the exploitation of new environments or the management of interactions with other organisms (Field and Osbourn, 2010).

THE QUESTIONS

The identification of five apparently unrelated secondary metabolic gene clusters in different plant species poses some interesting questions. These are discussed below.

How Common Are Secondary Metabolic Gene Clusters in Plants?

It is not yet known how common clusters of nonhomologous but functionally related genes are throughout the plant kingdom. This will be revealed through further analysis of fully sequenced plant genomes. The maize and rice gene clusters were defined using a combination of molecular biology, biochemistry, and reverse genetics (Frey et al., 1997, 2003, 2009; von Rad et al., 2001; Sakamoto et al., 2004; Wilderman et al., 2004; Shimura et al., 2007; Jonczyk et al., 2008; Swaminathan et al., 2009). Analysis of the two rice clusters was aided by the fact that expression of these gene clusters is elicitor inducible and by the availability of rice genome sequence information. The oat avenacin cluster was initially defined using forward genetics, facilitated by a simple screen for loss of root fluorescence (Papadopoulou et al., 1999; Qi et al., 2006; Mugford et al., 2009). The thalianol cluster was predicted by searching the genome sequence of Arabidopsis for candidate gene clusters containing triterpene synthase (oxidosqualene cyclase) signature genes and then validated by functional analysis (Field and Osbourn, 2008). Genome mining for new candidate secondary metabolic pathways based on clustering and coexpression has proved to be a highly successful approach in microbes (Zerikly and Challis, 2009; Osbourn, 2010). It is relatively straightforward to predict the types of genes that one might expect to find in a gene cluster for a secondary metabolic pathway (e.g. a gene for a signature enzyme such as a terpene synthase to make the backbone, and genes for tailoring enzymes such as cytochrome P450s and other oxidoreductases, acyltransferases, and methyltransferases to make further modifications). Through systematic analysis of sequenced plant genomes we will now be able to discover new secondary metabolic gene clusters and to gain an idea of their frequency and distribution. 
With the growing body of genome sequence information that is now available for plants and the advent of next-generation sequencing it will be possible to search for candidate secondary metabolic gene clusters in a wide range of different species.
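
The search strategy outlined above (look for a signature enzyme gene with tailoring enzyme genes close by) can be illustrated with a minimal sketch. This is not the procedure used in any of the cited studies; the Gene record, the family labels, the window size, and the threshold are all hypothetical choices made for illustration, and a real analysis would also draw on coexpression data and functional validation.

```python
from typing import NamedTuple

class Gene(NamedTuple):
    gene_id: str
    chrom: str
    index: int    # rank of the gene in gene order along its chromosome
    family: str   # coarse enzyme-family annotation (hypothetical labels)

# Hypothetical annotation labels; real annotations would need mapping onto these.
SIGNATURE = {"terpene_synthase"}
TAILORING = {"cytochrome_P450", "oxidoreductase",
             "acyltransferase", "methyltransferase"}

def candidate_clusters(genes, window=5, min_tailoring=2):
    """Flag signature genes that have at least `min_tailoring` tailoring
    enzyme genes among the `window` genes on either side."""
    by_chrom = {}
    for g in genes:
        by_chrom.setdefault(g.chrom, []).append(g)

    hits = []
    for chrom_genes in by_chrom.values():
        chrom_genes.sort(key=lambda g: g.index)
        for i, g in enumerate(chrom_genes):
            if g.family not in SIGNATURE:
                continue
            lo = max(0, i - window)
            neighbours = chrom_genes[lo:i] + chrom_genes[i + 1:i + 1 + window]
            tailoring = [n.gene_id for n in neighbours if n.family in TAILORING]
            if len(tailoring) >= min_tailoring:
                hits.append((g.gene_id, tailoring))
    return hits
```

Counting tailoring genes within a fixed window is a deliberately crude criterion. It does not distinguish nonhomologous neighbors from tandem duplicates of a single family, so any hit would be only a starting point for functional analysis.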

Gene clusters for synthesis of secondary metabolites in bacteria and filamentous fungi often (but not always) contain genes for pathway-specific transporters and regulators in addition to genes for the signature enzyme and tailoring enzymes (Osbourn, 2010). This merits some comment from the plant perspective. To date, transporters for the five characterized plant secondary metabolic gene clusters have not been identified. However, the recent finding that Lr34, a gene that confers broad-spectrum disease resistance in wheat (Triticum aestivum), is predicted to encode an ATP-binding cassette transporter implicated in the transport of defense compounds is intriguing, given that the immediate neighbors of this gene in the wheat genome are of unknown function but are also implicated in secondary metabolism (e.g. sugar transferase and cytochrome P450 genes; Krattinger et al., 2009). So far the only transcription factor to be reported for plant secondary metabolic gene clusters is a positive regulator of both the momilactone and phytocassane clusters in rice (Okada et al., 2009). The gene for this transcription factor lies outside both of the clusters that it regulates. Thus, at present it is not clear how many features plant secondary metabolic gene clusters will share with microbial clusters.

What Is the Significance of Clustering?

Why should genes for some plant metabolic pathways be clustered while those for other pathways are not? What are the advantages associated with clustering? One explanation is regulation. Clearly, dispersed genes can be coregulated through a common transcription factor, and there are some excellent examples of coordinate regulation of unlinked genes for metabolic pathways in plants (Martin et al., 2010). However, physical clustering of functionally related genes in eukaryotes has the potential to add another tier to the hierarchy of gene regulation, providing mechanisms for the coordinated regulation of gene expression at the levels of nuclear organization and/or chromatin (Hurst et al., 2004; Sproul et al., 2005; Osbourn and Field, 2009). In filamentous fungi the use of mutations and drugs that affect chromatin remodelling is proving to be a powerful means of pathway discovery, enabling cryptic clusters to be identified and activated (e.g. Bok et al., 2009; Cichewicz, 2010). Noncoding RNAs have also been implicated in the recruitment of chromatin complexes, and in animals Hox gene expression can be controlled posttranscriptionally and probably also epigenetically by noncoding RNAs and Polycomb group proteins (Yekta et al., 2008). There is evidence to indicate that regulation at the level of chromatin is likely to be important for expression of secondary metabolic gene clusters in plants. In silico data suggest chromatin-mediated regulation of the thalianol cluster in Arabidopsis (Field and Osbourn, 2008), while DNA fluorescence in situ hybridization experiments have revealed cell-type-specific chromatin decondensation of the avenacin cluster associated with gene expression in oat (Wegel et al., 2009). It is also possible that the grouping of genes into functional clusters in eukaryotes may facilitate the coordinated handling of transcripts that have arisen from physically linked genes, from transcription through processing and export to protein synthesis. 
A challenge going forward is to better understand how these gene clusters are regulated at multiple levels, from nuclear organization and chromatin remodelling to the synthesis and localization of functional pathway proteins.

The five known plant clusters all confer the ability to synthesize secondary products that are known or are likely to have protective roles in defense against pests and pathogens, and so presumably confer a selective advantage in nature. A second reason for the clustering of functionally related genes may involve selection for the coinheritance of favorable combinations of alleles at these multigene loci. Where the fitness of an allele at one locus depends on the genotype at another locus, a selective advantage may arise for genomic rearrangements that reduce the distance between the two loci (Nei, 1967). This ratcheting effect may be enhanced where the fitness of recombinant haplotypes is low, for example where the combination of a functional and a nonfunctional allele at two loci results in the premature disruption of a biochemical pathway and accumulation of toxic intermediates (Nei, 2003). There is certainly evidence to suggest that this may be the case. For example, accumulation of an intermediate in the yeast DAL pathway as a consequence of a late pathway mutation results in toxicity (Wong and Wolfe, 2005), while accumulation of toxic intermediates in the triterpene pathways in Arabidopsis and in oat leads to substantial defects in growth and development (Field and Osbourn, 2008; Mylona et al., 2008; Fig. 1).
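
The link between physical distance and coinheritance can be made concrete with a textbook two-locus calculation. The sketch below is not Nei's model. It assumes a double heterozygote in coupling phase (AB/ab, where A and B are functional alleles at two pathway loci) and a recombination fraction r between the loci, and it simply reports how often gametes carry a mixed haplotype (one functional and one nonfunctional allele). The function names are hypothetical.

```python
def gamete_frequencies(r):
    """Gamete frequencies from an AB/ab double heterozygote, where r is
    the recombination fraction between the two loci (0 <= r <= 0.5)."""
    if not 0.0 <= r <= 0.5:
        raise ValueError("recombination fraction must lie between 0 and 0.5")
    parental = (1.0 - r) / 2.0   # AB and ab: both alleles inherited together
    recombinant = r / 2.0        # Ab and aB: favorable combination broken up
    return {"AB": parental, "ab": parental,
            "Ab": recombinant, "aB": recombinant}

def mixed_haplotype_fraction(r):
    """Fraction of gametes carrying one functional and one nonfunctional
    allele, i.e. the haplotypes that could interrupt a pathway midway."""
    freqs = gamete_frequencies(r)
    return freqs["Ab"] + freqs["aB"]
```

Under these assumptions the mixed-haplotype fraction equals r itself, so unlinked loci (r = 0.5) yield mixed haplotypes in half of all gametes, whereas tightly linked loci (r close to 0) almost never do. A rearrangement that brings the two loci together therefore reduces the production of potentially deleterious combinations.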

Figure 1. Detrimental effects of accumulation of toxic metabolic intermediates on plant growth and development. A, Accumulation of elevated levels of thalianol pathway intermediates in Arabidopsis results in moderate to severe dwarfing (adapted from Field and Osbourn, 2008). B, sad3 mutants of diploid oat accumulate an incompletely glucosylated avenacin pathway intermediate that is toxic, leading to stunted roots and a root hair deficiency phenotype. The top sections show cross sections of roots of wild-type and sad3 mutant seedlings stained with Calcofluor. Callose-containing aggregates were observed in the epidermal cells of the sad3 mutant but not in the wild type (confirmed using aniline blue). The accumulation of callose is likely to be a stress response triggered by accumulation of monodeglucosyl avenacin A-1. C, Cortex; E, epidermis. Scale bars = 50 μm. Adapted from Mylona et al. (2008).

POTENTIAL APPLICATIONS

Although the fundamental research described above delves into new territories, it is also important to consider a more practical issue: how can knowledge gained from the study of plant metabolic gene clusters be applied for the development of crops for agronomic and industrial end uses, with enhanced pest and disease resistance, improved nutritional qualities, or elevated levels of high-value products? Knowledge of gene clusters facilitates the delineation and functional analysis of cluster components and new metabolic pathways, as has been demonstrated on numerous occasions in bacteria such as the actinomycetes. Physical clustering of genes can also be exploited to introgress beneficial traits such as disease resistance into other plant varieties, or alternatively to breed out metabolic gene clusters associated with undesirable traits (such as bitterness and antifeedant activity). For species that are less closely related, components of gene clusters can be transferred individually or in combination between species using recombinant DNA technology. The latter will be enhanced by the development of improved technology for the transfer of multiple genes, or perhaps even whole gene clusters, into plants of commercial and agronomic importance.

CONCLUSION

Secondary metabolic gene clusters are among the most diverse and rapidly evolving features of plant genomes. Using plant metabolic gene clusters as readouts for metabolic diversification and, by extension, genome plasticity, it will be possible to address novel and important questions in plant biology: How widespread are such clusters, why do they exist, and how do they form? An improved understanding of plant gene clusters will enable us to establish the rules behind this phenomenon, that is, why some metabolic pathways are represented by clusters while others are represented by dispersed genes. This could ultimately cause us to reconsider our understanding of plant metabolism. It is also intriguing to consider whether there might be other types of nonhomologous gene clusters in plants that have functions other than in secondary metabolism. Last but by no means least, can we explain the formation of nonhomologous gene clusters based on our current knowledge of plant genome dynamics, or do we need to invoke new mechanisms for rapid adaptive evolution?

Footnotes

1 This work was supported by the United Kingdom Biotechnology and Biological Sciences Research Council and the Engineering and Physical Sciences Research Council, the European Union, and the Branco Weiss Society in Science Fellowship Program.