Abstract

Evolutionary changes in gene expression are a major driver of phenotypic evolution.
In yeast, genes that have rapidly diverged in expression are associated with particular
promoter features, including the presence of a TATA box, a nucleosome-covered promoter
and unstable tracts of tandem repeats. Here, we discuss how these promoter properties
may confer an inherent capacity for flexibility of expression.

Opinion

Early in research on the molecular basis of phenotypic variation, the focus was primarily
on mutations in the coding regions (exons) of genes. But, as first noted by King and
Wilson [1], substantial physiological differences can be seen between closely related species
despite almost identical sets of proteins, and it is now generally accepted that distinctions
between species are defined not only by their ensemble of genes but, critically, by
how those genes are regulated.

For example, dramatic differences in the body plan of related insects have been traced
to differences in the expression of developmentally regulated genes [2-4], and the classic example of variation in beak shape among Darwin's finches appears
to be controlled by variation in expression levels of the gene encoding Bmp4 [5]. Surveying 331 previously reported mutations underlying phenotypic changes, Stern
and Orgogozo [6] found that approximately 22% were regulatory changes; moreover, the proportion of documented
regulatory changes is increasing every year and is even larger for interspecies differences.

More recent studies using advanced technologies, such as microarrays and high-throughput
sequencing, have compared the genome-wide expression programs of related species [7-16] or strains [17-29] and revealed thousands of differences in the expression of orthologous genes. Identifying
the regulatory changes underlying specific expression differences has, however, been
more difficult: little progress has been made in connecting expression divergence
with regulatory sequence divergence, and the degree of sequence conservation at individual
promoters and regulatory elements cannot predict the degree of expression divergence
of the associated genes [30-34]. What has emerged is a more general distinction: some genes have a much greater propensity
to diverge in their expression than others. Here we discuss recent studies in yeast
on the promoter architectures underlying these differences, and how they may contribute
to the evolvability of gene expression. Yeast is an excellent model for studying the
evolution of gene expression because of its simplicity as a unicellular organism with
short and well-defined promoter regions, ease of genetic manipulation and a wealth
of functional genomics data.

The inherent capacity of genes for expression divergence

The notion that there are two kinds of promoters in yeast, with different functional
and architectural properties, was developed long ago by Struhl and colleagues, who
extensively studied the regulation of the adjacent yeast genes HIS3 and PET56 and proposed the existence of distinct core promoters that control constitutive versus
inducible gene expression [35]. More recent studies have shown that these distinctions correspond to distinct evolutionary
properties: whereas the expression of some genes has diverged between related yeasts,
the expression of others has remained stable. Notably, this gene-specific tendency
is consistent across multiple studies comparing the genomic expression patterns of different
yeasts. Although these studies examined different sets of yeast strains
or species grown in different environments, measured different quantities (expression
levels or ratios) and used different computational and experimental methods,
their results are significantly correlated: genes whose expression diverged
according to one study were often found to diverge in the other studies [36].
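The text does not specify how the cross-study correlations in [36] were computed. As a minimal sketch, assuming a rank-based (Spearman) correlation between per-gene divergence scores, comparing two studies could look like this; the function names are illustrative and the scores are invented, not data from any study.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined when all values are tied")
    return cov / (sx * sy)


# Invented divergence scores for the same five genes in two hypothetical studies.
study_a = [0.1, 2.3, 0.4, 1.8, 0.2]
study_b = [0.3, 1.9, 0.2, 2.5, 0.1]
print(round(spearman(study_a, study_b), 2))  # 0.6
```

A rank-based measure is a natural choice here because the studies measured different quantities on different scales; only the ordering of genes by divergence is compared.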

Moreover, these genes also preferentially diverged in expression in 'mutation accumulation'
experiments, where cells were allowed to accumulate mutations in conditions in which
the effects of natural selection were minimized [37]. Thus, we believe that expression divergence of these genes in multiple datasets
is not due to increased positive selection (or relaxation of purifying selection)
[38], but instead reflects an inherent capacity for expression divergence. This capacity
of a gene to evolve in expression can be quantified by its 'expression divergence',
a quantitative measure of how much the expression of a gene differs
among evolutionarily related yeast species or strains [36].
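The exact divergence measure used in [36] is not given here. As a minimal sketch, assuming (hypothetically) that expression divergence is scored as the variance of log2 expression levels of a gene across related strains, it could be computed as follows; the function name and the expression values are illustrative.

```python
import math
from statistics import pvariance


def expression_divergence(levels):
    """Hypothetical divergence score: population variance of log2 expression
    levels of one gene across related strains or species.

    Log-transforming makes the score depend on fold differences rather than
    on absolute expression level. All levels must be positive.
    """
    if any(x <= 0 for x in levels):
        raise ValueError("expression levels must be positive")
    return pvariance([math.log2(x) for x in levels])


# Invented expression levels of two genes measured in four strains.
stable_gene = [100.0, 110.0, 95.0, 105.0]
flexible_gene = [100.0, 400.0, 25.0, 200.0]
print(expression_divergence(stable_gene))
print(expression_divergence(flexible_gene))
```

Under this assumed measure, the second gene, whose levels span a 16-fold range, scores far higher than the first. Other definitions, for example based on expression ratios rather than levels, are equally plausible; this choice is ours and not necessarily that of [36].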

Expression divergence correlates strongly with gene responsiveness, namely the extent
to which a gene's expression is altered by the environment, and with expression noise
[39,40], namely the extent to which a gene's expression differs among genetically identical
cells [7,37]. That is, genes whose expression is strongly regulated between different conditions
display noisy expression and evolve rapidly between related strains or species. Thus,
it is possible that genes differ in their capacity for expression flexibility, which
is manifested at various timescales: during evolution in response to mutations; during
physiological responses to environmental changes; and within a population of cells
as a result of stochastic fluctuations.

TATA boxes, nucleosome-free regions and expression flexibility

The capacity for expression divergence (or flexibility) has been linked to several
characteristics of gene promoters. The simplest association is with the number of
binding sites for transcriptional regulators: promoters of flexible genes are characterized
by a relatively large number of binding sites [36,37]. This is perhaps not surprising, since the expression of genes with many regulators
(and binding sites) can be affected by mutations in any one of these regulators (or
promoter binding sites), thus increasing their mutational target size - that is, the
number of possible mutations that would affect the expression of these genes.
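The mutational target-size argument can be made concrete with a simple calculation. Assume, purely for illustration, that each of n regulators or binding sites independently acquires an expression-altering mutation with probability u per generation; the probability that expression is affected is then 1 - (1 - u)^n, which is approximately nu when u is small. The function name and the value of u are hypothetical.

```python
def prob_expression_affected(n_targets, u):
    """Probability that at least one of n_targets independent targets
    (regulators or binding sites) acquires an expression-altering mutation,
    given a hypothetical per-target probability u (per generation)."""
    if not 0.0 <= u <= 1.0:
        raise ValueError("u must be a probability in [0, 1]")
    if n_targets < 0:
        raise ValueError("n_targets must be non-negative")
    return 1.0 - (1.0 - u) ** n_targets


u = 1e-6  # illustrative value, not a measured rate
for n in (2, 10, 50):
    # For small u the result is close to n * u: target size scales the risk.
    print(n, prob_expression_affected(n, u))
```

The independence assumption is a simplification; in reality, regulators and sites differ in their mutation rates and in how strongly they affect expression.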

One particular promoter binding site stands out for its large influence on expression
divergence: promoters that contain a TATA box show a remarkable increase in expression
divergence, as well as in responsiveness and in noise [7,36,37]. The distinction between genes with promoters containing a TATA box and those without
holds when the number of transcriptional regulators or of promoter binding sites
is controlled; it is also consistent among genes from different functional classes,
for example, genes encoding membrane proteins, metabolic proteins
and ribosomal proteins (although these groups differ
widely in the proportion of genes with promoters containing TATA boxes) [7]. Strikingly, increased expression divergence of TATA-containing genes has been observed
in species ranging from yeast to mammals, as well as in mutation-accumulation lines
of yeasts, flies and worms [7,37], suggesting that it reflects a general phenomenon. Interestingly, the promoters of
TATA-containing genes do not accumulate more sequence changes than other promoters; only the expression of
these genes diverges more [7]. Thus, we propose that promoters carrying a TATA box are inherently more sensitive
to genetic perturbations than TATA-less promoters. This is also consistent with the
distinction between constitutive and inducible genes and with previous studies that
demonstrated that a canonical TATA box is important for dynamic regulation of gene
expression whereas other sequence elements are important for maintaining constitutive
expression levels [35,41].

The TATA box is a ubiquitous core promoter element that is bound by the transcription
pre-initiation complex (PIC). What could cause increased expression divergence of
TATA promoters? Transcription can be considered a two-step process: first the PIC
is recruited by transcription factors and assembles at the core promoter together
with RNA polymerase; and second, the polymerase is released from the PIC and transcribes
the gene. The second step can be repeated multiple times (re-initiation) if the PIC
remains bound to the core promoter, and this is believed to be facilitated by the
TATA box [42-44]. Thus, a TATA box could increase the extent of re-initiation, thereby amplifying
gene expression. Notably, the binding of the PIC to the TATA box and the binding of
transcription factors to other sites could be cooperative [44]. This would make the effect of the TATA box on gene expression nonlinear, as any
increase in transcription factor binding would stabilize PIC binding and cause
a further increase in re-initiation. In this way, TATA-containing genes could be more
sensitive to regulatory mutations than TATA-less genes.
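The proposed nonlinearity can be illustrated with a toy model that is our own simplification, not a model from [44]: promoter output is treated as a Hill function of activator level, with a Hill coefficient h greater than 1 standing in for cooperative binding of the PIC and transcription factors. Far from saturation, the response to a given fold change in activator level then scales roughly with h. All names and parameter values are hypothetical.

```python
def hill_activity(x, k=1.0, h=1.0):
    """Toy promoter activity as a Hill function of activator level x.
    k is the half-saturation level; h > 1 stands in for cooperativity."""
    return x**h / (k**h + x**h)


def output_fold_change(x, factor, h, k=1.0):
    """Fold change in toy promoter output when activator level x is
    multiplied by factor (for example, by a regulatory mutation)."""
    return hill_activity(x * factor, k, h) / hill_activity(x, k, h)


x = 0.25  # activator below half-saturation (arbitrary units)
for h in (1.0, 2.0):  # h = 1: non-cooperative; h = 2: cooperative
    print(h, round(output_fold_change(x, 1.2, h), 3))
```

With these illustrative numbers, a 20% increase in activator level raises output by about 15% when h = 1 but by about 40% when h = 2, the kind of amplified sensitivity to regulatory mutations proposed above for TATA-containing genes.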

Importantly, TATA-containing promoters differ from other promoters not only in their
expression flexibility but also in other properties [45], and so it is possible that these secondary characteristics underlie their increased
expression flexibility. Perhaps the most notable feature of TATA promoters is their
atypical chromatin structure [46-48]. At most yeast promoters, the region directly upstream of the transcription start
site contains transcription factor binding sites and is nucleosome-free, increasing
the accessibility of the binding sites to transcriptional regulators [49] (Figure 1). By contrast, at promoters with high expression flexibility, and at those containing
a TATA box, this region tends to be more occupied by nucleosomes (Figure 1). We and others have proposed that because nucleosomes are thought to interfere
with the binding of regulatory proteins, the regulation of nucleosome states might
fine-tune the expression of these genes [46-48,50]. Such increased dependence on the regulation of chromatin structure is indeed observed:
promoters that are relatively more occupied by nucleosomes show relatively large changes
in expression when genes encoding chromatin regulators are mutated or deleted [48,51]. As with a large number of transcription factors, an increased dependence
on chromatin regulators increases the mutational target size of these genes:
any mutation in a gene encoding a relevant chromatin regulator, or
an upstream gene regulating the activity of the chromatin regulator, could affect
transcription of the downstream target gene.

Figure 1. Promoter architecture associated with expression flexibility [46-48]. Top: the architecture of a typical promoter in which nucleosomes are regularly positioned
but are excluded from a particular region upstream of the transcription start site.
This nucleosome-free region (NFR) contains accessible binding sites for (few) transcriptional
regulators (TF). Bottom: the architecture of promoters with high expression flexibility.
These promoters tend to have a TATA box and multiple other binding sites for transcriptional
regulators. Nucleosome positions are more dynamic (double-headed arrows) and nucleosomes
are not strongly excluded from any particular region, and therefore compete with transcriptional
regulators at their binding sites. These promoters are thus dependent on the activity
of multiple transcriptional regulators and chromatin regulators (CR), which increases
their mutational target size.

Unstable tandem repeats

So far we have discussed the role of promoter architecture in the sensitivity to mutations,
namely whether a mutation influences gene expression and to what extent. However,
expression divergence could also be directly facilitated by mechanisms that increase
the mutation rate (that is, the number of mutation events per unit of time) at particular
promoters. Although the determinants of local mutation rates are still poorly understood,
one property that has been shown to increase mutation rates is the presence of unstable
tandem repeats.

A recent study revealed that about 25% of all yeast promoters contain unstable tandem
repeats: short (1 to 150 nucleotide) stretches of DNA that are repeated head to tail
[52]. For example, TAG-TAG-TAG-TAG-TAG-TAG-TAG is a trinucleotide repeat, with the unit
TAG repeated seven times. Tandem repeats most often consist of short (2 to 6 nucleotide),
AT-rich units that are repeated 10 to 30 times, and occur frequently about 20 to 100
nucleotides upstream of the transcriptional start site.
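The definition above can be illustrated with a short scan for perfect tandem repeats using Python regular expressions. This is a simplified sketch: the function name, the unit-length range and the minimum copy number are illustrative assumptions, not the criteria used in [52], and interrupted (imperfect) repeats are ignored.

```python
import re


def find_tandem_repeats(seq, min_unit=1, max_unit=6, min_copies=3):
    """Return perfect tandem repeats as (start, unit, copies) tuples.

    Thresholds are illustrative defaults, not published detection criteria;
    imperfect (interrupted) repeats are not detected.
    """
    seq = seq.upper()
    hits = []
    for unit_len in range(min_unit, max_unit + 1):
        # Capture a unit of unit_len bases followed by >= min_copies - 1 exact copies.
        pattern = re.compile(r"(.{%d})\1{%d,}" % (unit_len, min_copies - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            # Skip units that are themselves repeats (e.g. TAGTAG = 2 x TAG).
            if re.fullmatch(r"(.+?)\1+", unit):
                continue
            hits.append((m.start(), unit, len(m.group(0)) // unit_len))
    return hits


print(find_tandem_repeats("TAGTAGTAGTAGTAGTAGTAG"))  # [(0, 'TAG', 7)]
```

Applied to the trinucleotide example in the text, the scan reports the unit TAG repeated seven times; the filter on non-primitive units prevents the same tract from also being reported as three copies of TAGTAG.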

The number of repeat units changes at frequencies that are typically 10- to 10,000-fold
higher than average point mutation frequencies. Changes in the number of repeat units
may cause gradual changes in transcription, with a certain number of units yielding
maximal transcription [52]. Thus, when tandem repeats occur within promoters, their inherent instability may
give rise to variants displaying altered levels of transcription, generating a pool
of phenotypic diversity that allows rapid divergence. The mechanism underlying repeat-based
expression divergence has been proposed to have its origins in chromatin structure.
AT-rich promoter repeats are known to influence local nucleosome positioning, and
changes in the number of repeats affect the density and positioning of nucleosomes
in the critical part of the promoter [52].

Expression divergence by cis and trans mutations

In contrast to divergence of coding regions, divergence of gene expression can originate
both from mutations in local DNA sequence (cis mutations) - for example, a mutation that affects a promoter binding site or nucleosome
position - and from mutations in other genes (trans mutations), such as those encoding transcription factors or chromatin regulators.
Thus, increased divergence in the expression of genes could be due to their sensitivity
to cis mutations or trans mutations or both. In some cases, such as variable repeat tracts, it is clear that
the effect depends on cis changes. However, in other cases, the relative contribution of cis and trans mutations is unclear. For example, an increased dependence on nucleosome positioning
could be due to cis mutations affecting nucleosome binding or to trans mutations affecting chromatin regulators.

Two approaches have been used to distinguish the effects of cis and trans mutations on gene expression on a genomic scale: genetical genomics [51,53] and analysis of interspecific hybrids [15,54]. Results from both kinds of study suggest that divergence in the expression of flexible
genes is due chiefly to trans mutations [15,51]. For example, genes whose expression diverged between Saccharomyces cerevisiae and Saccharomyces paradoxus as a result of trans mutations displayed high divergence in seven different studies comparing expression
among yeast strains or species [15]. By contrast, genes whose expression diverged as a result of cis mutations displayed less divergence in these same seven studies. Furthermore, the presence
of a TATA box or of an occupied pattern of nucleosomes (Figure 1) was primarily associated with increased effects of trans mutations rather than cis mutations [15,51].

These results are consistent with a model in which increased flexibility of promoters
is due to increased dependence on trans factors (Figure 2). This could include both the number of factors that influence the expression of
a given gene (for example, a promoter occupied by nucleosomes is influenced by many
chromatin regulators) and the extent to which these factors influence expression (TATA
promoters, as well as occupied promoters, could be more sensitive to the binding of
transcriptional regulators). Accordingly, promoters with particular architectures
could be more finely tuned to the activity of various regulatory factors and thus more sensitive
to evolutionary changes in their activity. Notably, such promoters would also become
more sensitive to variation in the activity of these regulators through physiological
changes or stochastic fluctuations, which could explain the connection between expression
divergence, responsiveness and noise.

Figure 2. Expression flexibility, mediated by promoter architecture, may be due to increased
dependence on trans regulation and environmental changes. Genes with a TATA box, a nucleosome-occupied promoter
and many binding sites are regulated more extensively by regulatory factors.
These factors respond to extracellular signals, thus making the target genes responsive
to environmental changes on both short timescales (responsiveness and noise)
and longer timescales (evolutionary changes). These flexible genes preferentially
code for proteins that interact with the environment and mediate the response to environmental
changes (curved arrow), and this may allow for rapid adaptation to new environments.

Promoter architecture and expression evolvability

Expression divergence is a major driver of evolutionary change and seems to be enriched
at particular genes. As described above, expression divergence in yeast correlates
with several promoter features, including a large number of binding sites, a TATA
box, an occupied pattern of promoter nucleosomes, increased dependence on chromatin
regulators and unstable tandem repeats. Notably, controlling for one of these factors
does not remove the effect of the others, suggesting that each of these factors has
an independent effect on expression divergence. Many of these factors seem to exert
their influence on expression divergence predominantly through trans effects, although others (for example, unstable repeats) involve cis effects.

As noted above, expression divergence (the extent to which expression of a gene evolves)
correlates with expression responsiveness (the extent to which expression of a gene
is changed in response to the environment). We believe that the promoter elements
discussed above underlie expression flexibility of these genes on short timescales
(responsiveness and noise), which are instrumental in the immediate response of a
cell to the environment, as well as on longer timescales (expression divergence),
which may allow evolutionary adaptation to novel conditions. In other words, the correlation
between responsiveness and expression divergence may be due to their dependence on
the same promoter properties.

The notion that responsive, inducible promoters differ from stable 'housekeeping'
promoters, established by Struhl and colleagues [43,55-59], has now been extended and linked to the evolvability of gene expression. However,
much is still unknown. For example, the protein-DNA and protein-protein interactions
that underlie the differential requirement of genes for general transcription factors,
as well as the implications of these interactions for the dynamics of gene regulation,
remain poorly understood.

The fact that promoter architecture correlates with expression evolvability (that
is, the readiness with which gene expression evolves) raises the possibility that
expression evolvability may be subject to selection. This could make it possible for
the expression of some genes to remain robust to mutation, whereas other genes are
inherently able to change rapidly in expression under evolutionary pressure. Consistent
with this, we find that different promoter elements that are independently linked
to expression evolvability preferentially coincide at the same genes, as if evolvability
were selected in these genes. In this context, it is interesting to note that the
group of rapidly diverging genes is enriched for genes encoding plasma membrane proteins and, in general,
for genes whose products interact with the cell environment [7] (Figure 2). These genes are needed to cope with changes in the environment, and their flexibility
may allow for rapid adaptation to new environments. Further studies will be required
to examine this hypothesis.

Acknowledgements

We apologize for omission of relevant references due to space restrictions. Research
in the lab of KJV is supported by the Human Frontier Science Program Award HFSP RGY79/2007,
FP7 ERC Starting Grant 241426, VIB, the KU Leuven Research Fund and the FWO-Odysseus
program. Research in the lab of NB is supported by the Helen and Martin Kimmel Award
for Innovative Investigations, the EU (FunSysB), the Israeli Ministry of Science and
the European Research Council (Ideas).

References

King MC, Wilson AC: Evolution at two levels in humans and chimpanzees.