This is an Open Access article distributed under the terms of the Creative
Commons Attribution Non-Commercial License
(http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted
non-commercial use, distribution, and reproduction in any medium, provided
the original work is properly cited.

Abstract

There is now considerable evidence supporting the view that codon usage is
frequently under selection for translational accuracy. There are, however,
multiple forms of inaccuracy (missense, premature termination, and frameshifting
errors) and pinpointing a particular error process behind apparently adaptive
mRNA anatomy is rarely straightforward. Understanding differences in the fitness
costs associated with different types of translational error can help us devise
critical tests that can implicate one error process to the exclusion of others.
To this end, we present a model that captures distinct features of frameshifting
cost and apply this to 641 prokaryotic genomes. We demonstrate that, although it
is commonly assumed that the ribosome encounters an off-frame stop codon soon
after the frameshift and costs of mis-elongation are therefore limited, genomes
with high GC content typically incur much larger per-error costs. We go on to
derive the prediction, unique to frameshifting errors, that differences in
translational robustness between the 5′ and 3′ ends of genes
should be less pronounced in genomes with higher GC content. This prediction we
show to be correct. Surprisingly, this does not mean that GC-rich organisms
necessarily carry a greater fitness burden as a consequence of accidental
frameshifting. Indeed, increased per-error costs are often more than
counterbalanced by lower predicted error rates owing to more diverse anticodon
repertoires in GC-rich genomes. We therefore propose that selection on tRNA
repertoires may operate to reduce frameshifting errors.

Introduction

A growing body of evidence supports the idea that codon usage patterns partially
reflect selection to avoid errors during translation (reviewed in Drummond and Wilke 2009). But what types of
error are being selected against and why? Misincorporation errors have arguably
received the lion’s share of recent attention, but inserting the wrong amino
acid is by no means the only and perhaps not even the most common or costly mishap
that can occur during translation. For instance, the ribosome can also abandon the
nascent polypeptide before completion (drop-off, premature termination error) or
leave the correct reading frame and elongate the peptide chain based on nucleotide
triplets never meant to serve as a template for protein synthesis (frameshifting
error) (Parker 1989).

A failure to accurately decode the underlying codon lies at the heart of all of these
translational errors. Consequently, detecting biased usage of more efficiently
decoded (“translationally optimal”) synonymous codons is not in
itself sufficient to implicate a particular error process. To go beyond diagnosing
translational selection and attribute adaptive features of gene anatomy to specific
error processes, we need to develop critical tests that can implicate one type of
error to the exclusion of others. In this context, it is interesting to note that
different translational errors exhibit variation with regard to the fitness costs
involved (detailed below). Understanding and exploiting divergent cost dynamics
might therefore hold the key to devising critical tests and assessing the relative
evolutionary importance of different error processes.

Mistranslation events can be costly for a variety of reasons. Some cost models are
focused on the erroneous “product” and propose that errors are
deleterious because they abrogate function or because the mistranslated product
elicits dominant negative effects downstream of translation (Drummond and Wilke 2009). For example, mistranslated proteins
might misfold and, consequently, disrupt a variety of cellular processes, by
interacting promiscuously with other proteins and forming toxic aggregates (Drummond and Wilke 2008) or by occupying
quality control capacity (chaperones, proteases, etc.), thereby interfering with
normal protein homeostasis.

Other cost models are centered on the notion that the act of generating an erroneous
product can be costly in itself (“process cost”). Fitness costs
here may arise through nonproductive occupation of ribosomal capacity, which can be
rate limiting for growth (Shachrai et al.
2010) or through sequestration of other translational resources (amino acids,
tRNAs, etc.), which may prevent other proteins from being made in a timely fashion
(Stoebel et al. 2008). In addition, it
has been suggested that the energy wasted in futile synthesis and degradation may
constitute a relevant evolutionary cost (Wagner
2005, 2007).

One key prediction of error models that focus on process costs is that such costs
should strongly covary with the length of the erroneous product because residency
time at the ribosome, the level of resource sequestration and the amount of energy
wasted in protein synthesis and degradation should all increase with length. In line
with this prediction, Stoebel et al. (2008)
found, when they induced lac genes in a lactose-free environment
(i.e., expressing a protein without any functional benefit to the cell), that longer
genes were associated with greater costs.

The strong theoretical link between product length and process-related fitness cost
can inform strategies to pinpoint particular error processes behind adaptive codon
usage patterns because different translational errors have stereotypically different
effects on the length of the erroneous product. Misincorporation errors do not alter
the length of the polypeptide relative to the wild-type protein. In contrast,
premature termination errors lead to truncation of variable severity depending on
where along the mRNA the error occurs. This has led to the prediction that nonsense
errors should become increasingly costly toward the 3′ end of the
mRNA and that, concomitantly, selection should be more powerful in promoting
accurate decoding toward the 3′ end (Eyre-Walker 1996). Consistent with this prediction, optimal codon usage
increases toward the 3′ end of coding sequences in Escherichia
coli (Qin et al. 2004; Stoletzki and Eyre-Walker 2007). Importantly,
this constitutes a critical test for translational selection against errors other
than missense errors because—unless missense errors also promote
drop-off—misincorporation errors do not predict a gradient in the leverage
of selection increasing toward the 3′ end of the mRNA.

In this study, we ask whether frameshifting errors show process cost dynamics that
discriminate them from other types of translational error and can thus help us gain
a better understanding of the role of frameshifting avoidance in shaping gene
anatomy.

Building on previous work (Huang et al.
2009), we present a simple quantitative model of frameshifting cost centered
on genome-specific tRNA concentrations and relative binding affinities. Comparing
process cost estimates across 641 prokaryotic genomes, we demonstrate that
frameshifting errors exhibit process cost dynamics that are different from both
missense and premature termination errors and can be exploited to establish support
for the hypothesis that selection against frameshifting at least in part explains
differential codon adaptation at the 5′ and 3′ termini of mRNAs.
Furthermore, our study highlights that comparative genomic estimates of the costs of
translational error can be highly misleading when mRNA sequences are considered in
isolation or with disregard to species-specific biology. This is principally because
there are strong interactions between process cost, GC content, tRNA repertoire, and
error rates that generate considerable variability in average expected frameshifting
costs across prokaryotic genomes.

Materials and Methods

Prokaryotic Genomes and tRNA Repertoires

We downloaded protein-coding sequences for 1,035 complete prokaryotic genomes
from the National Center for Biotechnology Information (NCBI)
(ftp://ftp.ncbi.nih.gov/genomes/Bacteria/) in February 2010.
Applying custom scripts, we filtered the data to limit analysis to genes with a
multiple of three nucleotides (n = 4 genes excluded
based on this criterion), without ambiguous nucleotides or internal in-frame
stop codons, and with a proper stop according to the relevant NCBI translation
table, either table 11 (TGA, TAG, TAA) or table 4 (TAG, TAA) where TGA is
decoded as tryptophan.
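
As an illustration only, the filtering criteria listed above could be sketched as follows. The function and variable names are hypothetical and the authors' custom scripts are not reproduced here; the only facts taken from the text are the stop codon sets for translation tables 11 and 4. The minimum length of two codons is an added assumption.

```python
# Stop codons per NCBI translation table, as stated in the text.
STOPS_BY_TABLE = {
    11: {"TGA", "TAG", "TAA"},  # bacterial and archaeal code
    4: {"TAG", "TAA"},          # TGA is decoded as tryptophan
}


def passes_filters(cds: str, table: int = 11) -> bool:
    """Return True if a coding sequence meets all filtering criteria."""
    seq = cds.upper()
    stops = STOPS_BY_TABLE[table]
    # Length must be a multiple of three (assumed minimum: two codons).
    if len(seq) < 6 or len(seq) % 3 != 0:
        return False
    # No ambiguous nucleotides.
    if set(seq) - set("ACGT"):
        return False
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    # A proper terminal stop codon for the relevant table.
    if codons[-1] not in stops:
        return False
    # No internal in-frame stop codons.
    return not any(codon in stops for codon in codons[:-1])
```

Under table 4, a sequence with an internal TGA passes this check, whereas the same sequence is rejected under table 11.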

A Comparative Model for the Process Cost of Accidental Frameshifting

We suggest that the genomic process cost of accidental frameshifting
(C_G) is approximated by

$$C_G = \sum_{g=1}^{G} \sum_{i=1}^{L} p_i \,\left(n_{\mathrm{pre}} + n_{\mathrm{post}}\right)\, t_i \qquad (1)$$

where p_i is the probability that a
frameshift occurs at codon i (detailed below);
n_pre and n_post are
the number of peptide bonds made before and after an error occurs at codon
i (fig. 1),
respectively; and t_i is the number of times codon
i is translated.

Schematic representation of frameshifting cost.
n_pre (n_post)
is the number of codons translated before (after) the frameshift occurs,
either in (A) the +1 or (B)
...

The model is nested so that we can obtain a per-gene estimate through summing
across the entirety of codons (L) in a given mRNA, and a
per-genome estimate through summing per-gene estimates across the entirety of
mRNAs (G). Below, we focus on average cost per site or per gene
because summing across all genes to determine genomic cost is likely to be
misleading in the absence of information on translation levels. Note that, for
the purpose of this analysis, we assume that every frameshifting error will
yield a completely nonfunctional product. Although there may be an argument that
functionality is more likely to be preserved when frameshifting occurs at the
3′ end of the mRNA, it is difficult to see how to systematically
discount costs in a biologically relevant manner without detailed, gene-specific
information on the impact of truncation and mis-elongation on functionality. In
addition, there is evidence that translational selection operates even at the
very 3′ end of mRNAs (Tuller et al.
2010), strongly suggesting that these regions are typically not
functionally dispensable.
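
The nested summation can be sketched in a few lines. This is a minimal illustration, assuming that the cost contributed by each codon is the product of its frameshifting probability, the compound length n_pre + n_post, and its translation count, as the variable definitions above suggest. All function names are hypothetical.

```python
def gene_cost(p, n_pre, n_post, t):
    """Per-gene cost: sum over codons of p_i * (n_pre_i + n_post_i) * t_i."""
    if not (len(p) == len(n_pre) == len(n_post) == len(t)):
        raise ValueError("all per-codon lists must have the same length")
    return sum(pi * (a + b) * ti for pi, a, b, ti in zip(p, n_pre, n_post, t))


def genome_cost(genes):
    """Per-genome cost: sum of per-gene costs.

    `genes` is an iterable of (p, n_pre, n_post, t) tuples, one per gene.
    """
    return sum(gene_cost(*gene) for gene in genes)


def mean_cost_per_site(p, n_pre, n_post):
    """Average cost per codon when translation levels are unknown (t_i = 1)."""
    return gene_cost(p, n_pre, n_post, [1] * len(p)) / len(p)
```

Setting every t_i to 1, as in `mean_cost_per_site`, corresponds to reporting average cost per site when translation levels are not available.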

We define n_post as the number of codons translated
before the ribosome encounters the first off-frame stop codon or the coding
sequence ends. Note that in the latter case n_post
represents a conservative estimate because the translation unit may not end at
the 3′ end of the coding sequence. This is particularly true for
bacteria, where mRNAs are often polycistronic (Sorek and Cossart 2010).
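
The definition of n_post translates directly into a short scan of the shifted reading frame. The sketch below is illustrative only. It assumes 0-based codon indices and assumes that, after a slip at codon i, the next triplet is read starting one nucleotide downstream (+1) or upstream (−1) of the original start of codon i + 1. These positional conventions and the function name are assumptions, not taken from the text.

```python
STANDARD_STOPS = frozenset({"TAA", "TAG", "TGA"})


def n_post(cds: str, i: int, shift: int, stops=STANDARD_STOPS) -> int:
    """Count codons translated after a frameshift at codon i (0-based).

    Counting ends at the first off-frame stop codon or when fewer than
    three nucleotides of the coding sequence remain.
    """
    if shift not in (1, -1):
        raise ValueError("shift must be +1 or -1")
    seq = cds.upper()
    start = 3 * (i + 1) + shift  # first nucleotide read in the new frame
    count = 0
    for pos in range(start, len(seq) - 2, 3):
        if seq[pos:pos + 3] in stops:
            break
        count += 1
    return count
```

When the scan reaches the end of the coding sequence without meeting a stop, the returned value is a lower bound, in line with the conservative estimate described above.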

Modeling Site-Specific Frameshifting Probabilities (p_i)

Error propensity can differ considerably across sites and also depends on the
state of the translational machinery. For example, homomeric nucleotide runs
appear much more liable to frameshifting (Farabaugh 1996) than other sequence contexts, and there is ample
evidence that the composition of the cellular tRNA pool is a critical
determinant of decoding accuracy and, consequently, the propensity for
frameshifting. Increasing the concentration of a particular tRNA results in
reduced frameshifting frequencies at the corresponding codons (Atkins et al. 1979; Curran and Yarus 1989; Sipley and Goldman 1993). Conversely, codons matched by rare tRNAs
are particularly liable to frameshifting (Sipley and Goldman 1993; Farabaugh
and Björk 1999) and amino acid starvation can substantially
increase the likelihood of frameshifting at codons read by the affected tRNA
(Gallant and Lindsley 1992, 1993; Kolor et al. 1993).

Farabaugh and Björk (1999) have
suggested that tRNA-mRNA interactions at the ribosome can, in fact, provide a
unifying model to understand accidental frameshifts, where frameshifting
probability is principally a function of relative tRNA concentrations and
binding affinities. Briefly, the authors proposed that frameshifting can occur
when a near-cognate tRNA erroneously binds to the codon in the ribosomal A
site—more likely when there is a relative shortage of cognate
tRNAs—and, after translocation to the P site, the weak anticodon:codon
interaction permits downstream (+1) or upstream (–1) slippage
by one nucleotide if a sufficiently stable interaction can be formed in the new
reading frame. Huang et al. (2009)
recently presented a quantitative formulation of the Farabaugh and
Björk model, where the probability p_i of
(+1) frameshifting at any one codon i is determined
as

(2)

where V_{i+1} and R_{i+1} are the sets of near-cognate tRNAs able and unable to slip one
nucleotide downstream, respectively; n_t represents
tRNA gene copy number; n_{tc_i} the number of cognate
tRNA genes of codon c_i, and b a
positive constant <1, denoted “weak binding
coefficient” by Huang et al.
(2009), which models the fact that binding of near-cognate tRNAs is less
stable than binding of cognate tRNAs. For each genome, we derived
V_i and R_i for
all codon contexts based on a set of parsimonious anticodon:codon matching
strategies proposed by Grosjean et al.
(2010) (for details, see supplementary methods, Supplementary Material online).

The parameter p_i captures an important aspect of
decoding accuracy, namely that error rate is intrinsically dependent on the
relative (rather than absolute) concentration of cognate, near-cognate, and
noncognate tRNAs, so that it is critical to consider the diversity and relative
abundance of tRNAs to assess tRNA-dependent translation parameters (Fluitt et al. 2007).
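
The dependence on relative rather than absolute tRNA abundance can be illustrated with a toy competition ratio. The expression below is not the formula of Huang et al. (2009), which is not reproduced in this text. It is an assumed form in which near-cognate tRNAs are down-weighted by the weak binding coefficient b and compete with cognate tRNAs for the codon. Gene copy numbers stand in for concentrations, and all names are hypothetical.

```python
def toy_frameshift_propensity(n_cognate: int, n_slip: int, n_noslip: int,
                              b: float = 0.5) -> float:
    """Toy propensity: share of binding events by slip-capable near-cognates.

    n_cognate : gene copies of cognate tRNAs
    n_slip    : gene copies of near-cognate tRNAs able to slip
    n_noslip  : gene copies of near-cognate tRNAs unable to slip
    b         : weak binding coefficient, 0 < b < 1 (assumed value)
    """
    if not 0 < b < 1:
        raise ValueError("b must lie strictly between 0 and 1")
    total = n_cognate + b * (n_slip + n_noslip)
    if total == 0:
        return 0.0  # no tRNA competes for this codon in the toy model
    return b * n_slip / total
```

In this toy form, doubling every copy number leaves the propensity unchanged, whereas adding cognate copies alone lowers it.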

Results

Individual Frameshifting Errors Are Typically More Costly in GC-Rich Genomes

Different types of translational error are associated with different
stereotypical process costs. Although premature termination errors incur costs
approximately proportional to the number of residues translated before the error
occurred (n_pre, fig.
1), frameshifting errors incur an additional cost
(n_post) because the ribosome carries on
translating until it encounters an off-frame stop codon or the mRNA ends.

It is widely assumed that n_post is typically small,
courtesy of a high chance of encountering an off-frame stop codon in the
immediate downstream neighborhood (Parker
1989; Farabaugh 1996; Farabaugh and Björk 1999; Itzkovitz and Alon 2007). Itzkovitz and Alon (2007) reported that,
for an “average” genome (uniform codon usage and amino acid
frequencies averaged over 134 genomes from all three kingdoms), the ribosome
encounters a fortuitous off-frame stop on average only 15 codons downstream of
the frameshifting error.

Figure 2A demonstrates
that this figure can be profoundly misleading when genomic GC content is high.
Standard stop codons (TGA, TAA, TAG) are AT-rich and the probability of
encountering AT-rich in-frame codons, required to specify the off-frame stop,
decreases with increasing GC content. This is all the more pronounced for
–1 frameshifts where a T at the 3rd codon position is required to
yield an off-frame stop (fig. 1). In
contrast to the first two codon positions, where A/T nucleotides may be required
to specify amino acid identity, GC variability is much more extreme at 3rd sites
(Muto and Osawa 1987) so that
encountering a 3rd site T in a high-GC genome is comparatively rare.
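
A back-of-the-envelope calculation shows why stop triplets become rare as GC content rises. The sketch below assumes independent nucleotides with A = T and G = C frequencies. That is a deliberate simplification that ignores codon structure and is not the sequence-based procedure used in this study. Function names are hypothetical.

```python
def stop_probability(gc: float) -> float:
    """Probability that a random triplet is TAA, TAG, or TGA.

    Assumes independent nucleotides with P(A) = P(T) = (1 - gc) / 2
    and P(G) = P(C) = gc / 2.
    """
    if not 0 <= gc < 1:
        raise ValueError("gc must be in [0, 1)")
    at = (1 - gc) / 2
    g = gc / 2
    # TAA contributes at**3; TAG and TGA each contribute at * at * g.
    return at ** 3 + 2 * at * at * g


def expected_codons_to_stop(gc: float) -> float:
    """Expected number of triplets read up to and including the first stop."""
    return 1 / stop_probability(gc)
```

Under these assumptions the expected distance to the first stop is about 21 triplets at 50% GC and about 111 triplets at 80% GC.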

n_post, however, only represents part of the process
cost of an individual frameshifting error because it ignores the number of amino
acids translated before the error occurred (n_pre).
To approach a more realistic estimate of average genome-specific frameshifting
cost, we computed n_pre + n_post for every codon in every gene. Results
suggest that GC-dependent differences in average cost between genomes might not
be as pronounced as suggested by n_post considered in
isolation (fig. 2B).
This is principally because n_post typically
contributes less than 20% (40%) of the total process cost
(n_pre + n_post) of +1 (−1) frameshifts
even at extremely high GC (supplementary fig. 1, Supplementary Material online). At the same time, average
n_pre varies across GC content only insofar as
proteins tend to be slightly longer on average in genomes with higher GC content
(supplementary fig. 2, Supplementary Material online).

GC-Rich Genomes Are Buffered Against +1 but not −1 Frameshifts

This reduction in between-genome variability notwithstanding, the average process
cost still appears to be higher in genomes with high GC content. But do GC-rich
genomes really shoulder a greater fitness burden in relation to frameshifting?
Clearly, that depends on whether any one particular error actually occurs and,
if so, how frequently. This is a function of the probability
p_i that the error occurs at the focal codon
i, and the number of times that site is translated
(t_i). Although by-gene estimates of
t_i are not available for the vast majority
of genomes, we can derive relative frameshifting probabilities for every
possible codon context with reference to genome-specific tRNA competition at the
ribosome (see Materials and Methods). Incorporating genome- and context-specific
frameshifting probabilities into our model of process cost, we unexpectedly find
the positive correlation between GC content and average +1
frameshifting cost reversed (rho = −0.18,
P = 5.81 × 10^−6,
fig. 2C). This is
despite protein length increasing slightly with GC content (linear regression
estimate of average protein length in genomes with 20% GC3: 256 amino acids [90%
GC3: 278], supplementary fig. 2, Supplementary Material online). The average cost of –1
frameshifts, however, remains highest for high-GC genomes (rho =
0.35, P = 2.21 × 10^−20).
Considering only one prokaryotic species per genus
name to reduce phylogenetic nonindependence does not affect overall trends (data
not shown).
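
For readers who wish to reproduce this kind of comparison, third-position GC content (GC3) and a rank correlation can be computed with the standard library alone. The sketch below is illustrative and uses hypothetical names. It computes Spearman's rho as the Pearson correlation of average ranks and does not compute P values.

```python
from math import sqrt


def gc3(cds: str) -> float:
    """Fraction of G or C at third codon positions."""
    thirds = cds.upper()[2::3]
    return sum(base in "GC" for base in thirds) / len(thirds)


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y) -> float:
    """Spearman's rho: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)
```

In practice, one genome would contribute one pair of values: its GC3 and its average frameshifting cost.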

Why does incorporation of genome-specific frameshifting probabilities transform
the relationship between GC content and estimates of frameshifting cost?

Comparing p_i for each minimal shifting context
(NNN|N for +1 shifts, N|NNN for –1
shifts) across genomes, we find that the majority of contexts exhibits a lower
propensity for frameshifting with increasing GC content (negative tau in fig. 3). The altered relationship between
GC content and cost is therefore not simply a function of different codon or
dicodon usage, that is, less shifting-prone motifs being used more frequently at
high GC content; systematic GC-linked changes in tRNA profiles must be a
contributing factor. Conspicuously, GC-rich genomes typically sport a more
diverse repertoire of anticodons (fig. 4,
Kanaya et al. 1999; Rocha 2004; Higgs and Ran 2008; Ran
and Higgs 2010). In particular, tRNAs with C or G in the first
anticodon position, which we would expect to bind most stably to G- and C-ending
codons, respectively, are typically present in high-GC genomes where G/C-ending
codons are common but frequently spared in medium- or low-GC genomes (fig. 5) where these codons are read via
wobble pairing with U in the first anticodon position. This is in line with
theoretical expectations about the diversity of tRNAs required for efficient
translation (Higgs and Ran 2008). We
suggest that, in addition, larger anticodon repertoires in high-GC genomes will
be selectively favorable as they reduce the burden of frameshifting error in
genomes vulnerable to incurring large per-error costs.

Anticodon sparing strategies as a function of GC content. The
“absence” of a particular anticodon (rows) from a
particular genome (columns, ordered by genomic GC3 content) is indicated
...

What these results highlight, above all, is that comparing translational cost
estimates between genomes will be misleading when sequence features are
considered in isolation because other critical parameters
(p_i) can and do differ between genomes. In this
context, we realize that our empirical evaluation falls short of giving a
comprehensive comparative costing because we cannot at present incorporate
translation levels (t_i). We are keenly aware that,
especially in fast-growing organisms, a large proportion of realized cost might
be incurred by a relatively small number of highly expressed genes so that
taking average cost across all sites (or even genes) might not adequately
reflect genomic fitness burden. Once comprehensive quantitative transcriptome
data become available for an extremely high-GC genome, it will be interesting
to incorporate this information into our model to derive genuinely comparative
genome-wide cost estimates.

A Critical Test for a Role of Frameshifting in Shaping Gene Anatomy: GC-Rich Genomes Show Weaker 5′-3′ Gradients in Translational Robustness

Above we hypothesized that, in addition to selection on translational efficiency
(Higgs and Ran 2008), increased
richness of the tRNA repertoire in GC-rich genomes might be at least in part an
adaptation to the comparatively larger per-error cost of frameshifting in these
genomes. Is there, however, any evidence consistent with frameshifting as an
important force in molecular evolution? Selection against premature termination
errors predicts a gradient in codon adaptation toward greater decoding accuracy
at the 3′ end of mRNAs, predicated on n_pre
as the principal process cost. But n_pre also
represents an important component of the process cost of frameshifting errors.
Does selection against frameshifting errors contribute to intragenic gradients
in codon adaptation?

The unique process cost dynamic of frameshifting errors, namely the existence of
a post-error cost (n_post), allows us to test for
frameshifting involvement as follows: Consider an mRNA with very high GC
content. At the extreme, even slipping up right at the start of the message
leads to exactly the same cost as slipping up at the 3′ end because
the ribosome will never encounter an off-frame stop and therefore keep on
translating until the mRNA terminates. By implication, GC-rich genomes should
benefit relatively less from greater robustness (1 − p_i)
against frameshifting errors toward the
3′ end of genes. We therefore predict that, if frameshifting avoidance
is a relevant force determining heterogeneity in codon composition along the
mRNA, the difference in frameshifting robustness between 5′ and
3′ ends will decline with increasing GC content. In contrast,
selection against premature termination errors does not predict that
5′-3′ differential robustness should diminish with rising GC
content.

Replicating Huang et al.’s
(2009) approach, we computed pairwise differentials in average
frameshifting robustness across the terminal 5′ and 3′ 100
codons. Note that this analysis is internally controlled so that we do not
expect differences in expression across genes and genomes to affect results. We
observe a clear-cut tendency toward less pronounced 5′-3′
differences with increasing GC content (fig.
6), supporting a role for selection against frameshifting errors. Results
are virtually identical when we exclude the first and last 30 codons, which are
likely under selection for translational regulation (Tuller et al. 2010; 30-codon cutoff conservatively
estimated from prokaryotic data in their fig. 2E and
F).
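
The terminal comparison can be sketched as follows, given per-codon frameshifting probabilities for one gene. This is a minimal illustration and not the implementation of Huang et al. (2009). The sign convention (3′ mean minus 5′ mean), the requirement that the two windows do not overlap, and the function name are assumptions.

```python
def terminal_differential(p, window: int = 100, trim: int = 0):
    """Difference in mean robustness (1 - p_i) between gene termini.

    p      : per-codon frameshifting probabilities, 5' to 3'
    window : number of codons averaged at each terminus
    trim   : codons discarded from each end before windowing
    Returns the 3' mean minus the 5' mean, or None if the gene is too
    short to give two non-overlapping windows.
    """
    frs = [1.0 - pi for pi in p]  # frameshifting robustness scores
    core = frs[trim:len(frs) - trim]
    if len(core) < 2 * window:
        return None
    five_prime = sum(core[:window]) / window
    three_prime = sum(core[-window:]) / window
    return three_prime - five_prime
```

Calling the function with `trim=30` corresponds to excluding the first and last 30 codons before the windows are taken.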

Differences in frameshifting robustness become less pronounced with
increasing GC3. Frameshifting robustness scores (FRSs) were computed as
1 − p_i for each of the
...

Might this trend simply be a consequence of codon choice becoming less flexible at
more extreme GC content? This would predict that differences in terminal
robustness should also decline toward the AT-biased end of the spectrum. This we
do not observe: We split the data into genomes with >50% GC and
<50% GC and confined analysis to genomes where the most AT-biased genome
was as far away from the 50% threshold as the most GC-biased genome (range
11–89% GC). We found significant positive relationships between GC3
and differential robustness for genomes with >50% GC (+1: rho
= 0.32, P = 1.91 × 10^−8;
−1: rho = 0.19,
P = 0.0008, N = 302),
yet no significant negative trends for genomes with <50% GC (+1:
rho = −0.067, P = 0.24;
−1: rho = 0.014, P = 0.81,
N = 312). Moreover, an exponential fit
outperforms a quadratic fit (Akaike's Information Criterion:
−8,635 vs. −8,995) suggesting that a model that lacks an
increase toward the AT-biased end of the spectrum provides a better description
of the data.

Discussion

The simple model of frameshifting process cost presented above illustrates a number
of key issues relevant to assessing the role of frameshifting errors in shaping gene
anatomy.

First, the notion that n_post is typically short is
misleading for genomes with high GC content.

Second, depending on the evolutionary question under consideration, arguments
concerning the likely costliness of frameshifting have been focused on either
n_pre or n_post. But it is
important to acknowledge that frameshifting incurs a compound cost
(n_pre + n_post),
which distinguishes this particular translational error from, for example, premature
termination or drop-off errors, which only incur n_pre.
Such differences in cost dynamics can be exploited to attribute signatures of
selection for translational accuracy to specific error classes. We explore these
differences in the context of process costs because the fitness cost of an erroneous
polypeptide should be linearly proportional to its length. This
does not imply, however, that product cost is unrelated to length. In fact, it seems
likely that longer frameshifted tracts will on average also be less likely to be
soluble and, consequently, have a greater potential to be disruptive,
although—in contrast to process costs—the specific amino acid
context will be critically important in this regard. Thus, high-GC genomes are
likely faced with higher per-error product costs as well as process costs.

Third, comparative genomic analysis of frameshifting costs reveals that considering
mRNA sequences in isolation and ignoring vital differences in translational
machineries between genomes will produce a deceptive guide to fitness burden. In
order to arrive at a genuine comparative estimate of the selective leverage of
translational error, it will be imperative to incorporate differences in translation
levels between genes and genomes, but our results already highlight the importance
of differences in tRNA repertoire for relative susceptibility to translational
error. It is intriguing that systematic changes in tRNA repertoire with GC content
correlate with a reduction in the expected fitness burden related to frameshifting.
But does this imply that differences in tRNA repertoires represent selected
adaptations to reduce frameshifting costs or is anticodon diversity under selection
for other reasons, for example translational efficiency (Higgs and Ran 2008), and reduction in error rates constitutes
a fortuitous side effect? These two explanations are by no means mutually exclusive
and might assume different relative importance depending on the lifestyle of the
organism under consideration. For example, one would expect translational efficiency
to be relatively more important in r-selected species where fast growth is critical
for fitness. Fundamentally, the answer to this question will hinge on accurate
quantitative determination of fitness costs of erroneous versus slow protein
production.

Although our results clearly demonstrate that the link between process costs and GC
content is readily transformed by differences in the translational apparatus, more
concrete quantitative aspects of the current model should be interpreted with
caution. For example, is the higher cost of –1 frameshifting in high-GC
genomes real or rather an indication that the model does not incorporate an
important determinant of frameshifting dynamics? Although it is conceivable that high-GC
genomes find it intrinsically hard to reduce the cost of frameshifting errors and
therefore genuinely shoulder a greater fitness burden in relation to frameshifting,
it remains a distinct possibility that this cost is not actually incurred because
high-GC genomes exhibit certain (adaptive) features in cis or
trans which our model fails to capture. Notably, we adhere to
prokaryotic consensus rules for anticodon:codon interactions proposed by Grosjean et al. (2010) to model binding
stabilities and therefore propensities for frameshifting (see supplementary methods, Supplementary Material online), principally because this allows us
to compare cost estimates across genomes. These rules are inevitably generalizations
because decoding capacities cannot be perfectly predicted from sequence information
alone. Importantly, anticodon residues themselves as well as tRNA nucleotides
outside the anticodon loop can be posttranscriptionally modified in a variety of
ways, with marked effects on decoding capacity (Cochella and Green 2005; Daviter et al.
2006; Grosjean et al. 2010)
and/or translational fidelity (reviewed in Saks
and Conery 2007), explicitly including reading frame maintenance (Qian and Björk 1997; Björk et al. 1999; Herr et al. 1999; Urbonavicius et al. 2003). Decoding accuracy is further
affected by variation in other components of the translation machinery. This
includes nucleotide substitutions or modifications in ribosomal RNA, which can cause
more or less accurate decoding (Rodnina and
Wintermeyer 2001; Baxter-Roshek et al.
2007). In addition, differences in cellular environment, notably
Mg2+ ion concentrations (Gromadski and Rodnina 2004), can affect translation kinetics with
implications for accuracy, proofreading behavior, and anticodon:codon affinities.
Finally, we characterize accidental frameshifting as a local error, solely dependent
on interactions at the focal codon and its immediate upstream or downstream
neighbor. However, it is apparent from the analysis of programmed frameshifts that
downstream secondary structure (hairpins, pseudoknots, etc.) in particular can
dramatically affect the rates of shifting, probably at least in part by affecting
ribosomal progression and thus residency at a given site (Farabaugh 1996).

Despite these various simplifications and uncertainties, however, the results
presented here reinforce the notion that translational errors have been an important
force in shaping mRNA anatomy and further suggest that selection might have shaped
tRNA repertoires to reduce frameshifting errors.

Supplementary Material

Acknowledgments

The authors would like to thank Ed Feil and Eduardo Rocha for useful discussions and
two anonymous reviewers for their constructive comments on the manuscript. This work
was supported by a Medical Research Council Capacity Building Studentship to T.W.
and by the National Library of Medicine/National Institutes of Health Intramural
Research Program to Y.H. and T.M.P.