Perfect and one-off uptake sequences

When the H. influenzae genome was first sequenced, Ham Smith did a detailed analysis of the frequency, distribution and variation of uptake signal sequences (USSs). Here's a link to the abstract. Previously we had only an approximation of the core consensus (AAGTGCGGT and its complement) and very rough estimates of copy number.
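As a rough illustration of what counting "perfect" and "one-off" USSs involves, here is a minimal Python sketch. It is not Smith's method; the function names are hypothetical. It assumes an uppercase A/C/G/T genome string and scans every 9-bp window against both the consensus and its reverse complement, which is equivalent to scanning both strands.

```python
# Core USS consensus from the text; its reverse complement marks a USS on the other strand.
USS = "AAGTGCGGT"


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq))


USS_RC = reverse_complement(USS)  # "ACCGCACTT"


def mismatches(window: str, motif: str) -> int:
    """Count positions where window differs from motif (equal lengths assumed)."""
    return sum(a != b for a, b in zip(window, motif))


def count_uss(genome: str) -> tuple[int, int]:
    """Count perfect and one-off USSs on both strands of a genome string.

    Each 9-bp window is compared with the USS and with its reverse complement.
    The two motifs differ at 7 of 9 positions, so no window can be counted twice.
    """
    perfect = one_off = 0
    k = len(USS)
    for i in range(len(genome) - k + 1):
        window = genome[i:i + k]
        for motif in (USS, USS_RC):
            d = mismatches(window, motif)
            if d == 0:
                perfect += 1
            elif d == 1:
                one_off += 1
    return perfect, one_off


print(count_uss("AAGTGCGGT"), count_uss("AAGTGCGGA"))  # (1, 0) (0, 1)
```

Overlapping windows are counted independently, and any non-ACGT character in the genome simply counts as a mismatch.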

One striking thing he found was a disparity between the numbers of 'perfect' and imperfect (we call them 'one-off') USSs. Although the genome has 1571 copies of the 'perfect' sequence, where only 8 would be expected in a random sequence of the same base composition, it has only about 750 one-off USSs. This is surprising for two reasons. First, there are 27 different mutational changes that turn the perfect USS into a one-off (3 at each of the 9 USS positions), so we would naively expect to see many more one-offs than perfects. Second, USSs are thought to function by binding to a protein receptor on the cell surface. When the binding sites of other DNA-binding proteins (e.g. transcriptional repressors) have been compared, the consensus is usually much weaker, with most real sites matching it only imperfectly. So for years I've been wondering whether the scarcity of one-offs is telling us something important about USS evolution and function.
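To make the random-sequence baseline concrete, the Python sketch below enumerates the 27 one-off variants and estimates how many perfect and one-off USSs a random sequence would contain on both strands. The genome length (about 1.83 Mb) and GC fraction (0.38) are assumptions approximating H. influenzae Rd, bases are treated as independent, and the function names are hypothetical. Under these assumptions the perfect-USS estimate comes out close to the 8 quoted above, and random one-offs should outnumber random perfects roughly thirtyfold.

```python
USS = "AAGTGCGGT"
BASES = "ACGT"

GENOME_LEN = 1_830_000  # assumption: approximate H. influenzae Rd genome size
GC = 0.38               # assumption: approximate genomic GC fraction


def one_off_variants(motif: str) -> list[str]:
    """All sequences differing from motif at exactly one position (3 per position)."""
    variants = []
    for i, base in enumerate(motif):
        for alt in BASES:
            if alt != base:
                variants.append(motif[:i] + alt + motif[i + 1:])
    return variants


def base_probs(gc: float) -> dict[str, float]:
    """Single-base probabilities for a random sequence with the given GC fraction."""
    return {"G": gc / 2, "C": gc / 2, "A": (1 - gc) / 2, "T": (1 - gc) / 2}


def expected_count(motif: str, genome_len: int, gc: float) -> float:
    """Expected occurrences of motif on both strands of a random genome.

    The factor 2 counts both strands; because p(A) = p(T) and p(G) = p(C),
    a motif and its reverse complement have the same probability.
    """
    probs = base_probs(gc)
    p = 1.0
    for base in motif:
        p *= probs[base]
    return 2 * (genome_len - len(motif) + 1) * p


variants = one_off_variants(USS)
perfect_expected = expected_count(USS, GENOME_LEN, GC)
one_off_expected = sum(expected_count(v, GENOME_LEN, GC) for v in variants)
print(len(variants), round(perfect_expected, 1), round(one_off_expected))
```

Because the genome is AT-rich, one-offs that swap a G or C for an A or T are especially likely in random sequence, which is part of why the random-sequence one-off count is so much larger than the perfect count.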

At a conference we both attended when the H. influenzae genome sequence was about to be released (it was the first genome to be sequenced, so it was a big deal), Ham mentioned that the ratio of perfects to one-offs could be used to infer the relationship between mutation and the sequence bias of the uptake machinery. I went home and formalized this in some math (really not much more than arithmetic), which seemed to show that the observed ratio of perfects to one-offs predicted a specific ratio of mutation rate to uptake bias. But my model was so simplistic that I didn't take it very seriously.
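The original arithmetic isn't reproduced here. As an illustration only, this Python sketch shows one simple way such a calculation could go. Everything in it is an assumption: a per-position mutation rate mu (so each specific base change happens at mu/3), an uptake-bias conversion rate b for one-offs, uniform base composition, and steady state. Because perfects exchange only with one-offs in this toy, balancing the two flows gives b/mu directly from the perfect:one-off ratio.

```python
def bias_to_mutation_ratio(perfect: int, one_off: int) -> float:
    """Infer b/mu from a steady-state perfect:one-off ratio in a toy rate model.

    Hypothetical per-site rates:
      perfect -> one-off : 9 * mu    (a mutation at any of the 9 positions)
      one-off -> perfect : b + mu/3  (uptake-bias replacement, or back-mutation
                                      of the mismatched base to the consensus base)
    At steady state the flows balance:
      perfect * 9 * mu = one_off * (b + mu/3)
    which rearranges to b/mu = 9 * perfect / one_off - 1/3.
    """
    return 9 * perfect / one_off - 1 / 3


print(round(bias_to_mutation_ratio(1571, 750), 1))
```

With 1571 perfects and about 750 one-offs this toy gives b/mu of roughly 18.5. That number only illustrates the logic of inferring a bias/mutation ratio from counts; it ignores two-offs feeding one-offs, base composition, and everything else a real model would need.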

But yesterday the two post-docs and I got together and discussed how we're going to turn last year's computer simulation USS model into a better model and a paper (see Sunday Aug. 6 post). Later I was reading over notes I'd made last summer on the results I had then, and found a paragraph on the frequency of one-offs that agrees with my simple arithmetical model and probably explains why real one-offs are scarce.

In the computer model, accumulation of USSs in the simulated genome depends on two factors. First, the uptake machinery must be biased in favour of perfect USSs over random sequences; in the model this is simulated by occasionally replacing a one-off USS with a perfect USS. Second, random mutational changes in the genome are also essential. Without them, the simulation can only create perfect USSs from the few one-off USSs already present in the original random genome sequence. New mutations are needed to create new one-offs from pre-existing two-offs, to create new two-offs from three-offs, and so on.
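The two ingredients can be sketched in Python as below. This is not the simulation's actual code: the function names, the per-base substitution probability mu, and the choice to replace a randomly chosen one-off (in whichever orientation it was found) are all assumptions made for illustration. A driver loop, also hypothetical, would alternate mutate() with an occasional call to uptake_bias_step().

```python
import random

USS = "AAGTGCGGT"
USS_RC = "ACCGCACTT"  # reverse complement of USS, i.e. a USS on the other strand
BASES = "ACGT"


def mismatches(window: str, motif: str) -> int:
    """Count positions where window differs from motif (equal lengths assumed)."""
    return sum(a != b for a, b in zip(window, motif))


def mutate(genome: str, mu: float, rng: random.Random) -> str:
    """Substitute each base with probability mu, picking one of the 3 other bases."""
    out = []
    for base in genome:
        if rng.random() < mu:
            base = rng.choice([b for b in BASES if b != base])
        out.append(base)
    return "".join(out)


def find_one_offs(genome: str) -> list[tuple[int, str]]:
    """Return (position, motif) for each window one mismatch away from USS or USS_RC."""
    hits = []
    k = len(USS)
    for i in range(len(genome) - k + 1):
        window = genome[i:i + k]
        for motif in (USS, USS_RC):
            if mismatches(window, motif) == 1:
                hits.append((i, motif))
    return hits


def uptake_bias_step(genome: str, rng: random.Random) -> str:
    """Replace one randomly chosen one-off with a perfect USS in the same orientation.

    Returns the genome unchanged if it contains no one-offs.
    """
    hits = find_one_offs(genome)
    if not hits:
        return genome
    i, motif = rng.choice(hits)
    return genome[:i] + motif + genome[i + len(motif):]


rng = random.Random(1)
g = "CCCC" + "AAGTGCGGA" + "CCCC"  # contains a single one-off (final T changed to A)
print(uptake_bias_step(g, rng))   # CCCCAAGTGCGGTCCCC
```

Without the mutate() step, repeated calls to uptake_bias_step() soon run out of one-offs to convert, which is the point made in the paragraph above.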

But random mutations also happen to the perfect USSs that the uptake bias has created, turning them back into one-offs. Depending on how strong the bias is, these can then be replaced with perfect USSs again or undergo further mutational degeneration. If the mutation rate in the computer model is set too high relative to the uptake bias, degeneration will dominate the process, so that one-off USSs are more likely to mutate into two-offs than to be converted into perfects. In these situations, perfect USSs will not accumulate at all; they will remain at the very low frequency expected for a random sequence of that base composition.
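The competition described here can be framed as a race between two ways out of the one-off class. A minimal Python sketch under hypothetical per-site rates (the symbols b and mu and the function name are assumptions, not the model's parameters): a one-off becomes perfect at rate b + mu/3 (uptake-bias replacement or an exact back-mutation) and becomes a two-off at rate 8 * mu (a mutation at any of the other 8 positions). In this toy the two outcomes are equally likely when b = 23 * mu / 3, about 7.7 * mu; below that, a one-off is more likely to degenerate than to be converted.

```python
def p_convert_before_decay(b: float, mu: float) -> float:
    """Chance that a one-off becomes perfect before it becomes a two-off.

    Hypothetical per-site rates:
      one-off -> perfect : b + mu/3  (uptake-bias replacement or exact back-mutation)
      one-off -> two-off : 8 * mu    (a mutation at any of the other 8 positions)
    A mutation that changes the mismatched base to a third base leaves the site
    a one-off, so it does not affect which of the two exits happens first.
    """
    up = b + mu / 3
    down = 8 * mu
    return up / (up + down)


mu = 1e-3  # arbitrary units; only the ratio b/mu matters
for ratio in (1, 23 / 3, 50):
    print(ratio, round(p_convert_before_decay(ratio * mu, mu), 3))
```

The same toy implies that a one-off's expected lifetime is 1 / (b + mu/3 + 8 * mu), so a strong bias makes one-offs short-lived, which fits the next paragraph.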

On the other hand, if the bias is stronger than the mutation rate, one-off USSs will usually be converted into perfects shortly after they arise.

What this means is that one-off USSs will only accumulate within a narrow range of mutation/bias ratios. If the ratio is lower than the critical range, perfect USSs will accumulate but one-offs will be scarce because they're rapidly converted to perfects. If the ratio is higher than the critical range, perfect USSs won't accumulate at all. I was about to write "and neither will one-offs", but I'm not sure about that. I could imagine a situation where the ratio was not quite low enough to allow perfect USSs to accumulate, but was low enough to cause significant accumulation of one-off USSs. That will be easy to check, once the model is running again.

So this explains why one-off USSs are so scarce in real genomes that have lots of perfect USSs. Nice.