The impact of mutation and gene conversion on the local diversification of antigen genes in African trypanosomes.

Gjini E, Haydon DT, Barry JD, Cobbold CA - Mol. Biol. Evol. (2012)

Bottom Line:
We find that diversifying gene conversion events with lower-identity partners occur at least five times less frequently than point mutations on variant surface glycoprotein (VSG) pairs, and the average imported conversion tract is between 14 and 25 nucleotides long. However, because of the high diversity introduced by gene conversion, the two processes have almost equal impact on the per-nucleotide rate of sequence diversification between VSG subfamily members. We are able to disentangle the most likely locations of point mutations and conversions on each aligned gene pair.
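How can a process that is five times rarer still contribute equally to diversification? Each conversion event imports many mismatches at once. The Python sketch below illustrates this with a simplified two-state model that is assumed here for illustration and is not the paper's exact parameterization. It uses a per-site point-mutation probability `mu`, a per-site conversion-start probability `lam_start`, a per-site tract-end probability `lam_end` (mean tract length 1/`lam_end`), and a within-tract mismatch probability `d`. All function names and numbers are hypothetical.

```python
def per_site_contributions(mu, d, lam_start, lam_end):
    """Expected mismatches per site attributable to each process.

    Hypothetical two-state model (illustration only, not the paper's exact model):
      state 0 = between conversions, mismatch probability mu (point mutations);
      state 1 = inside a conversion tract, mismatch probability d;
      lam_start = per-site probability of entering a tract, lam_end = of leaving it.
    """
    # Long-run fraction of sites inside conversion tracts (stationary probability of state 1).
    p_conv = lam_start / (lam_start + lam_end)
    from_point_mutation = (1 - p_conv) * mu
    from_conversion = p_conv * d
    return from_point_mutation, from_conversion


# Hypothetical values: conversions start five times less often than point mutations,
# tracts average 1 / 0.05 = 20 nt, and 25% of sites inside a tract are mismatched.
point, conv = per_site_contributions(mu=0.01, d=0.25, lam_start=0.002, lam_end=0.05)
print(f"point mutation: {point:.5f} mismatches/site, conversion: {conv:.5f} mismatches/site")
```

With these made-up inputs, the two contributions come out equal even though conversion events are five times rarer. Each event imports on average 20 × 0.25 = 5 mismatches, which exactly offsets its rarity.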

Affiliation: School of Mathematics and Statistics, College of Science and Engineering, University of Glasgow, Glasgow, United Kingdom. egjini@igc.gulbenkian.pt

Abstract:
Patterns of genetic diversity in parasite antigen gene families hold important information about their potential to generate antigenic variation within and between hosts. The evolution of such gene families is typically driven by gene duplication, followed by point mutation and gene conversion. Estimating the rates of these processes from molecular sequences is of great interest, both for understanding the evolution of the pathogen and for assessing its significance for infection processes. In this study, we construct a series of models to investigate hypotheses about nucleotide diversity patterns between closely related gene sequences from the antigen gene archive of the African trypanosome, the protozoan parasite that causes human sleeping sickness in Equatorial Africa. We use a hidden Markov model approach to identify two scales of diversification. The first is clusters of sequence mismatches, a putative signature of gene conversion with lower-identity donor genes elsewhere in the archive. The second, at a sparser scale, is isolated mismatches, which likely arise from independent point mutations. In addition to quantifying the respective probabilities of occurrence of these two processes, our approach yields estimates of the gene conversion tract length distribution and of the average diversity contributed locally by conversion events. Model fitting is conducted in a Bayesian framework. We find that diversifying gene conversion events with lower-identity partners occur at least five times less frequently than point mutations on variant surface glycoprotein (VSG) pairs, and that the average imported conversion tract is between 14 and 25 nucleotides long. However, because gene conversion introduces high diversity, the two processes have almost equal impact on the per-nucleotide rate of sequence diversification between VSG subfamily members. We are also able to disentangle the most likely locations of point mutations and conversions on each aligned gene pair.
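Bayesian fitting of a hidden Markov model needs the likelihood of each observed mismatch sequence, and the standard way to compute it is the forward algorithm. The sketch below is a minimal, assumed version of that calculation for a hypothetical two-state model. In the between-conversion state, a site is mismatched with probability `mu`. Inside a conversion tract, a site is mismatched with probability `d`. Tracts start with per-site probability `lam_start` and end with per-site probability `lam_end`. These parameter names, and the choice to start the chain from its stationary distribution, are illustrative assumptions rather than the paper's specification. A sampler (for example MCMC, not specified in the source) would combine this log-likelihood with priors.

```python
import math


def log_likelihood(mismatches, mu, d, lam_start, lam_end):
    """Log P(mismatch sequence) under a hypothetical two-state HMM, via the forward algorithm.

    mismatches: sequence of 0/1 flags, one per aligned site (1 = nucleotides differ).
    State 0 (between conversions) emits a mismatch with probability mu;
    state 1 (inside a conversion tract) emits a mismatch with probability d.
    """
    trans = [[1 - lam_start, lam_start],   # transitions out of state 0
             [lam_end, 1 - lam_end]]       # transitions out of state 1
    p1 = lam_start / (lam_start + lam_end)  # assumed start: stationary distribution
    init = [1 - p1, p1]
    emit = [mu, d]

    def emission(state, x):
        return emit[state] if x else 1 - emit[state]

    # Scaled forward recursion: renormalise alpha at every site and
    # accumulate the log of each normaliser to avoid numerical underflow.
    alpha = [init[s] * emission(s, mismatches[0]) for s in (0, 1)]
    loglik = 0.0
    for i, x in enumerate(mismatches):
        if i > 0:
            alpha = [sum(alpha[r] * trans[r][s] for r in (0, 1)) * emission(s, x)
                     for s in (0, 1)]
        total = alpha[0] + alpha[1]
        loglik += math.log(total)
        alpha = [a / total for a in alpha]
    return loglik


# Illustrative (made-up) mismatch pattern for one aligned pair.
x = [0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0]
print(log_likelihood(x, mu=0.02, d=0.4, lam_start=0.01, lam_end=0.05))
```

Renormalising at each site keeps the recursion stable for alignments that are hundreds of nucleotides long. The sum of log normalisers equals the exact log-likelihood.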

Fig. 4. The most likely conversion tracts from Model 4. The 15 high-identity VSG alignments are listed in the order (1,2), (1,3), (2,3) for each triplet. Bars mark nucleotide mismatches between the N-domains of the two sequences. The most likely conversion tracts (highlighted in yellow) were estimated by the “decoding” algorithm using the posterior means in table 1; between-conversion regions are shown in blue.
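The caption does not spell out the "decoding" step. A common way to recover the most likely sequence of hidden states in an HMM is the Viterbi algorithm. The sketch below applies it to a hypothetical two-state mismatch model with two states. State 0 is a between-conversion region with mismatch probability `mu`. State 1 is a conversion tract with mismatch probability `d`. Tracts start with per-site probability `lam_start` and end with per-site probability `lam_end`. The sketch then reads off conversion tracts as runs of state 1. Treat it as an assumed illustration: the function names, parameters, and input are hypothetical, and the paper may use a different decoder, such as posterior decoding.

```python
import math


def viterbi(mismatches, mu, d, lam_start, lam_end):
    """Most likely hidden-state path for a 0/1 mismatch sequence (hypothetical two-state HMM).

    State 0 = between conversions (mismatch probability mu);
    state 1 = inside a conversion tract (mismatch probability d).
    Works in log space so long alignments do not underflow.
    """
    log_trans = [[math.log(1 - lam_start), math.log(lam_start)],
                 [math.log(lam_end), math.log(1 - lam_end)]]
    p1 = lam_start / (lam_start + lam_end)  # assumed start: stationary distribution
    log_init = [math.log(1 - p1), math.log(p1)]
    emit = [mu, d]

    def log_emission(state, x):
        return math.log(emit[state] if x else 1 - emit[state])

    score = [log_init[s] + log_emission(s, mismatches[0]) for s in (0, 1)]
    backpointers = []
    for x in mismatches[1:]:
        # Best predecessor state for each current state, then the updated scores.
        best_prev = [max((0, 1), key=lambda r: score[r] + log_trans[r][s]) for s in (0, 1)]
        score = [score[best_prev[s]] + log_trans[best_prev[s]][s] + log_emission(s, x)
                 for s in (0, 1)]
        backpointers.append(best_prev)

    # Trace back from the best final state.
    state = max((0, 1), key=lambda s: score[s])
    path = [state]
    for best_prev in reversed(backpointers):
        state = best_prev[state]
        path.append(state)
    return path[::-1]


def conversion_tracts(path):
    """Return (start, end) index pairs of runs of state 1; end is exclusive."""
    runs, start = [], None
    for i, s in enumerate(path + [0]):  # trailing 0 closes a run at the sequence end
        if s == 1 and start is None:
            start = i
        elif s == 0 and start is not None:
            runs.append((start, i))
            start = None
    return runs


# Illustrative input (made up): a dense cluster of mismatches at sites 30-39
# and one isolated mismatch at site 70.
x = [0] * 30 + [1, 0, 1, 1, 0, 1, 1, 0, 1, 1] + [0] * 30 + [1] + [0] * 30
path = viterbi(x, mu=0.02, d=0.5, lam_start=0.005, lam_end=0.05)
print(conversion_tracts(path))  # → [(30, 40)]
```

The dense cluster is labelled as a conversion tract. The isolated mismatch stays in the background state, because entering and leaving a tract for a single site costs more than explaining it as a point mutation. This mirrors the two scales of diversification described in the abstract.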

Mentions:
The average conversion length predicted by all models is small compared with the total length of the sequences analyzed. Model 1 predicts a mean imported tract length of 1/0.0387 ≈ 26 nucleotides, about 2.5% of the total gene length. Model 2 lets the tract-length parameter λend vary across triplets. Its posterior means of λend range from 0.0126 to 0.0836, implying more variable mean conversion lengths of roughly 12 to 80 nucleotides. Model 3 again fixes the conversion tract length across triplets, with an estimated mean of ∼21 nucleotides. By allowing conversion probabilities to vary across pairs, Model 4 supports even shorter gene conversions. Their means range from 14 to 25 nucleotides, with an overall average of approximately 18 nucleotides (fig. 4). Because imported tract lengths are assumed to be geometrically distributed, the inferred conversions naturally vary in length both within and across alignments. A common feature, however, is a high mismatch frequency within conversions, which helps to distinguish converted from nonconverted regions.
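The tract lengths quoted above follow from the geometric assumption. If a tract ends at each site with probability λend, its length L on {1, 2, ...} has P(L = k) = (1 - λend)^(k-1) λend, with mean 1/λend. The short Python check below converts the λend values quoted in the text into mean lengths and confirms the formula by simulation. The function names and the Monte Carlo check are illustrative additions, not part of the paper.

```python
import random


def mean_tract_length(lam_end):
    """Mean of a geometric length on {1, 2, ...} with per-site end probability lam_end."""
    return 1.0 / lam_end


def simulate_tract_length(lam_end, rng):
    """Draw one tract length by extending the tract until it ends (probability lam_end per site)."""
    length = 1
    while rng.random() >= lam_end:
        length += 1
    return length


# lambda_end values quoted in the text: Model 1, and the two ends of the Model 2 range.
for label, lam in [("Model 1", 0.0387),
                   ("Model 2, smallest lambda_end", 0.0126),
                   ("Model 2, largest lambda_end", 0.0836)]:
    print(f"{label}: lambda_end = {lam}, mean tract length = {mean_tract_length(lam):.1f} nt")

# Monte Carlo check of the 1 / lambda_end formula.
rng = random.Random(0)
draws = [simulate_tract_length(0.0387, rng) for _ in range(50_000)]
print(f"simulated mean: {sum(draws) / len(draws):.1f} nt")
```

The reciprocal relation also explains why the Model 2 range is so wide. Small absolute differences between small λend values translate into large differences in mean tract length.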