Abstract

Quantifying the number of selective sweeps and their combined effects on genomic diversity in humans and other great apes is notoriously difficult. Here we address the question using a comparative approach, contrasting diversity patterns according to the distance from genes in all great ape taxa. The reduction in diversity near genes, relative to the rest of the intergenic sequence, is greater in species with larger effective population sizes. The maximum distance from genes at which the diversity reduction is observed is also larger in species with larger effective population sizes. In Sumatran orangutans, the overall genomic diversity is ∼30% lower than diversity levels far from genes, whereas this reduction is only 9% in humans. We show by simulation that selection against deleterious mutations in the form of background selection is not expected to cause these differences in diversity among species. Instead, selective sweeps caused by positive selection can reduce diversity levels more severely in a large population if the number of selective sweeps per unit time is higher. We discuss what can cause such a correlation, including the possibility that more frequent sweeps in larger populations are due to a shorter waiting time for the right mutations to arise.

Reduction of diversity around genes. Plots show the relationship between the nucleotide diversity, π, and the physical distance to the nearest gene for each genus of the great apes. (A) For distances up to 1 Mb. (B) For distances up to 200 kb. Error bars show 95% confidence intervals calculated by bootstrapping with 1,000 replicates from 1-Mb windows. The name of each (sub)species is shown at the bottom.
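
The bootstrap confidence intervals mentioned in this caption can be illustrated with a minimal Python sketch. It assumes that each 1-Mb window contributes one π estimate for a given distance bin and that the bin-level statistic is the unweighted mean across windows; the function name `bootstrap_ci` and the percentile method are illustrative choices, not necessarily the authors' procedure.

```python
import random
import statistics


def bootstrap_ci(window_pi, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of per-window pi values.

    window_pi: list of pi estimates, one per 1-Mb window (assumed input).
    Windows are resampled with replacement n_boot times.
    """
    rng = random.Random(seed)
    n = len(window_pi)
    # Mean of each bootstrap resample, sorted so percentiles can be read off.
    means = sorted(
        statistics.fmean(rng.choices(window_pi, k=n)) for _ in range(n_boot)
    )
    # Approximate indices of the alpha/2 and 1 - alpha/2 percentiles.
    lower = means[int((alpha / 2) * n_boot)]
    upper = means[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```

Resampling whole windows rather than individual sites keeps linked sites together, which is the usual reason for a block bootstrap on genomic data.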

Reduction of diversity as a function of genomic diversity. (A) Relationship between diversity estimated from genomic regions far (>823 kb) from genes (πFar1) and the ratio of diversity in the rest of the genome to diversity in the genomic region far from genes (πRest/πFar2). All positions far from genes are randomly assigned to two groups, which are used to calculate πFar1 and πFar2. Error bars indicate 95% confidence intervals for this randomization. (B) Relationship between the distance from genes and the coefficient of variation (CV) of π. Error bars show 95% confidence intervals calculated from 1,000 bootstrap replicates from 1-Mb windows.
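
The two quantities in this caption can be sketched in Python as follows. The sketch assumes per-window π values as input and uses unweighted means; the function names are hypothetical. Splitting the far-from-gene data into two random halves presumably keeps the x-axis value (πFar1) statistically independent of the denominator of the y-axis ratio (πFar2), but that motivation is an interpretation, not a statement from the source.

```python
import random
import statistics


def coefficient_of_variation(values):
    """CV = sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.fmean(values)


def split_half_ratio(far_pi, rest_pi, seed=0):
    """Return (pi_far1, pi_rest / pi_far2) from one random split.

    far_pi:  pi values from positions far from genes (assumed input).
    rest_pi: pi values from the rest of the genome (assumed input).
    """
    rng = random.Random(seed)
    shuffled = list(far_pi)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    pi_far1 = statistics.fmean(shuffled[:half])   # plotted on the x axis
    pi_far2 = statistics.fmean(shuffled[half:])   # denominator of the ratio
    pi_rest = statistics.fmean(rest_pi)
    return pi_far1, pi_rest / pi_far2
```

Repeating `split_half_ratio` with different seeds would give the spread from which a randomization interval could be read.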

Signatures of selective sweeps. (A) Relationship between minor allele frequency and the distance from the nearest gene for each genus of the great apes. The name of each (sub)species is shown at the bottom. Significant positive correlations were observed for western gorillas (Bonferroni-adjusted P value <0.0001), Nigeria–Cameroon chimpanzees (adjusted P value <0.0001), western chimpanzees (adjusted P value <0.0001), and Bornean orangutans (adjusted P value <0.0001). (B) Relationship between population divergence levels and the distance from the nearest gene. Spearman’s correlation coefficient and its significance are shown in each panel (***, **, *, and ns indicate Bonferroni-corrected P values of <0.001, <0.01, <0.05, and ≥0.05, respectively). Error bars show 95% confidence intervals calculated from 1,000 bootstrap replicates from 1-Mb windows.
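
The significance coding used in this caption can be sketched as follows. The sketch assumes the standard Bonferroni adjustment (each raw P value multiplied by the number of tests and capped at 1); the function names are hypothetical, and the correlation test that produces the raw P values is not shown.

```python
def bonferroni(p_values):
    """Bonferroni-adjust a list of raw P values (multiply by m, cap at 1)."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def significance_label(p_adjusted):
    """Map an adjusted P value to the ***, **, *, ns coding of the caption."""
    if p_adjusted < 0.001:
        return "***"
    if p_adjusted < 0.01:
        return "**"
    if p_adjusted < 0.05:
        return "*"
    return "ns"


# Hypothetical raw P values, one per panel.
labels = [significance_label(p) for p in bonferroni([0.0001, 0.004, 0.02, 0.5])]
```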

Simulations of background selection. The diversity pattern in the sequences flanking the genic region under evolutionary constraint is shown for two populations with N = 1,000 and N = 2,000, respectively. Selection coefficients range from 0.0001 to 0.05. For each set of parameters, we performed 1,000 independent simulations and report the average π. (A) Relationship between the distance from genes and the relative reduction in π due to background selection, πBGS, together with the theoretical expectation (based on ref. ) (black lines). Blue and red points represent N = 1,000 and N = 2,000, respectively. (B) Relationship between the distance from genes and the ratio of πBGS with N = 2,000 to πBGS with N = 1,000. Pearson’s correlation coefficient and its significance are shown in each panel (***, **, *, and ns indicate Bonferroni-corrected P values of <0.001, <0.01, <0.05, and ≥0.05, respectively).

Simulations of selective sweeps. The selection coefficients of adaptive mutations, s, are 0.01 and 0.02; the proportion of all genic mutations that are beneficial, p, is 0.0005, 0.001, or 0.002. For each set of parameters, we performed 1,000 independent simulations and report the average π. (A) πSWP, diversity reduction due to selective sweeps, as a function of distance from genes for different combinations of N, s, and p. (B) The ratio of πSWP with N = 2,000 to πSWP with N = 1,000 as a function of distance from genes. Pearson’s correlation coefficient and its significance are shown in each panel (***, **, *, and ns indicate Bonferroni-corrected P values of <0.001, <0.01, <0.05, and ≥0.05, respectively). (C) πSWP as a function of distance from genes when the number of beneficial mutations per generation is the same but N differs by a factor of two.
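
The contrast between panels A and B and panel C can be motivated with a standard textbook approximation, given here as a rough sketch and not as the authors' derivation. It assumes a diploid Wright–Fisher population whose effective size equals N, additive beneficial mutations with 1/N ≪ s ≪ 1, and a hypothetical symbol μ_b (not from the source) for the rate of beneficial mutation per genic region per chromosome per generation.

```latex
\begin{align}
  \text{new beneficial mutations per generation} &\approx 2N\mu_b, \\
  \Pr(\text{fixation of a new beneficial mutation}) &\approx 2s, \\
  \text{sweep rate } \lambda &\approx 2N\mu_b \cdot 2s = 4N\mu_b s,
  \qquad \text{waiting time} \approx \frac{1}{4N\mu_b s}.
\end{align}
```

Under these assumptions, doubling N at fixed μ_b and s doubles the expected number of sweeps per unit time, whereas holding the input of beneficial mutations, 2Nμ_b, constant (as in panel C) leaves the expected sweep rate unchanged.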