The forbidden paper on the population genetics of IQ

I submitted a paper to Intelligence in December 2015. After about three weeks, I received a rejection letter from the new editor (Richard Haier). What was particularly irritating about one of the reviewers was the recommendation to reject without opportunity for revision. In my opinion, this stance is justified only in extreme instances of fatal flaws; otherwise it just reveals a hidden agenda or a general close-minded attitude. My policy has been for some time to post the reviews of rejected papers, because I do not believe that reviews should be hidden. Transparency is very important, particularly in science. Let the general public decide whose arguments provide a better fit to the data, not the dismissive attitude of a reviewer. The reviews are attached in the appendix and the paper can be downloaded from here.

This review was obviously written by an expert in the field, although it is not devoid of some generic comments that are irritating because they leave the question that they raise unanswered and they do not provide any references to back up their claims (e.g. “one research group with control of an extremely large family cohort is currently working on a manuscript documenting that years of education is subject to a very peculiar form of confounding”). Really? Which “peculiar form of confounding”? Which large family cohort and which manuscript? Which research group?

Isn’t it funny how a reviewer can afford to be generic and not provide any justification or references to back up their claims, but the authors have got to take extreme pains to make sure that everything is backed by sound evidence? Why this double standard? Perhaps because reviewers work for free, and nobody likes to do unpaid work.

There are a few serious comments that deserve consideration. For example: “A GWAS of Europeans is more likely to detect SNPs with high minor allele frequencies. The minor allele is usually the derived allele, and thus the use of SNPs ascertained to have low p-values in a GWAS of Europeans will lead to an overrepresentation of SNPs with high derived allele frequencies specifically in Europeans. If the derived allele tends to have a positive effect (as the authors claim), this is certainly an issue that needs to be carefully addressed.”

This argument is not explained very clearly. It’s another example of how the reviewers expect crystal clarity from the authors, but they can get away with making rather obscure comments that leave room for different interpretations.

This could mean two things. 1) That the GWAS tends to select trait increasing alleles that are derived. However, upon closer inspection this turns out to be fallacious. It rests on the wrong assumption that the GWAS hits always have a positive beta, which is not the case. Positive and negative betas are randomly distributed across GWAS hits. Thus, when the GWAS selects a hit with a negative beta, which should be more likely to be the minor allele and hence derived, the allele with a positive beta (in this case, IQ enhancing) is going to be more likely to be the major allele and hence ancestral.

Nonetheless, I counted the number of derived alleles among the alleles increasing height in the latest and biggest GWAS meta-analysis of variation in human stature. This would give us an estimate of the GWAS bias towards picking derived alleles.

The derived to total allele count ratio is 370/691 or 53.5%. Assessing the statistical significance of this result is problematic because many SNPs are in linkage disequilibrium and violate the assumption that they represent independent observations. It’s likely that this is just a statistical fluke, but nonetheless we can give the reviewer’s fallacious reasoning the benefit of the doubt.

The derived to total allele count ratio for intelligence enhancing alleles is 42/66 or 63.6%. The good news is that here we can apply binomial probability to calculate statistical significance because the SNPs were pruned for LD by the authors (Rietveld et al., 2014). We can see that the probability p that X (the number of derived alleles) >= 42 is 0.0179.

However, to be fair we have got to include the knowledge gained from the height GWAS and assume that there is a bias for derived alleles in the GWAS results. The best estimate of this bias is equal to the percentage of derived alleles in excess of 50% in the height meta-analysis, that is 3.5%.

A binomial calculation assuming a background frequency of 53.5% will yield a p value of 0.062, which is not extremely strong but not too shabby either.
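For readers who want to check the two binomial figures themselves, here is a minimal sketch using only the Python standard library. It assumes a one-sided test (the probability of seeing 42 or more derived alleles out of 66) and independent SNPs; the function name `binom_upper_tail` is my own and not taken from any package. The results should land close to the p values quoted above.

```python
from math import comb

def binom_upper_tail(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

n_snps = 66      # LD-pruned SNPs (count taken from the text)
n_derived = 42   # enhancing alleles that are derived (count taken from the text)

# Null hypothesis 1: no bias, derived and ancestral equally likely.
p_null = binom_upper_tail(n_derived, n_snps, 0.5)

# Null hypothesis 2: background derived rate taken from the height GWAS (370/691).
p_biased = binom_upper_tail(n_derived, n_snps, 370 / 691)

print(f"background 50.0%: p = {p_null:.4f}")
print(f"background 53.5%: p = {p_biased:.4f}")
```

Raising the background rate from 50% to 53.5% makes 42 derived alleles less surprising, which is why the second p value is the larger of the two.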

However, the more likely interpretation of the reviewer’s comment is that the minor alleles picked by the GWAS tend to have higher frequencies among the GWAS reference population (i.e. Europeans) than the average genome-wide frequencies of minor alleles. Minor alleles are more likely to be derived alleles, hence these derived alleles will have higher frequencies among Europeans compared to other populations. Since derived alleles tend to have a positive effect, the frequency of alleles with positive effect will tend to be higher among Europeans than other populations. It was hard work translating the reviewers’ obscure words into an understandable sentence.

We can again give the reviewer the benefit of the doubt and see if derived alleles with a positive effect have higher frequencies among Europeans compared to ancestral alleles with a positive effect and if their average frequencies are still correlated to population IQs.

It is indeed the case, as the reviewer had predicted, that derived alleles have a higher frequency among Europeans, whether they have a positive effect or not. But the question is: Are derived alleles with a positive effect better predictors of population IQ than derived alleles with a negative effect? If the alleles contain signal that goes above and beyond that produced by being derived, the correlation between derived positive and country IQ should be stronger than that between derived negative and country IQ. In other words, this would tell us that the GWAS found signal above and beyond that provided simply by (ancestral vs derived) allele status.

The correlation between DP (derived alleles with positive effect) and country IQ is r=0.83. The correlation between country IQ and AP (ancestral alleles with positive effect) is r=-0.65.

This implies that the signal in the total polygenic score (average frequency of all derived and ancestral alleles together) is partly driven by the derived alleles. However, a closer inspection of the matrix will tell us that the correlation between derived alleles with negative effect and IQ is r=0.65 (the mirror image of the ancestral positive correlation, since the two frequencies are complements of each other), which is lower than that between derived alleles with positive effect and population IQ (r=0.83).

Clearly, more SNPs are required to validate this picture.

Let’s look at the hits found by Rietveld et al. To avoid post-hoc classifications, I employed the same classification that I used for the analysis in my paper. There were 10 genome-wide significant SNPs (p<5*10^-8). However, 9/10 alleles with positive effect were derived, so there were not enough ancestral positive alleles to make a comparison. The SNPs with a p value between 5*10^-7 and 5*10^-8 had a sample N=99. We can see that derived and ancestral alleles are equally represented (DA:AA=49:50).

The same procedure applied to the Rietveld et al. (2014) SNPs to control for differential distribution of derived alleles due to GWAS artifact or bottleneck effects (Henn et al., 2015) will be employed here. Alleles with a positive effect are divided into two sub-groups: those that are derived and those that are ancestral. Reversing their frequencies (1-n) yields the frequencies of derived negative and ancestral negative alleles, respectively. These are shown in table 2.
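The frequency reversal described above has a simple consequence that is worth making explicit: because the derived negative frequency is just 1 minus the ancestral positive frequency, its correlation with any population variable is the same size with the opposite sign. The sketch below illustrates this with made-up numbers; the four frequencies and scores are hypothetical and are not values from the paper, and `pearson` is a hand-written helper.

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical per-population values, for illustration only.
ancestral_positive = [0.62, 0.55, 0.48, 0.41]   # mean frequency of AP alleles
population_score = [85.0, 90.0, 97.0, 100.0]    # any population-level variable

# Reversing the frequencies (1 - n) gives the derived negative alleles.
derived_negative = [1 - f for f in ancestral_positive]

r_ap = pearson(ancestral_positive, population_score)
r_dn = pearson(derived_negative, population_score)

print(f"AP vs score: r = {r_ap:.3f}")
print(f"DN vs score: r = {r_dn:.3f}")  # same magnitude, opposite sign
```

The same relationship holds between derived positive and ancestral negative alleles, so only two of the four correlations carry independent information.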

First, we can see that the reviewer’s claim that derived alleles have higher frequencies among Europeans is debunked, as this is true only for derived alleles with a positive effect, but not those with a negative effect, which actually reach higher frequencies among South Asians (e.g. Indian Telugu: 0.458) but are otherwise equally distributed across Africans (e.g. Esan Nigeria: 0.372) and Europeans (e.g. British: 0.372). What is their correlation with population IQ? If GWAS hits really had higher frequencies among Europeans than Africans simply because (according to the reviewer) of a methodological artifact, this should apply irrespective of the effect on educational attainment. In other words, positive and negative effect derived alleles should be found at higher frequencies among Europeans. What about the polygenic scores’ correlations to population IQ? Again, if the polygenic scores’ correlation to population IQ were driven only by derived allele status, alleles with a positive effect on educational attainment should not be more strongly correlated to population IQ than alleles with a negative effect.

We can see that the correlation between derived positive polygenic score and IQ is 0.89, much higher than that between derived negative and IQ (-0.25). This suggests that the alleles pick up selection signal that goes above and beyond random drift or effects of GWAS artifact. Another interesting result is that ancestral alleles with a positive effect do not seem to predict population IQ (r=0.25), confirming my prediction that intelligence enhancing alleles should be overrepresented among human-specific mutations. If we assume that human-specific mutations with a positive effect on IQ at the individual level are the least likely to contain false positives, we can consider this as the best measure of selection pressure strength across populations. We can see that this index peaks among Europeans (highest scores for Italians and British=49%) and East Asians (e.g. Chinese Beijing=44.6%). South Asians have lower scores (Bangladesh=33.9%), and scores are even lower in sub-Saharan African populations (around 30%).

Perhaps another measure of selection would be the difference between derived positive and derived negative (dp-dn) allele frequencies. This would take into account the DAF (derived allele frequencies) distributions due to population bottlenecks and drift. We can see that even this measure is substantially correlated to population IQ (r=0.85). With this methodology, it turns out that the (dp-dn) score for the Rietveld et al. (2014) 69 SNPs is weakly but negatively correlated to population IQ (r=-0.297).

Another way to validate a measure is to see how well it replicates across datasets: Are derived allele frequencies from one dataset correlated to derived allele frequencies in the other? What we are interested in here is whether derived alleles with a positive effect on intelligence have similar frequencies across datasets. If they do, this suggests that they are picking up more than random noise.

It turns out that the correlation between derived positive allele frequencies in the two datasets (Rietveld et al., 2013 and Rietveld et al., 2014) is positive (r=0.88). On the other hand, the correlation between the derived negative alleles is near zero (r=0.08). This suggests that alleles with a positive effect on IQ pick up selection signal, whereas the alleles with a negative effect on IQ represent noise. If the latter represented mere noise, then the method of subtracting dn from dp would not be sound either. Again, more data are needed to shed light on this issue.

A somewhat puzzling finding is the dramatic drop in the percentage of derived alleles with a positive effect when the p value goes above the conventional GWAS significance threshold (p<5*10^-8). 9/10 of the GWAS significant hits were derived. However, only about 50% of those belonging to the second group (p value between 5*10^-7 and 5*10^-8) were derived. The dramatic drop is perhaps an artifact of adopting a dichotomous approach, dividing the groups by a conventional threshold. One would have to correlate the p value to the derived vs ancestral allele status. This was done in my paper using the 67 alleles found by Rietveld et al. (2014) to increase cognitive performance, and a slightly positive effect was found. Using the 109 SNPs (top 10 + 99 making up the second group) yields a correlation of r=-0.019. Since derived alleles are coded as 1 and ancestral ones as 0, this implies that there is a very weak association between derived status and low p value. However, this is driven entirely by the top 10 SNPs. A limitation of this analysis is that the SNPs were not pruned for LD and hence are not independent: if there are clusters of SNPs around a certain p value, this will bias the derived allele count, giving undue weight to alleles in that p value range. Bigger samples of SNPs pruned for LD will be required to replicate the association between derived status and positive effect found in the Rietveld et al. (2014) data set.
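Correlating a 0/1 variable with a continuous one, as described above, is a point-biserial correlation, which is simply the Pearson formula applied to the coded data. Here is a minimal sketch of that calculation. The ten SNPs below are invented for illustration (they are not the 109 SNPs analysed here), and I assume the raw p values are used rather than a transformation such as -log10.

```python
def pearson(x, y):
    """Pearson correlation; with a 0/1 variable this is the point-biserial r."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical toy data: 1 = positive-effect allele is derived, 0 = ancestral.
derived = [1, 1, 1, 1, 0, 1, 0, 0, 1, 0]
p_values = [1e-8, 2e-8, 3e-8, 4e-8, 1e-7, 2e-7, 3e-7, 4e-7, 4.5e-7, 5e-7]

r = pearson(derived, p_values)

# A negative r means derived alleles tend to sit at lower (more significant) p values.
print(f"derived status vs p value: r = {r:.3f}")
```

Because the sign depends on the coding, it is worth stating the coding explicitly whenever such a coefficient is reported, as done in the paragraph above.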

The reviewer thinks that the derived alleles are not necessarily enriched for intelligence enhancing signal and stated: “it is not necessarily the case that an association between derived status and a positive effect points toward selection increasing the mean of the trait. Such selection can actually lead to the opposite association (between derived status and a negative effect) at certain allele frequencies.”

I must confess that I do not understand this argument. Surely if a mutation unique to the human lineage (arisen after the most recent common ancestor of all living humans) had been detrimental, making humans less intelligent than primates, this would have been selected against, hence disappearing from the genome? Purifying selection is much more common than positive selection because random mutations are usually deleterious.

Selection increasing the mean of the trait does actually produce an increase in derived alleles when there has been positive directional selection for the trait in a species. We know that this is the case for humans, as cranial capacity and behavioral complexity have dramatically increased in the last 4 million years and modern humans are much more intelligent than non-human primates. Selection must necessarily have increased the intelligence-enhancing mutations, hence the derived alleles.

The reviewer’s argument would apply to height, as there has not really been an increase in stature, at least from Homo erectus to Homo sapiens sapiens. And that’s indeed what we found: height increasing alleles are only marginally enriched for derived alleles (53%), a finding that is likely a fluke.

Another comment worthy of consideration is this: “the extrapolation to non-European populations is still problematic because the accuracy of the polygenic score declines in such populations as a result of differing LD patterns (Scutari et al., 2015).”

Differences in LD should simply reduce the frequency differences at the tag SNPs between populations, compared to the real causal SNPs. This is due to a phenomenon called “attenuation”. Indeed, correction for attenuation is used “to rid a correlation coefficient from the weakening effect of measurement error” (Jensen, 1998). This scenario works in the case that the frequency differences between tag and causal SNPs are due to random error, so that the mean frequency of the cognitive ability alleles is equal to the (genome-wide) background frequency (which, for mathematical reasons, is 50%). If instead there is a systematic bias, so that the mean frequency of the causal alleles is lower than the background frequency, then attenuation will reduce observed population-level frequency differences at tag alleles. As the reviewer says, “A GWAS of Europeans is more likely to detect SNPs with high minor allele frequencies”. Hence, the average frequency of the causal alleles identified by the GWAS tends to be lower than 50%. That this is true can be seen from the tables displaying the average frequencies of educational attainment increasing alleles, which tend to be much lower than 50%, especially at the lowest p values.

For example, let the average frequency of causal alleles be 40% in the reference European population. We also know that the average genome-wide frequency of alleles is 50% in all populations (the frequencies of the two alleles at a locus always sum to 100%). If LD breaks down at some loci so that the tag SNP is uncorrelated to the causal SNP, the tag SNP in the non-European population will have a bias towards higher frequency compared to that of the European population. Hence, differences in LD should cause non-European populations to have higher frequencies at the tag SNPs (that is, the “GWAS hits”) than European populations and to reduce frequency differences among these populations, as all of them tend to be closer to 50%.
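The argument can be put in numbers with a small sketch. It rests on the assumption made above that a tag SNP decoupled from its causal SNP has an expected frequency equal to the 50% background; the fraction of loci where LD breaks down (25%) and the non-European causal frequency (30%) are hypothetical values chosen only for illustration, and `expected_tag_frequency` is my own helper name.

```python
def expected_tag_frequency(causal_freq, broken_fraction, background=0.5):
    """Expected mean tag-SNP frequency when LD is lost at a fraction of loci.

    Loci that keep LD track the causal frequency; loci that lose LD are
    assumed (as in the text) to average the genome-wide background of 50%.
    """
    return (1 - broken_fraction) * causal_freq + broken_fraction * background

european_causal = 0.40       # from the example in the text
non_european_causal = 0.30   # hypothetical true causal frequency
broken = 0.25                # hypothetical share of loci where LD breaks down

# LD is intact in the GWAS reference population, so tags match causal SNPs there.
european_tag = expected_tag_frequency(european_causal, 0.0)
non_european_tag = expected_tag_frequency(non_european_causal, broken)

true_gap = european_causal - non_european_causal
observed_gap = european_tag - non_european_tag

print(f"true gap:     {true_gap:.3f}")
print(f"observed gap: {observed_gap:.3f}")  # smaller than the true gap
```

Under these assumptions the observed difference at the tag SNPs understates the difference at the causal SNPs; it does not reverse it.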

So this is the opposite of what the reviewer said: “Now suppose that in a different population the SNPs are uncorrelated, the reference allele at the causal SNP has a somewhat higher frequency, and the reference allele at the tag SNP has a much lower frequency. Then the inference made from comparing the polygenic scores of the two populations is exactly the opposite of the truth.”

The reviewer’s argument can perhaps apply to a single SNP, but there is no reason why there should be a systematic bias in the direction predicted by that argument, and sadly the reviewer just assumes that this is so, without providing any justification.

Reviewer #1: I recommend the rejection of this manuscript without opportunity for revision. It does not meet the very high standards for demonstrations of natural selection acting to differentiate modern human populations that have been set by recent publications (Turchin et al., 2012; Robinson et al., 2015). Here I will only detail a few of the manuscript’s shortcomings.

The authors do not address the possibility that the GWAS results of Rietveld et al. (2013) are contaminated by confounding (cognition- or education-affecting environmental variables that happen to be correlated with genetic variation). Although the original publications of the SSGAC deal with this issue to some extent, they do not come up to the standards set in the papers that I have cited in the previous paragraph. Furthermore, one research group with control of an extremely large family cohort is currently working on a manuscript documenting that years of education is subject to a very peculiar form of confounding. Until these results are published and well absorbed, any naive inferences regarding the basis of racial differences should be regarded with skepticism.

The authors also do not address the issue of ascertainment bias. A GWAS of Europeans is more likely to detect SNPs with high minor allele frequencies. The minor allele is usually the derived allele, and thus the use of SNPs ascertained to have low p-values in a GWAS of Europeans will lead to an overrepresentation of SNPs with high derived allele frequencies specifically in Europeans. If the derived allele tends to have a positive effect (as the authors claim), this is certainly an issue that needs to be carefully addressed.

True, it may be that ascertainment bias is less of an issue when all SNPs regardless of p-value are used to construct a polygenic score. But the extrapolation to non-European populations is still problematic because the accuracy of the polygenic score declines in such populations as a result of differing LD patterns (Scutari et al., 2015). An example will make this clear. Suppose that two SNPs in perfect LD in Europeans have quantitatively close positive reference betas. Now suppose that in a different population the SNPs are uncorrelated, the reference allele at the causal SNP has a somewhat higher frequency, and the reference allele at the tag SNP has a much lower frequency. Then the inference made from comparing the polygenic scores of the two populations is exactly the opposite of the truth. We can conclude from this that the use of polygenic scores to infer the causes of intercontinental differences requires much more care than given to it here.

Because stabilizing selection (favoring the “golden mean,” as the authors put it) also eliminates genetic variation, higher dispersion of allele frequencies across populations is by itself not diagnostic of directional selection.

The fact that a large fraction of the enhancing alleles reported by Rietveld et al. (2014) SNPs are derived does not mean very much. First, as it is likely that many of the SNPs are not causal, the relationship between derived alleles at different polymorphic sites must be addressed. Second, even if it be assumed that these are the causal SNPs, it is not necessarily the case that an association between derived status and a positive effect points toward selection increasing the mean of the trait. Such selection can actually lead to the opposite association (between derived status and a negative effect) at certain allele frequencies.

A general comment is that the appropriateness of much of the hypothesis testing in this paper is difficult to judge. The stochastic model justifying a particular statistical test is usually unclear. Is the source of randomness inaccuracy in the GWAS estimates? The inherent stochasticity of evolution?

Reviewer #2: I found this paper extremely reader-unfriendly. Starting from the title – which is cumbersome, to the tables – which lack meaningful explanations and notes, to references to specialist concepts – that require much further explanation for non-expert reader, to the general structure of the write up – the paper needs extensive revisions before it can be considered for publication for Intelligence.

The paper is full of poorly justified conclusions. For example, in the abstract, the author claims: ‘Cognitive-enhancing SNPs were significantly enriched for derived alleles (64%), that is human-specific mutations that originated after the split from the most recent common ancestor between humans and other primates.’ However, the Derived vs ancestral alleles section on page 9 does not present the relevant analyses in detail, and therefore the conclusion is not justified.

The paper is full of sentences that would require further clarifications for non-expert audience. For example: ‘Differences in allele frequencies between populations can be created by directional selection when the strength and/or direction of selection on the phenotype differs among populations. In this case it is also characterized as diversifying selection, in contrast to stabilizing selection which tends to favor the “golden mean.”’

OR

‘Diversifying selection is most commonly measured using the Fst index at or around single loci (Holsinger & Weir, 2009).’ This needs to be explained further.

OR

‘Some SNPs had opposite betas on the two outcome variables (yes/no college completion and total years of education).’ This requires further discussion.

I could go on giving examples of unclear sentences, but I believe that the paper needs to be worked on: the author should consult with non-expert (in this specific area) intelligence researchers to arrive at a clearer, more streamlined and better explained manuscript. All analyses require further explanations, perhaps with specific examples, that would talk the reader through every step of the analyses.