The second hypothesis is the supposed interaction between parental S and heritability of cognitive ability. As above, the first study was published about 10 years ago, has a large number of citations (~1000) and the finding suits left-wing ideology. The sample size was pretty small for a modern behavioral genetic study at ~320 pairs. You can guess what happened: large follow-up studies found inconsistent results, and finally someone did a meta-analysis, which found no reliable overall effect. What it did find was an effect in a subsample of US studies and none in non-US samples. Unsurprisingly, the original study was a strong outlier:

In fact, we can go further. The meta-analysis dataset is public, so we can take a look at it and look for trouble. The authors used a fancy model that could take the hierarchical data structure into account (multiple outcomes per sample). I used a simpler method where I averaged the results within samples first and then meta-analyzed as normal (same approach as used in my study here). This gave me an overall effect size of .029 [CI95: -.015, .073], so pretty negligible. The authors’ fancier method gave the same number. But then they put a positive spin on it and analyze the data for US and non-US samples separately, which produces estimates of .074 and -.027 using their method (.074 and -.041 using mine). Of these, the Turkheimer study had the largest effect size, .19 using my method. So the finding from that study is about 2.6 times larger than the overall US estimate.
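To make the "average within samples, then meta-analyze" idea concrete, here is a minimal Python sketch. It is not the code I actually ran, and it makes several assumptions: the function names and the toy numbers are made up for illustration, the within-sample variance is taken as the plain mean of the outcome variances (which treats outcomes from the same sample as perfectly correlated), and the pooling step is a standard DerSimonian-Laird random-effects model.

```python
import math
from collections import defaultdict


def aggregate_within_samples(rows):
    """Collapse multiple outcomes per sample into one effect per sample.

    rows: iterable of (sample_id, effect, standard_error).
    Returns {sample_id: (mean_effect, variance)}.
    """
    grouped = defaultdict(list)
    for sample_id, effect, se in rows:
        grouped[sample_id].append((effect, se))

    aggregated = {}
    for sample_id, values in grouped.items():
        mean_effect = sum(e for e, _ in values) / len(values)
        # Assumption: outcomes within a sample are treated as perfectly
        # correlated, so the aggregate variance is the mean variance.
        mean_var = sum(se ** 2 for _, se in values) / len(values)
        aggregated[sample_id] = (mean_effect, mean_var)
    return aggregated


def random_effects_meta(effects, variances):
    """DerSimonian-Laird random-effects pooled estimate with a 95% CI."""
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w

    # Cochran's Q and the method-of-moments between-study variance.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    re_weights = [1.0 / (v + tau2) for v in variances]
    estimate = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    se = math.sqrt(1.0 / sum(re_weights))
    return estimate, estimate - 1.96 * se, estimate + 1.96 * se, tau2


if __name__ == "__main__":
    # Hypothetical toy data, NOT the real meta-analysis dataset.
    toy_rows = [
        ("sample_a", 0.10, 0.05),
        ("sample_a", 0.30, 0.05),
        ("sample_b", 0.02, 0.04),
        ("sample_c", -0.05, 0.06),
    ]
    per_sample = aggregate_within_samples(toy_rows)
    effects = [e for e, _ in per_sample.values()]
    variances = [v for _, v in per_sample.values()]
    print(random_effects_meta(effects, variances))
```

The point of aggregating first is that each sample then contributes exactly one effect size, so an ordinary meta-analytic model can be used without modeling the dependence between outcomes from the same sample.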

What about publication bias? There aren’t enough studies to be very certain (k=17 after aggregation), but if one does correlate standard error/sample size with effect size, the result is negative: -.31/-.23, with large confidence intervals. So, an unsatisfactory maybe.
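To see why k=17 leaves so much uncertainty, here is a rough Python sketch that puts a 95% interval around a correlation using the Fisher z-transformation. This is an assumption about method, not necessarily how the intervals above were computed, and the function names are hypothetical. The `pearson` helper shows how such a correlation would be computed from paired lists of study sizes (or standard errors) and effect sizes.

```python
import math


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def fisher_ci(r, k, z_crit=1.96):
    """Approximate 95% CI for a correlation r based on k studies.

    Assumption: the Fisher z approximation, with standard error
    1 / sqrt(k - 3), is adequate here.
    """
    z = math.atanh(r)
    half_width = z_crit / math.sqrt(k - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)


if __name__ == "__main__":
    # The two correlations reported above, with k=17 aggregated studies.
    for r in (-0.31, -0.23):
        low, high = fisher_ci(r, 17)
        print(f"r = {r:.2f}, approx. 95% CI: [{low:.2f}, {high:.2f}]")
```

Under these assumptions both intervals comfortably span zero, which is what "an unsatisfactory maybe" means in practice.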

So did Turkheimer (2003) replicate? Well, if by replicate we mean: a later meta-analysis that analyzed an ad hoc subset of all studies produced an estimate with p<alpha that was ~2.6 times smaller than the original study, then yes. Really, the word replicate is a dichotomous way of thinking about outcomes of studies. Another way to put it is that the original study produced a finding much larger than what other studies find and that we’re not even sure there is an overall effect that isn’t so close to zero we don’t care, even for the US. Possibly there is no finding here at all, possibly there are moderator effects and we will have to do 20 more large studies to find out. Business as usual. Sigh.

With regard to discussion of race differences, all this is fairly pointless because these shared environment effects go away with age and heritability does not seem to differ by SIRE anyway. At least, based on those studies that did report heritabilities by SIRE.

Update 27th May, 2017

Some keep claiming that we should accept the interaction effect for the US subsample. Well, let’s do a quick test for publication bias in that too. Here are the scatterplots for the total sample and the US-only subsample.

The patterns are negative, but not conclusively so due to the small number of studies. Still, it means that we should be more skeptical. The effect size is likely overestimated.