Comments to author

Remarks to the Author:

An increasing number of studies evaluate the genetic architecture of complex traits, which underlines the importance of accurate heritability estimation. The manuscript by Speed et al. compares two models for heritability estimation: GCTA and their own method, LDAK. The main difference between the two models is the assumption about how the h2 contribution of a variant relates to its average LD. Using data from a number of GWAS, the authors demonstrate that the LDAK model fits better for the majority of traits. In addition, they show a lower average h2 contribution for variants of lower frequency and suggest weighting by imputation accuracy. The manuscript is well written, and the problem and models are presented coherently. It is commendable that the authors provide well-annotated code, which makes their results replicable and makes it easier for other researchers to apply LDAK.

There has been controversy over which of these heritability methods performs better (Speed et al., 2012, AJHG 91, 1011–1021; Speed et al., 2013, AJHG 93, 1151–1157; Yang et al., 2015, NG 47(10)). It is a known problem that GCTA makes strong assumptions about the distribution of h2 contributions across LD categories as well as minor allele frequency thresholds. This has led to the development of a modified version of GCTA (GCTA-LDMS) that uses genome partitioning to account for different contributions across the MAF and LD range (Yang et al., 2015, NG 47(10)).

Here the authors use empirical data to assess the fit of the different models, which is an advantage over previous studies. They use genome partitioning to obtain estimates for categories of LD and compare these to overall h2 estimates. Moreover, they demonstrate that previous evaluations of model fit by simulation were biased because the data were simulated under the target model’s assumptions.

Major comments:

The approach that uses two partitions to assess model assumptions regarding LD (as shown in Figure 4) is not well explained. It is not clear how variants were assigned to either of the two categories in Partition I versus Partition II. I assume there was an LD threshold that differs between Partitions I and II as a consequence of the different model assumptions, but this should be explained in the text. Also, why were variants classified based on the average LD in their region rather than on their own LD score?
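
To make the last question concrete, the sketch below contrasts the two assignment rules on made-up numbers. Everything in it is an assumption for illustration only: the LD scores, the median cutoff, the one-variant window, and the function names `regional_average` and `assign_by_threshold` are hypothetical and are not taken from the manuscript or from LDAK.

```python
from statistics import mean, median


def regional_average(scores, half_window=1):
    """Average each variant's score with its neighbours (hypothetical window)."""
    averaged = []
    for i in range(len(scores)):
        start = max(0, i - half_window)
        averaged.append(mean(scores[start:i + half_window + 1]))
    return averaged


def assign_by_threshold(scores, threshold):
    """Label a variant 1 (high LD) if its score exceeds the cutoff, else 0 (low LD)."""
    return [1 if s > threshold else 0 for s in scores]


# Made-up per-variant LD scores, ordered by genomic position.
ld_scores = [1.2, 8.5, 3.1, 15.0, 0.7, 6.4]

# Assumed cutoff: the median of the per-variant scores.
cutoff = median(ld_scores)

# Rule A: classify each variant by its own LD score.
own_labels = assign_by_threshold(ld_scores, cutoff)

# Rule B: classify each variant by the average LD in its region.
regional_labels = assign_by_threshold(regional_average(ld_scores), cutoff)

# Count how many variants change category between the two rules.
n_disagree = sum(a != b for a, b in zip(own_labels, regional_labels))
print(own_labels, regional_labels, n_disagree)
```

Even in this toy case most variants change category depending on the rule, which is why the manuscript should state explicitly which rule, window, and threshold were used.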

Neither the comparison using the two partitions nor the likelihood ratio test (LRT) seems to have included GCTA-LDMS. How, then, is it possible to evaluate whether GCTA-LDMS already sufficiently accounts for different h2 contributions by LD category, and whether or not LDAK outperforms GCTA-LDMS? While LDAK seems to give higher estimates of h2 for most traits, how can the authors rule out that these are over-estimates, as claimed previously (Yang et al., 2015, NG 47(10))?

Most of the case-control studies have fewer than 2,000 cases, while the UCLEB study has limited genotyping density (~200K genotyped variants). The latter casts some doubt on the reliability of the conclusions drawn, for example with respect to rare variants (standard errors of the h2 estimates increase to >0.10 when rare variants are included). Heritability estimates from GCTA-LDMS are not provided for the quantitative traits due to convergence problems, which might be related to insufficient numbers of Metabochip variants in some of the categories. Moreover, I cannot find the equivalent of Figure 4 for quantitative traits. It is important to be able to compare the models for quantitative traits in sufficiently large data sets. Such resources are available (e.g. UK Biobank).

The assessment of enrichment by DHS is very interesting. Can this be added for other functional/regulatory variant classifications that are of high relevance (e.g. as done in Finucane et al., 2015, NG, 47(11))?

Relatives have been excluded from all data sets. This constitutes a loss of data. Others have proposed alternative methods to include both related and unrelated individuals (e.g., Zaitlen et al., 2013, PLOS Genetics, 9(5)). Would it be possible to implement this for LDAK?

It would be informative if the authors could provide some information on the speed of LDAK and its computational feasibility for large data sets.

Minor comments:

• Line 194: presumably “on” is missing after “impact”

• The top of Figure 4 (and of the corresponding supplementary figures) has been cut off

• It would be good to add some UCLEB traits, and also GCTA-LDMS results, to Table 1

• The version of LDAK used (v5) is not currently available (http://dougspeed.com/ldak/), which makes the assessment more difficult