31 Division of Endocrinology, Diabetes, and Nutrition and Program for Personalised and Genomic Medicine, Department of Medicine, University of Maryland School of Medicine, 685 Baltimore St. MSTF, Baltimore, 21201, USA.

Abstract

Homozygosity has long been associated with rare, often devastating, Mendelian disorders, and Darwin was one of the first to recognize that inbreeding reduces evolutionary fitness. However, the effect of the more distant parental relatedness that is common in modern human populations is less well understood. Genomic data now allow us to investigate the effects of homozygosity on traits of public health importance by observing contiguous homozygous segments (runs of homozygosity), which are inferred to be homozygous along their complete length. Given the low levels of genome-wide homozygosity prevalent in most human populations, information is required on very large numbers of people to provide sufficient power. Here we use runs of homozygosity to study 16 health-related quantitative traits in 354,224 individuals from 102 cohorts, and find statistically significant associations between summed runs of homozygosity and four complex traits: height, forced expiratory lung volume in one second, general cognitive ability and educational attainment (P < 1 × 10^-300, 2.1 × 10^-6, 2.5 × 10^-10 and 1.8 × 10^-10, respectively). In each case, increased homozygosity was associated with decreased trait value, equivalent to the offspring of first cousins being 1.2 cm shorter and having 10 months' less education. Similar effect sizes were found across four continental groups and populations with different degrees of genome-wide homozygosity, providing evidence that homozygosity, rather than confounding, directly contributes to phenotypic variance. Contrary to earlier reports in substantially smaller samples, no evidence was seen of an influence of genome-wide homozygosity on blood pressure and low density lipoprotein cholesterol, or ten other cardio-metabolic traits. Since directional dominance is predicted for traits under directional evolutionary selection, this study provides evidence that increased stature and cognitive function have been positively selected in human evolution, whereas many important risk factors for late-onset complex diseases may not have been.
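The first-cousin equivalence quoted in the abstract can be illustrated with a short sketch. It assumes a purely linear effect of homozygosity on the trait and uses the standard expected inbreeding coefficient of 1/16 for the offspring of first cousins. The per-unit effect `beta_height_cm` is not a value reported in the text above; it is back-calculated here from the 1.2 cm figure, and the function name is hypothetical.

```python
# Sketch only: assumes the trait changes linearly with the inbreeding coefficient F.

F_FIRST_COUSINS = 1 / 16   # expected F for offspring of first cousins
F_SECOND_COUSINS = 1 / 64  # expected F for offspring of second cousins


def expected_trait_shift(beta_per_unit_f: float, f: float) -> float:
    """Expected trait change for an individual with inbreeding coefficient f."""
    return beta_per_unit_f * f


# Assumption: effect per unit F, back-calculated from "1.2 cm shorter"
# for first-cousin offspring. Not a figure taken from the paper.
beta_height_cm = -1.2 / F_FIRST_COUSINS

first_cousin_shift = expected_trait_shift(beta_height_cm, F_FIRST_COUSINS)
second_cousin_shift = expected_trait_shift(beta_height_cm, F_SECOND_COUSINS)

print(f"first cousins:  {first_cousin_shift:.2f} cm")
print(f"second cousins: {second_cousin_shift:.2f} cm")
```

Under the same linearity assumption, the more distant relatedness typical of most populations would correspond to proportionally smaller shifts, which is consistent with the abstract's point that very large samples are needed for sufficient power.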

Figures

Extended Data Figure 1. Forest plot for cognitive g

Individual sub-cohort estimates of effect size and the standard error are plotted. Sub-cohorts are ordered from top to bottom according to their weight in the meta-analysis, so larger or more homozygous cohorts appear towards the top. The scale of beta FROH is in intra-sex standard deviations. The meta-analytical estimate is displayed at the bottom. Sub-cohort names follow the conventions detailed in Supplementary Table 6 and the Supplementary Table 11 legend. Sample sizes, effect sizes and P values for association are given in Table 1. This trait was rank transformed.
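The caption describes sub-cohort estimates being combined with weights and ordered by weight. A minimal sketch of one common way to do this follows, assuming a fixed-effect, inverse-variance weighted meta-analysis; the caption does not state the weighting scheme, so this is an illustration rather than the authors' procedure. The function name and the example numbers are hypothetical.

```python
import math


def fixed_effect_meta(betas, ses):
    """Inverse-variance weighted fixed-effect meta-analysis (assumed scheme).

    Returns the pooled estimate, its standard error, and the per-cohort weights.
    """
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled_beta, pooled_se, weights


# Hypothetical sub-cohort effect sizes and standard errors (not from the paper).
betas = [-1.0, -3.0]
ses = [0.5, 1.0]

pooled_beta, pooled_se, weights = fixed_effect_meta(betas, ses)

# Order sub-cohorts from largest to smallest weight, as in the forest plots.
order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)

print(pooled_beta, pooled_se, order)
```

With inverse-variance weights, a cohort with a smaller standard error (because it is larger or more homozygous) receives more weight, which matches the ordering described in the caption.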

Extended Data Figure 2. Forest plot for educational attainment

Individual sub-cohort estimates of effect size and the standard error are plotted. Sub-cohorts are ordered from top to bottom according to their weight in the meta-analysis, so larger or more homozygous cohorts appear towards the top. The scale of beta FROH is in intra-sex standard deviations. The meta-analytical estimate is displayed at the bottom. Sub-cohort names follow the conventions detailed in Supplementary Table 6 and the Supplementary Table 11 legend. Sample sizes, effect sizes and P values for association are given in Table 1.

Extended Data Figure 3. Forest plot for height

Individual sub-cohort estimates of effect size and the standard error are plotted. Sub-cohorts are ordered from top to bottom according to their weight in the meta-analysis, so larger or more homozygous cohorts appear towards the top. The scale of beta FROH is in intra-sex standard deviations. The meta-analytical estimate is displayed at the bottom. Sub-cohort names follow the conventions detailed in Supplementary Table 6 and the Supplementary Table 11 legend. Sample sizes, effect sizes and P values for association are given in Table 1.
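The captions express effect sizes in intra-sex standard deviations. The sketch below shows one way such a scale could be produced, assuming each trait value is centred and divided by the sample standard deviation of its own sex group; the exact procedure is not given in the captions, and the function name and data are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def standardise_within_sex(values, sexes):
    """Convert trait values to z-scores computed separately within each sex."""
    groups = defaultdict(list)
    for value, sex in zip(values, sexes):
        groups[sex].append(value)
    # Per-sex mean and sample standard deviation (assumed choice of estimator).
    stats = {sex: (mean(vals), stdev(vals)) for sex, vals in groups.items()}
    return [(value - stats[sex][0]) / stats[sex][1]
            for value, sex in zip(values, sexes)]


# Hypothetical heights in cm (not data from the paper).
heights = [170.0, 180.0, 190.0, 155.0, 165.0, 175.0]
sexes = ["M", "M", "M", "F", "F", "F"]

z_scores = standardise_within_sex(heights, sexes)
print(z_scores)
```

Standardising within sex puts men and women on a common scale, so cohorts with different sex ratios can be compared without the sex difference in the trait dominating the estimate.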

Individual sub-cohort estimates of effect size and the standard error are plotted. Sub-cohorts are ordered from top to bottom according to their weight in the meta-analysis, so larger or more homozygous cohorts appear towards the top. The scale of beta FROH is in intra-sex standard deviations. The meta-analytical estimate is displayed at the bottom. Sub-cohort names follow the conventions detailed in Supplementary Table 6 and the Supplementary Table 11 legend. Sample sizes, effect sizes and P values for association are given in Table 1. This trait was rank transformed.

Extended Data Figure 5. Signals of directional dominance are robust to stratification by geography or demographic history or inclusion of educational attainment as covariate

(a) Cohorts are divided by continental biogeographic ancestry (African (15 sub-cohorts), East Asian (5), South & Central Asian (10), Hispanic (3)), with Europeans being divided into Finns (13), other European isolates (self-declared, 23), and (non-isolated) Europeans (90). Meta-analysis was carried out for all subsets with 2000 or more samples available. Sample numbers are as follows: cognitive g, Eur isolate 6638, European 44,153; educational attainment, African 4811, Eur isolate 8032, European 55,549, Finland 9068; height, African 21,500, E Asian 30,011, Eur isolate 23,116, European 228,813, Finland 30,427, Hispanic 5469, SC Asian 13,523; FEV1, African 6604, Eur isolate 4837, European 49,223, Finland 2340. βFROH is consistent across geography and in both isolates and more cosmopolitan populations. (b) Cohorts were divided into High and Low ROH strata of equal power and the meta-analysis was repeated; the effects are consistent across strata for all four traits. The mean SROH for the high and low strata are 13.4 and 4.3 Mb for cognitive g; 28.1 and 5.1 Mb for educational attainment; 31.9 and 10.8 Mb for height; and 41.4 and 4.5 Mb for FEV1. (c) To assess the potential for socio-economic confounding, where available, educational attainment was included in the regression model (edu) and compared to a model without educational attainment (none) in the same subset of cohorts. The signals reduce slightly when the education covariate is included; the analysis is not possible for educational attainment as a trait. For cognitive g, numbers are 36,847 and 36,023 for edu and none; for height, 131,614 and 120,945; and for FEV1, 15,717 and 15,425. The numbers differ because of missing individual educational data within cohorts. + indicates phenotype was rank transformed. FEV1, forced expiratory lung volume in one second; g is the general cognitive component (first unrotated principal component of test scores across diverse tests of cognition); SC Asian is South & Central Asian, E Asian is East Asian, trait units are intra-sex standard deviations and the genomic measure is unpruned SROH.

Meta-analytical estimates of effect size and standard errors are plotted for various models. Fixed indicates no mixed modelling was used, gr res indicates the GRAMMAR+ residuals were fitted and hglm indicates the full hierarchical generalised linear mixed model was used. + indicates the phenotype was rank transformed; FEV1 is forced expiratory lung volume in one second; Cognitive g is the general cognitive factor. 15,355 subjects were used for cognitive g, 36,060 for educational attainment, 89,112 for height and 15,262 for FEV1.

In panels (a) to (c), X and Y axes show SROH (sum of runs of homozygosity) from 0-30 Mb (30,000 kb). ill370: Illumina CNV370, aff6: Affymetrix6, illomni: Illumina OmniExpress. The graphs are shown for the specific plink call parameters used. (d) Sample numbers per continent are presented in a bar chart. AFR: African, AMR: Mixed American, ASN: East Asian, EUR: European, SAN: South Asian. Only samples with SROH below 30 Mb are plotted, to be conservative with respect to the effect of outliers, which have very strongly correlated estimates of SROH (r = 0.96-0.97 for comparisons including such very homozygous individuals). In these plots, the correlation between SROH called by the two arrays is r = 0.93-0.94.

Figure 1. Runs of Homozygosity by Cohort

The sum of runs of homozygosity (SROH) and the number of runs of homozygosity (NROH) are shown by sub-cohort. Populations differ by an order of magnitude in their mean burden of ROH. There are clear differences by continent and population type both in the mean SROH and in the relationship between SROH and NROH. SC.Asian is South & Central Asian, E.Asian is East Asian, Eur.Isolate is European isolates. The ten most homozygous cohorts are labelled: AMISH are the Old Order Amish from Lancaster County, Pennsylvania; HUTT, S-Leut Hutterites from South Dakota; NSPHS, North Swedish Population Health Study, 06 and 09 suffixes are different sampling years from different counties in Northern Sweden; OGP, Ogliastra Genetic Park, Sardinia, Italy; Talana is a particular village in the region; FVG, Friuli-Venezia-Giulia Genetic Park, Italy, omni and 370 suffixes refer to subsets genotyped with the Illumina OmniX and 370CNV arrays; HELIC, Hellenic Isolates, Greece, from Pomak villages in Thrace; and CLHNS, Cebu Longitudinal Health and Nutrition Study in the Philippines.

Figure 2. Effects of genome-wide homozygosity, βFROH, on 16 traits

Four phenotypes show a significant effect of burden of ROH: height (145 sub-cohorts), FEV1 (34), educational attainment (47) and general cognitive ability, g (23). HDL and total cholesterol are not significantly different from zero after correcting for 16 tests and no effect is observed for the other traits. To account for the different numbers of males and females in cohorts and the marked effect of sex on some traits, trait units are intra-sex standard deviations. βFROH is the estimated effect of FROH on the trait, where FROH is the ratio of the SROH to the total length of the genome. 95% confidence intervals (CIs) are also plotted. + indicates phenotype was rank transformed, * indicates phenotype was log transformed. BMI, body mass index; BP, blood pressure; FP, fasting plasma; HbA1c, haemoglobin A1c (glycated haemoglobin); FEV1, forced expiratory volume in one second; FVC, forced vital capacity; HDL, high density lipoprotein; LDL, low density lipoprotein.
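The definition of FROH and the correction for 16 tests can be sketched as follows. The genome length constant is a round illustrative value, not the length used in the study, and the 0.05 family-wise significance level with a Bonferroni correction is an assumption, since the caption says only that results were corrected for 16 tests. Function names are hypothetical; the P values are those quoted in the abstract.

```python
# Assumption: round illustrative genome length in kb, not the study's value.
ILLUSTRATIVE_GENOME_LENGTH_KB = 3_000_000


def froh(sroh_kb: float, genome_length_kb: float = ILLUSTRATIVE_GENOME_LENGTH_KB) -> float:
    """FROH: summed length of runs of homozygosity over total genome length."""
    return sroh_kb / genome_length_kb


def bonferroni_threshold(alpha: float = 0.05, n_tests: int = 16) -> float:
    """Per-test significance threshold under an assumed Bonferroni correction."""
    return alpha / n_tests


# An individual with 30 Mb (30,000 kb) of ROH under the illustrative length.
example_froh = froh(30_000)

threshold = bonferroni_threshold()

# P values quoted in the abstract for FEV1, cognitive g and educational attainment.
reported_p = {"FEV1": 2.1e-6, "cognitive g": 2.5e-10, "educational attainment": 1.8e-10}
passes = {trait: p < threshold for trait, p in reported_p.items()}

print(example_froh, threshold, passes)
```

Under these assumptions, 30 Mb of ROH corresponds to an FROH of about 1%, and all three quoted P values fall well below the per-test threshold.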