Abstract

Geneticists have historically used information on the relatives of affected individuals to understand the genetic etiology of disease (e.g., pedigree linkage analysis, affected sib-pair analysis). Today, large population-based cohorts can be divided into cases and controls to study complex traits such as type 2 diabetes (T2D) through genome-wide association studies (GWAS). These cohorts inherently contain unaffected relatives of disease cases (proxy-cases), who are placed in the control group but carry genetic liability for disease. A recently published method by Liu et al. introduces the concept of GWAS by proxy (GWAX): performing case-control genetic association studies that use these unaffected first-degree relatives of cases in the (near) absence of true cases. We extend this concept to model genetic liability in large cohorts in which cases, proxy-cases, and controls are all available. We used a linear mixed model to test relationship-to-case as a semi-continuous trait (F=1 for cases, F=0.5 for proxy-cases, and F=0 for controls).
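To make the semi-continuous coding concrete, the sketch below maps each individual's relationship-to-case onto F and regresses F on allele dosage. It is a simplified stand-in, not the study's method: it uses ordinary least squares with an intercept only, with no random effect for relatedness and no covariates, whereas the study fits a linear mixed model. The status labels ('case', 'proxy', 'control') and the function names are hypothetical.

```python
import math

# Semi-continuous relationship-to-case trait:
# F = 1 for cases, 0.5 for proxy-cases, 0 for controls.
F_CODE = {"case": 1.0, "proxy": 0.5, "control": 0.0}


def code_relationship(status):
    """Map hypothetical status labels to F values."""
    return [F_CODE[s] for s in status]


def ols_association(dosage, trait):
    """Regress trait on allele dosage (0/1/2) by ordinary least squares.

    Returns (beta, se, t). This simplification omits the random effect
    and covariates of a linear mixed model.
    """
    n = len(dosage)
    mx = sum(dosage) / n
    my = sum(trait) / n
    sxx = sum((x - mx) ** 2 for x in dosage)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosage, trait))
    beta = sxy / sxx
    alpha = my - beta * mx
    # Residual sum of squares gives the residual variance on n - 2 df.
    rss = sum((y - alpha - beta * x) ** 2 for x, y in zip(dosage, trait))
    se = math.sqrt(rss / (n - 2) / sxx)
    return beta, se, beta / se


# Toy example: eight individuals with made-up dosages.
status = ["case", "proxy", "control", "control", "proxy", "case", "control", "control"]
dosage = [2, 1, 0, 0, 1, 1, 0, 1]
F = code_relationship(status)
beta, se, t = ols_association(dosage, F)
print(f"beta={beta:.3f} se={se:.3f} t={t:.2f}")
```

Because proxy-cases sit halfway between cases and controls on F, an allele enriched in families of cases pulls the slope upward even in relatives who are not themselves affected.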
We performed 1,000 simulation replicates to evaluate this model in an idealized cohort (N=100,000) with a disease prevalence of 0.1 and a heritability of disease liability of 0.1 (cases=10,000; proxy-cases=16,814; controls=73,186), in which every proxy-case has one affected relative and the sample-wide minor allele frequency (MAF) is 0.11. The standard GWAS model compares cases to controls and counts proxy-cases, who are typically unidentified in a cohort, as controls. Simply removing proxy-cases from the controls increases power from 54.0% to 70.6% at an odds ratio of 1.15. Modeling proxy-cases together with cases and controls increases power further, to 87.3%. This trend holds across similar odds ratios and common MAFs. Next, we used this statistical test to identify genetic variants associated with T2D in the Norwegian Nord-Trøndelag Health Study (HUNT). Using self-reported family history, we partitioned our sample of 69,635 individuals of European ancestry into cases, proxy-cases, and controls. We tested for association at ~10 million genotyped and imputed genetic variants (MAF > 0.5%). Both (i) removing proxy-cases from the controls and (ii) modeling proxy-cases with cases and controls increase the evidence for association at known T2D risk loci compared with standard GWAS. We are evaluating this method in the UK Biobank and continuing methods development to account for covariates and relatedness. With the increasing availability of biobank data, this work demonstrates the potential advantage of statistically modeling proxy-cases in cohort-based GWAS.
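The three analysis designs compared in the simulation differ only in which individuals are analyzed and how the phenotype is coded. The sketch below builds the phenotype vector for each design using the simulated cohort's composition (10,000 cases, 16,814 proxy-cases, 73,186 controls). The design names, status labels, and function name are hypothetical, and the dosages are placeholders; the sketch shows sample sizes and coding only, not the power calculation.

```python
def build_design(status, dosage, design):
    """Return (trait, dosage) for one of three hypothetical design names.

    'standard'     : cases = 1; proxy-cases and controls both coded 0
    'remove_proxy' : cases = 1, controls = 0; proxy-cases dropped
    'proxy_model'  : F = 1 / 0.5 / 0 for case / proxy / control
    """
    if design == "standard":
        return [1.0 if s == "case" else 0.0 for s in status], list(dosage)
    if design == "remove_proxy":
        keep = [i for i, s in enumerate(status) if s != "proxy"]
        trait = [1.0 if status[i] == "case" else 0.0 for i in keep]
        return trait, [dosage[i] for i in keep]
    if design == "proxy_model":
        code = {"case": 1.0, "proxy": 0.5, "control": 0.0}
        return [code[s] for s in status], list(dosage)
    raise ValueError(f"unknown design: {design}")


# Cohort composition from the simulated example; dosages are placeholders.
status = ["case"] * 10_000 + ["proxy"] * 16_814 + ["control"] * 73_186
dosage = [0] * len(status)

for design in ("standard", "remove_proxy", "proxy_model"):
    trait, _ = build_design(status, dosage, design)
    print(design, len(trait), sum(trait))
# standard 100000 10000.0
# remove_proxy 83186 10000.0
# proxy_model 100000 18407.0
```

Removing proxy-cases shrinks the sample (100,000 to 83,186) but purges genetically enriched individuals from the control group. The F coding keeps all 100,000 individuals and lets proxy-cases contribute half-weight signal toward the case end of the trait.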