A Genome-Wide Association Study on Obesity and Obesity-Related Traits

Affiliations:
Center for Applied Genomics, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, United States of America,
Zilkha Neurogenetic Institute, Department of Psychiatry and Department of Preventive Medicine, University of Southern California, Los Angeles, California, United States of America

Affiliations:
Center for Applied Genomics, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, United States of America,
Department of Pediatrics, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America

Abstract

Large-scale genome-wide association studies (GWAS) have identified many loci associated with body mass index (BMI), but few studies have focused on obesity as a binary trait. Here we report the results of a GWAS and candidate SNP genotyping study of obesity, including extremely obese cases and never-overweight controls as well as families segregating extreme obesity and thinness. We first performed a GWAS on 520 cases (BMI>35 kg/m2) and 540 control subjects (BMI<25 kg/m2), on measures of obesity and obesity-related traits. We subsequently followed up obesity-associated signals by genotyping the top ~500 SNPs from the GWAS in the combined sample of cases, controls and family members totaling 2,256 individuals. For the binary trait of obesity, we found 16 genome-wide significant signals within the FTO gene (strongest signal at rs17817449, P = 2.5×10−12). We next examined obesity-related quantitative traits (such as total body weight, waist circumference and waist to hip ratio), and detected a genome-wide significant association between waist to hip ratio and NRXN3 (rs11624704, P = 2.67×10−9), a gene previously associated with body weight and fat distribution. Our study demonstrates how a relatively small sample ascertained through extreme phenotypes can detect genuine associations in a GWAS.

Funding: This work was supported in part by NIH grants R01DK44073, R01DK56210, and R01DK076023 to R.A.P. and a Scientist Development Grant (0630188N) from the American Heart Association to W.D.L. Genome-wide genotyping was funded in part by an Institutional Development Award to the Center for Applied Genomics (H.H.) from the Children's Hospital of Philadelphia. No additional external funding was received for this study. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Obesity is the sixth most important risk factor contributing to the overall burden of disease worldwide [1]. Affected subjects have reduced life expectancy, and they suffer from several adverse consequences such as cardiovascular disease, type 2 diabetes and several cancers. Many studies have shown that body weight and obesity are strongly influenced by genetic factors, with heritability estimates in the range of 65–80% [2], [3]. Genetic variants in several genes are known to influence BMI, but these mutations are rare and often cause severe monogenic syndromes with obesity [4]. With the development of high-throughput genotyping techniques and the implementation of genome-wide association studies (GWAS), common variations, such as those in FTO [5] and MC4R [6], have been associated with obesity and body mass index (BMI). Recent large-scale meta-analyses of multiple GWAS identified additional genes harboring common SNPs that associate with BMI [7]–[10]. GWAS have also found associations with measures of body fat distribution [9], [11], [12]. By far the largest GWAS to date included almost 250,000 individuals and 2.8 million SNPs [13]. Associations of BMI with 28 loci reached genome-wide significance, including 10 that were reported previously and 18 that were newly identified. Four additional loci were associated with body fat distribution, all of which had been identified previously. However, even this major expansion of sample size explained little of the variation: 1.39% for BMI and 0.16% for body fat distribution. On the other hand, confirmation of existing BMI loci, and detailed analysis of their association with obesity as a binary trait and with other obesity-related quantitative traits, are important at the current stage to move GWAS signals forward and understand their functional consequences.

A few studies utilized samples with early-onset or morbid obesity for discovery, and replicated previously reported association signals on BMI [7], [14]–[16], or implicated specific genetic variants such as a recurrent 16p11.2 deletion [17]. Utilizing extreme phenotypes increases the odds ratio of association, with improved power to identify novel association signals under fixed genotyping budgets and fixed sample sizes. We have collected a large cohort of obese cases and families ascertained from the tails of the BMI distribution, together with detailed phenotype measures on multiple obesity-related traits. In addition, we have adult controls who have never been overweight. However, given a fixed genotyping budget, instead of genotyping all these samples on whole-genome SNP arrays, we elected to perform a case-control GWAS and then follow up the top signals by candidate SNP genotyping on the entire set of samples, including family members. This unique dataset therefore provides an opportunity to examine GWAS associations in a combined sample of cases, family members, and controls.

Results

We analyzed genotype data for 520 cases and 540 control subjects, and performed a GWAS on obesity as a binary trait. We observed a strong association of obesity with the FTO gene, the most significantly associated marker being rs3751812 (P = 2.01×10−8, odds ratio = 1.64). All association signals with P<10−5 are shown in Table 3, and the Manhattan plot is shown in Figure 1. No additional loci with genome-wide significance were identified in the GWAS; nevertheless, the fact that FTO readily reached genome-wide significance in a small data set confirmed the high quality of the phenotypes within the sample collection. It also illustrated how a small sample ascertained from extreme phenotypes can have high power to detect genuinely associated genes, compared to quantitative trait association analysis conducted on population-based samples.
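As a rough illustration of how an odds ratio and P-value of the kind reported above arise from allele counts, the following minimal sketch computes an allelic odds ratio and a 1-df Pearson chi-square test from a 2x2 table. The allele counts are hypothetical and chosen only to match the sample sizes (520 cases, 540 controls); they are not the study's data, and the function name is an illustrative assumption rather than part of PLINK.

```python
import math


def allelic_test(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Return (odds ratio, chi-square, P-value) for a 2x2 allele-count table."""
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    # Pearson chi-square for a 2x2 table, without continuity correction.
    numerator = n * (case_alt * ctrl_ref - case_ref * ctrl_alt) ** 2
    denominator = (
        (case_alt + case_ref)
        * (ctrl_alt + ctrl_ref)
        * (case_alt + ctrl_alt)
        * (case_ref + ctrl_ref)
    )
    chi2 = numerator / denominator
    # Upper tail of a chi-square distribution with 1 df: P = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return odds_ratio, chi2, p_value


# HYPOTHETICAL allele counts: 520 cases carry 1,040 alleles, 540 controls carry 1,080.
odds_ratio, chi2, p_value = allelic_test(
    case_alt=540, case_ref=500, ctrl_alt=430, ctrl_ref=650
)
print(f"OR = {odds_ratio:.2f}, chi2 = {chi2:.1f}, P = {p_value:.2e}")
```

With about a thousand subjects, an odds ratio near 1.6 at a common variant can already approach the genome-wide threshold, which is the point made in the paragraph above.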

Considering the possibility that some obesity-associated genes may be enriched among the top-ranked genes in the GWAS, we next followed up selected association signals (~500 top SNPs) by iSelect genotyping on 2,256 cases, family members and controls. We also performed dense genotyping of 49 SNPs in the FTO gene itself, including 7 from the original top 500 and 42 spanning the entire gene. We used the MQLS software for the association analysis to account for the familial relationships. Interestingly, the significance for FTO SNPs increased by several orders of magnitude (P-values ranged from 10−9 to 10−12), suggesting that genotyping additional family members increased power to detect genuine associations (Table 4).

We next completed exploratory analyses of quantitative measures of obesity. In part because of the extreme bimodality of the phenotype distributions based on sample ascertainment, we controlled for case/control status. Moreover, this approach makes it possible to assess the potential effects of genes on the extent of obesity in extremely obese individuals. The complete set of results (P<1×10−6) is given in Table 5. We note that a few markers reached P<5×10−8, notably for waist circumference (chromosome 21, rs11088859, P = 3.75×10−8, nearest gene NCAM2) and waist to hip ratio (chromosome 14, rs11624704, P = 2.67×10−9, nearest gene NRXN3), but the NCAM2 locus cannot be regarded as genome-wide significant considering the need to adjust for the multiple phenotypes tested. Therefore, these exploratory results are provided as potentially interesting findings worthy of additional replication efforts.
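The reasoning about adjusting for multiple phenotypes can be made concrete with a small sketch. It assumes a Bonferroni correction of the conventional 5×10−8 threshold and assumes six quantitative traits; neither the correction method nor the exact number of traits is stated in the text, so both are illustrative assumptions.

```python
GENOME_WIDE_ALPHA = 5e-8  # conventional single-trait genome-wide threshold


def trait_adjusted_threshold(n_traits, alpha=GENOME_WIDE_ALPHA):
    """Bonferroni-adjust the genome-wide threshold for the number of traits tested."""
    if n_traits < 1:
        raise ValueError("n_traits must be at least 1")
    return alpha / n_traits


# P-values quoted in the text for the two strongest quantitative-trait signals.
signals = {
    "rs11088859 (waist circumference, NCAM2)": 3.75e-8,
    "rs11624704 (waist to hip ratio, NRXN3)": 2.67e-9,
}

N_TRAITS = 6  # ASSUMPTION: illustrative count, not taken from the paper
threshold = trait_adjusted_threshold(N_TRAITS)
passes = {name: p < threshold for name, p in signals.items()}

for name, p in signals.items():
    verdict = "passes" if passes[name] else "fails"
    print(f"{name}: P = {p:.2e} {verdict} the adjusted threshold {threshold:.2e}")
```

Under these assumptions the NRXN3 signal survives the adjustment while the NCAM2 signal does not; the NCAM2 signal already fails once two or more traits are accounted for.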

Table S1 summarizes our results with respect to previously reported associations with obesity-related traits. As noted, only FTO and NRXN3 reached genome-wide significance. Three genes, SH2B1, MC4R and KCTD15, showed trends towards significance in the case/control association tests (P-values ranged from 0.015 to 0.065), but they did not pass multiple testing thresholds (based on the number of genes tested). For quantitative traits, we cannot estimate the power of our study, since previously published studies utilized population samples for quantitative trait association with BMI [18]. Nevertheless, in our data, it is interesting to see that MC4R and FTO are the two genes with the strongest effect sizes (risk allele odds ratio >1.2), probably explaining why they were the first two genes identified in GWAS for BMI [5], [6].

Discussion

In the current study, we performed a GWAS on obesity and obesity-related traits. The FTO gene reached genome-wide significance in this cohort with an odds ratio of 1.6. The MC4R gene was the second gene found by GWAS to be associated with BMI [6], and, while only marginally associated with obesity in our study, its odds ratio was 1.3. Given our modest sample size in the GWAS, we estimate that the power to detect association (with perfect SNP tagging) at P<5×10−8 is 78.8% and 0.18% for FTO and MC4R, respectively. The odds ratio estimates in our data are higher than those in previous reports: for example, the odds ratio for FTO was 1.3 in a study of early-onset obesity [5], and the odds ratios for FTO and MC4R were 1.46 and 1.02 in a study of extreme obesity [14], 1.25 and 1.26 in a study of morbidly obese adults with familial obesity [15], and 1.27 and 1.12 in a study of obesity [10]. We note that the report by Hinney et al. investigated early-onset extreme obesity and found an odds ratio of 1.67 for FTO [16], comparable to our study. Therefore, the increased effect size could be due to the specific sample ascertainment scheme that we used, that is, sampling from the extreme tails of a quantitative trait distribution based on BMI. Even with the augmented sample of cases, family members and controls, no other SNPs reached genome-wide significance. The results therefore strongly suggest that FTO and MC4R might be the only two major-effect genes for obesity with common variants in populations of European ancestry. Our study also represents an example where enrichment of extreme cases and controls can lead to an increased odds ratio and, in turn, improved power to detect associations.

The association between waist to hip ratio and the NRXN3 gene is of interest, as this is the third time the gene has been associated with body fat distribution [12], [13]. Neurexins are expressed in nervous tissue and are thought to be involved in cell adhesion during synapse formation [19]. Besides fat distribution, NRXN3 has been associated with several other traits, including addictions and schizophrenia [20]–[22]. Identifying the specific causal variant may be difficult because NRXN3 is an extremely large gene (~1.5 Mb) [19]. It is controlled by two promoters and has multiple transcripts. The SNPs associated with weight and fat distribution lie in different parts of the gene and will likely involve different transcripts with potentially different functions. The associated SNP in our study, rs11624704, appears to be about 85 kb upstream of the first exon, while those for the previous two studies, rs10150332 and rs10146997, appear to be about 8 kb apart near exon 11.

In conclusion, we have assayed a sample collection of obese cases, families and never-overweight controls, and performed association analysis on obesity and multiple quantitative phenotype measures. We obtained strong support for FTO as well as suggestive confirmation of several previously identified BMI-associated genes in obesity. Another outcome of our study is the identification of new candidate genes for obesity-related traits. Of particular interest is the association of NRXN3 with body fat distribution among extremely obese individuals.

Materials and Methods

Study participants

The current GWAS includes 520 cases and 540 control subjects, all non-Hispanic Caucasians. Cases were obese (BMI≥35 kg/m2) with a lifetime BMI>40 kg/m2. Among them, 32 were male while the rest were female. Independent controls were selected who had a current and lifetime BMI≤25 kg/m2. The individuals in the samples were of approximately the same age but differed in average BMI by 29 kg/m2 (Table 1). After performing the GWAS, a combined sample of cases, controls and family members (N = 2,256), including all the study participants in the GWAS, was included for genotyping of the ~500 most significant SNPs, a number set by the genotyping budget. Subject characteristics of family members are shown in Table 2. Note that this study was originally designed to investigate obesity genes in female subjects, but over time a small fraction of males was included during recruitment. All subjects gave written informed consent, and the protocol was approved by the Committee on Studies Involving Human Beings at the University of Pennsylvania.

Phenotype measures

Anthropometric phenotypes were directly measured in field settings. Percent fat was estimated using a bioelectric impedance (BIA) measure. The complete list of measures examined in this study is described in Table 1 and Figure S1. Body mass index was calculated from measured height and weight by the standard formula, weight (kg) divided by the square of height (m2). Measurements were taken of subjects dressed in light clothing. Height was measured from a standing position using a stadiometer. Weight was measured on a scale with a maximum capacity of 600 pounds (270 kg) (Tanita TBF310 Pro Body Composition Analyzer, Tanita, Arlington Heights, IL). Body composition was estimated by bioelectric impedance using the same Tanita scale. Waist circumference was measured at the height of the iliac crest with the subject standing. Hip circumference was measured at the maximum extension of the buttocks with the subject standing. Waist to hip ratio (WHR) was calculated as measured waist circumference divided by measured hip circumference. Age of obesity onset was the age at which the subject reported having first become overweight.
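The two derived measures described above are simple ratios. A minimal sketch follows; the function names and the example measurements are illustrative only and do not come from the study.

```python
def body_mass_index(weight_kg, height_m):
    """BMI in kg/m^2: weight divided by the square of height."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def waist_to_hip_ratio(waist, hip):
    """WHR: waist circumference divided by hip circumference (same units)."""
    if waist <= 0 or hip <= 0:
        raise ValueError("circumferences must be positive")
    return waist / hip


# Illustrative values only, not study data.
example_bmi = body_mass_index(weight_kg=70.0, height_m=1.75)
example_whr = waist_to_hip_ratio(waist=80.0, hip=100.0)
print(f"BMI = {example_bmi:.1f} kg/m^2, WHR = {example_whr:.2f}")
```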

Genotyping

DNA was extracted from whole blood or lymphoblastoid cell lines using a high salt method. All cases and control subjects were genotyped on the Illumina HumanHap550 SNP array (Illumina, San Diego, CA) with ~550,000 SNP markers, at the Center for Applied Genomics, Children's Hospital of Philadelphia. Standard data normalization procedures and canonical genotype clustering files were used to process the genotyping signals and generate genotype calls. In addition, the combined sample of cases, controls and family members (N = 2,256, Table 2) was genotyped for the top 500 SNPs from the GWAS using the Illumina iSelect platform. All cases, family members, and controls were non-Hispanic Caucasians, and we further utilized multi-dimensional scaling to confirm the ethnicity status of cases and control subjects. A subset of the whole-genome genotype data was previously described in a CNV study on obesity [23].

Association analysis

The PLINK software (version 1.07) was used to conduct association tests between SNP genotypes and specific phenotypes of interest. For traits that are approximately normally distributed, we used standard linear regression to assess association, including age, sex and disease status as covariates. We planned to exclude samples with a genotyping rate below 95%, but no sample met this exclusion criterion. SNPs were excluded from analysis if the minor allele frequency was less than 1% (23,298 SNPs excluded), if the Hardy-Weinberg equilibrium P-value was less than 1×10−6 in control subjects (1,366 SNPs excluded), or if the genotype missing rate was higher than 5% (8,190 SNPs excluded). The study participants are of European ancestry as evaluated in previous studies [24]; given whole-genome data, we also performed multi-dimensional scaling analysis on SNPs not in LD (r2<0.2) with each other and confirmed that all cases and control subjects were of genetically inferred European ancestry (Figure S2). The QQ plot for the obesity GWAS is given in Figure S3, and the genomic control inflation factor was 1.05.
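The SNP-level filters and the genomic control inflation factor described above can be sketched in a few lines. This is not PLINK's implementation: the Hardy-Weinberg test here is a simple chi-square approximation used in place of an exact test, the function names are hypothetical, and the genotype counts in the usage lines are invented for illustration.

```python
import math
from statistics import NormalDist, median

CHI2_1DF_MEDIAN = 0.4549364  # median of a chi-square distribution with 1 df


def minor_allele_freq(n_aa, n_ab, n_bb):
    """Minor allele frequency from genotype counts."""
    n = n_aa + n_ab + n_bb
    freq_a = (2 * n_aa + n_ab) / (2 * n)
    return min(freq_a, 1.0 - freq_a)


def hwe_p_value(n_aa, n_ab, n_bb):
    """Chi-square (1 df) approximation to a Hardy-Weinberg equilibrium test."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0  # monomorphic SNP: no deviation can be tested
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_qc(all_counts, control_counts, n_missing,
              maf_min=0.01, hwe_min=1e-6, miss_max=0.05):
    """Apply the three SNP filters: MAF, HWE in controls, and missing rate."""
    missing_rate = n_missing / (sum(all_counts) + n_missing)
    return (minor_allele_freq(*all_counts) >= maf_min
            and hwe_p_value(*control_counts) >= hwe_min
            and missing_rate <= miss_max)


def genomic_inflation(p_values):
    """Median observed 1-df chi-square statistic over its expected median."""
    z = NormalDist()
    chi2 = [z.inv_cdf(p / 2.0) ** 2 for p in p_values]  # requires 0 < p <= 1
    return median(chi2) / CHI2_1DF_MEDIAN


# Invented genotype counts (AA, AB, BB) for one common, well-behaved SNP.
print(passes_qc((250, 500, 250), (125, 250, 125), n_missing=10))
# Evenly spaced P-values mimic a null distribution, so lambda is close to 1.
print(round(genomic_inflation([(i + 0.5) / 1000 for i in range(1000)]), 3))
```

An inflation factor near 1, such as the 1.05 reported above, indicates that the bulk of the test statistics follow the null distribution and that population stratification is limited.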

The combined dataset of cases, family members and controls was next analyzed using MQLS. MQLS utilizes a quasi-likelihood score test approach developed by Thornton and McPeek [25] that treats the data as a case-control analysis consisting of related and unrelated individuals. This combined approach has substantially more power than separate analyses using either case-control or family based methods. However, we acknowledge that since the candidate SNP genotyping study is not independent of the GWAS, the P-value distributions will be biased and therefore our study cannot be regarded as a standard “2-stage” analysis. The MQLS (b) statistic incorporates parental data in the estimation of case genotypes. We restricted these analyses to obesity status, since the method currently is adapted only for dichotomous phenotypes.

Supporting Information

Figure S1. The distribution of phenotype measures utilized in the current study. The age of onset information is available for cases only. BMI, weight, BIA and waist have bimodal distributions, so we explored testing on cases only.

Figure S2. Multi-dimensional scaling (MDS) of the SNP genotyping data for samples with whole-genome genotypes, with (left panel) or without (right panel) 30 Asian, 30 African American and 30 Caucasian samples to seed the graph. A total of 70,593 SNPs not in LD (r2<0.2) and not on sex chromosomes were used in the MDS analysis. All GWAS samples were of genetically inferred European ancestry.