Background

A recently published genome-wide association study (GWAS) of late-onset Alzheimer's disease (LOAD) revealed genome-wide significant association of variants in or near MS4A4A, CD2AP, EPHA1 and CD33. Meta-analyses of this and a previously published GWAS revealed significant association at ABCA7 and MS4A, independent evidence for association of CD2AP, CD33 and EPHA1 and an opposing yet significant association of a variant near ARID5B. In this study, we genotyped five variants (in or near CD2AP, EPHA1, ARID5B, and CD33) in a large (2,634 LOAD, 4,201 controls), independent dataset comprising six case-control series from the USA and Europe. We performed meta-analyses of the association of these variants with LOAD and tested for association using logistic regression adjusted by age-at-diagnosis, gender, and APOE ε4 dosage.

Results

We found no significant evidence of series heterogeneity. Associations with LOAD were successfully replicated for EPHA1 (rs11767557; OR = 0.87, p = 5 × 10⁻⁴) and CD33 (rs3865444; OR = 0.92, p = 0.049), with odds ratios comparable to those previously reported. Although the two ARID5B variants (rs2588969 and rs4948288) showed significant association with LOAD in meta-analysis of our dataset (p = 0.046 and 0.008, respectively), the associations did not survive adjustment for covariates (p = 0.30 and 0.11, respectively). We had insufficient evidence in our data to support the association of the CD2AP variant (rs9349407, p = 0.56).

Following the identification of the APOE ε4 allele as a risk factor for late-onset Alzheimer's disease (LOAD) in 1993 [1], consistent replication of subsequently identified candidates was not achieved until 2009, when two genome-wide association studies (GWAS) [2, 3] identified associations of variants in or near CLU, PICALM, and CR1 with LOAD, which were consistently replicated in multiple large, independent case-control studies [4–17]. Subsequently, a GWAS published in 2010 reported [4] genome-wide significant association of a variant near BIN1, which also replicated well in follow-up studies [14–19]. These results demonstrate the utility of the hypothesis-free GWAS approach for identifying loci that associate with LOAD and the necessity of pooling samples and data from multiple centers to obtain resources with sufficient statistical power (GWAS typically > 14,000, follow-up typically total > 28,000) to detect the modest ORs (e.g. 0.8/1.2) associated with these variants in GWAS and follow-up studies.

Two recently published companion studies by Hollingworth et al. [20] and Naj et al. [17] performed meta-analysis of two large GWAS datasets (n > 75,000). Besides APOE, CLU, PICALM, and CR1, the meta-analyses revealed association at ABCA7 (p = 5 × 10⁻²¹), MS4A6A (p = 1.2 × 10⁻¹⁶), MS4A4E (p = 1.1 × 10⁻¹⁰), EPHA1 (p = 6 × 10⁻¹⁰), CD2AP (p = 8.6 × 10⁻⁹) and CD33 (p = 1.6 × 10⁻⁹). In addition, the two datasets revealed opposing association (Naj et al. OR = 0.93, p = 0.001; Hollingworth et al. OR = 1.06, p = 0.03) of the variant near ARID5B (rs2588969) with LOAD, suggesting potential heterogeneity at this locus. In this study, we genotyped the variants identified at the CD2AP, EPHA1, and CD33 loci in our independent case-control dataset comprising six case-control series (n = 6,835). To assess the opposing associations at the ARID5B locus, we also genotyped the two ARID5B variants included in the Hollingworth et al. study. Genotypes from our follow-up case-control series (Mayo2) for variants in ABCA7, MS4A6A and MS4A4E were included in Stage 3 of the Hollingworth et al. study, so we have not included these three variants in this study. We have performed meta-analyses of five variants (at the CD2AP, EPHA1, ARID5B and CD33 loci) in our six case-control series, which showed no significant series heterogeneity. Furthermore, we have performed logistic regression analysis of our pooled series adjusting for covariates. Finally, we have used a Fisher's combined test to evaluate the significance of the association of these five variants in our data combined with the data in the Hollingworth et al. and Naj et al. studies.

We genotyped five variants (CD2AP: rs9349407; EPHA1: rs11767557; ARID5B: rs2588969 and rs4948288; CD33: rs3865444) in our independent follow-up case-control series (Mayo2) from three North American and three European Caucasian series. Detailed information about these samples is shown in Table 1 and genotype counts are shown in Table 2. Samples used in this study do not overlap with those included in the Naj et al. study and have not been included in any of the published LOAD GWAS. The Mayo2 dataset included in the Hollingworth et al. publication only included genotypes for ABCA7, MS4A6A and MS4A4E.

Table 1

Details of the Mayo2 samples used in this study and genotype counts

                Number of samples        Mean Age (SD)             % Female        % ε4+
Series          AD     CON     Total     AD           CON          AD     CON      AD     CON
Jacksonville    507    967     1,474     80.0 (6.7)   81.7 (7.6)   61.9   56.3     60.2   21.8
Rochester       317    1,638   1,955     85.8 (4.5)   80.3 (5.2)   62.1   54.6     42.3   22.4
Autopsy         312    102     414       87.4 (4.8)   86.0 (4.3)   67.6   52.0     61.2   14.7
Norway          346    555     901       80.2 (7.3)   75.3 (6.8)   69.9   59.8     63.0   24.1
Poland          483    188     671       76.7 (4.8)   73.0 (5.9)   66.3   76.6     56.4   19.0
ARUK            669    751     1,420     75.6 (8.2)   76.2 (7.3)   55.6   49.9     58.0   24.4

The number of LOAD patients (AD) and controls (CON), mean age, percentage that are female and percentage that possess at least one copy of the APOE ε4 allele are given for each individual series. Mean age is given as age at diagnosis/entry with the standard deviation (SD) in parentheses. None of the samples in the Jacksonville, Rochester and autopsy-confirmed Mayo Clinic series or the ARUK series (comprising Bristol, Leeds, Manchester, Nottingham, Oxford and Southampton) that were included in this follow-up study overlap with those used in the Naj et al. study, and none have been included in any of the published LOAD GWAS. The Mayo2 dataset included in the Hollingworth et al. publication only included genotypes for ABCA7, MS4A6A and MS4A4E.

Meta-analyses of allelic association in the six Mayo2 series performed using a DerSimonian-Laird random effects model (Figure 1) revealed a significant pooled OR for the EPHA1 variant (Figure 1b; OR = 0.88, p = 0.008) comparable to that previously published by Naj et al. (OR = 0.87) and by Hollingworth et al. (OR = 0.90). As shown in Figure 1c and 1d, we also observed significant association for both ARID5B variants (rs2588969, OR = 1.08, p = 0.046; rs4948288, OR = 1.11, p = 0.008) with ORs comparable to those reported by Hollingworth et al. (OR = 1.06 and 1.07, respectively) and in the opposing direction to those reported by Naj et al. for rs2588969 (Stage 1+2 OR = 0.93, p = 7.7 × 10⁻⁴). As shown in Figure 1a and 1e, we did not observe significant association for CD2AP (OR = 0.98, p = 0.76) or CD33 (OR = 0.96, p = 0.32) in our meta-analyses. Breslow-Day tests provided no significant evidence that the ORs for any of these variants were heterogeneous among our series (all p > 0.25), as shown in Figure 1. The variant with the most heterogeneity was CD2AP (rs9349407), where the estimated percentage of variation due to heterogeneity across studies (I²) was 25.1% (95% CI 0%-70%), suggesting the presence of some heterogeneity for that variant.
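The random-effects pooling described above can be sketched in a few lines. This is a minimal illustration of the DerSimonian-Laird estimator, not the StatsDirect implementation used in the study; the function name, the input format (per-series log ORs and their variances) and the example numbers are assumptions made for illustration. Cochran's Q appears here only as an ingredient of the between-series variance and of I²; the study itself tested heterogeneity with Breslow-Day tests.

```python
import math

def dersimonian_laird(log_ors, variances):
    """Pool per-series log odds ratios with a DerSimonian-Laird random-effects model.

    log_ors   : list of per-series log(OR) estimates (hypothetical inputs)
    variances : list of the corresponding variances of log(OR)
    """
    k = len(log_ors)
    w = [1.0 / v for v in variances]                 # inverse-variance (fixed-effect) weights
    sw = sum(w)
    fixed = sum(wi * y for wi, y in zip(w, log_ors)) / sw
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_ors))  # Cochran's Q
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)               # between-series variance estimate
    i2 = max(0.0, (q - (k - 1)) / q) * 100.0 if q > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in variances]     # random-effects weights
    pooled = sum(wi * y for wi, y in zip(w_re, log_ors)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    p = math.erfc(abs(pooled / se) / math.sqrt(2.0)) # two-sided normal p-value
    return {
        "or": math.exp(pooled),
        "ci95": (math.exp(pooled - 1.96 * se), math.exp(pooled + 1.96 * se)),
        "p": p,
        "tau2": tau2,
        "i2_percent": i2,
        "log_or": pooled,
        "se": se,
    }

# Illustrative call with made-up values for six series (not data from this study).
example = dersimonian_laird(
    [-0.10, -0.20, -0.05, -0.15, -0.12, -0.18],
    [0.010, 0.012, 0.030, 0.015, 0.020, 0.009],
)
```

When the estimated between-series variance is zero, the random-effects weights reduce to the fixed-effect weights and the two models give the same pooled OR.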

Figure 1

Forest plots for meta-analysis of CD2AP, EPHA1, ARID5B, and CD33 variants in our six Mayo2 case-control series. ORs (boxes) and 95% CI (whiskers) are plotted for each population and shown on the right of each plot. Combined OR is the overall OR calculated by the meta-analysis using a random effects model. P-values are provided for the combined ORs and Breslow-Day tests of heterogeneity. I² gives an estimate of the percentage of variation across series that is due to heterogeneity.

To adjust for important covariates, we included age-at-diagnosis/entry, sex and APOE ε4 dosage in logistic regression analyses of all five variants in each of the six Mayo2 series; in our analysis of all Mayo2 series combined, series was included as an additional covariate. Table 3 shows the results for the six Mayo2 series combined (Mayo follow-up) as well as for each of the six individual Mayo2 series. For the purpose of comparison, we have also included in Table 3 the published GWAS results for the same variants. Adjustment for covariates revealed comparable ORs to those obtained in the meta-analyses, with improved p-values for the EPHA1 (OR = 0.87, p = 5 × 10⁻⁴), CD33 (OR = 0.92, p = 0.049) and CD2AP (OR = 0.97, p = 0.56) loci. However, the associations of the ARID5B variants were no longer significant following adjustment for covariates (rs2588969: OR = 1.05, p = 0.30; rs4948288: OR = 1.07, p = 0.11), suggesting that these associations may be dependent upon the series, age-at-diagnosis/entry, sex and/or APOE ε4 dosage of the individual.
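For readers who prefer to see the adjusted model written out, one way to express it is given here. The symbols are our own notation and are assumptions for illustration: G is the minor allele count (0, 1 or 2) under the additive model, E is the APOE ε4 dosage (0, 1 or 2), and the series indicator terms apply only to the analysis of all series combined.

```latex
\operatorname{logit} P(\mathrm{LOAD}_i = 1)
  = \beta_0 + \beta_G\, G_i + \beta_A\, \mathrm{age}_i + \beta_S\, \mathrm{sex}_i
    + \beta_E\, E_i + \sum_{s} \gamma_s\, \mathbb{1}[\mathrm{series}_i = s],
\qquad
\mathrm{OR}_{\text{per allele}} = e^{\beta_G}
```

Under this formulation, the reported OR for a variant is the multiplicative change in the odds of LOAD per additional copy of the minor allele, holding the covariates fixed.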

Table 3

Association of CD2AP, EPHA1, ARID5B, and CD33 variants with LOAD in the initial studies (ADGC and GERAD+) and Mayo2 follow-up series

(a) The numbers shown for the series in the Naj et al. and Hollingworth et al. studies refer to the complete set analyzed. The numbers for the Mayo follow-up data refer to the number of samples successfully genotyped.

(b) MAFs were not reported for LOAD and control groups in the Naj et al. or Hollingworth et al. studies.

(c) The results shown here for the Mayo2 follow-up dataset combined and for the subseries were obtained using logistic regression adjusted for age, sex and APOE ε4 dosage. The Mayo2 follow-up dataset reported here is independent of that which was incorporated in the GWAS reported by Hollingworth et al. The results for each of the Mayo follow-up subseries (Jacksonville, Rochester, Autopsy-confirmed, Norway, Poland and ARUK) are listed immediately below the results for the Mayo2 follow-up dataset combined.

We report here successful replication of the association of two variants with LOAD in a large (n = 6,835), independent case-control study: rs11767557, which is located 3 kb upstream of EPHA1 (p = 5 × 10⁻⁴), and rs3865444, which is located 373 bp upstream of CD33 (p = 0.049). The ORs we observed in our meta-analyses (EPHA1 = 0.88, CD33 = 0.96) were comparable to those reported by both Naj et al. (EPHA1 = 0.87, CD33 = 0.89) and by Hollingworth et al. (EPHA1 = 0.90, CD33 = 0.89) such that the estimated p-values for association of these variants in all data (n > 42,000) were an impressive 2.1 × 10⁻¹⁵ for EPHA1 and 1.8 × 10⁻¹³ for CD33.

Although our meta-analyses showed successful replication of the association of the ARID5B variants rs2588969 (OR = 1.08, p = 0.046) and rs4948288 (OR = 1.11, p = 0.008) with a direction of association consistent with that reported by Hollingworth et al. (OR = 1.06 and 1.07, respectively), the associations did not survive adjustment for age-at-diagnosis/entry, sex and APOE ε4 dosage (p = 0.30 and 0.11, respectively). This covariate-dependent association could explain the opposing association reported by Naj et al. in their discovery (OR = 0.88) and replication (OR = 1.05) datasets for rs2588969, the only ARID5B variant they tested. Therefore, while the estimated p-values for association of the ARID5B variants in all datasets combined were highly significant (rs2588969: p = 2.3 × 10⁻⁹; rs4948288: p = 4.0 × 10⁻⁴), these associations should be interpreted with caution, taking into account the age-at-diagnosis/entry, sex and APOE ε4 dosage of the populations. Finally, although the estimated p-value for association of rs9349407 (located in intron 1 of CD2AP) in all datasets was 6.5 × 10⁻¹¹, there was no evidence for association of this variant in our dataset alone (OR = 0.97, p = 0.56).
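The combined p-values quoted above come from a Fisher combined test. As a minimal sketch of how such a test works (the function name and the example p-values are illustrative assumptions, not values from Table 3), the statistic is minus twice the sum of the log p-values, referred to a chi-square distribution with 2k degrees of freedom for k independent p-values. Because the degrees of freedom are always even, the tail probability has a closed form and needs no statistics library.

```python
import math

def fisher_combined(p_values):
    """Combine independent p-values with Fisher's method.

    Returns (statistic, combined_p). The statistic is -2 * sum(ln p) and follows a
    chi-square distribution with 2k degrees of freedom under the null hypothesis.
    """
    k = len(p_values)
    if k == 0 or any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    half = statistic / 2.0
    # Chi-square survival function for even df = 2k:
    #   P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return statistic, math.exp(-half) * total

# Illustrative call with made-up p-values from three hypothetical datasets.
stat, p_comb = fisher_combined([0.001, 0.03, 0.3])
```

Fisher's method uses only the p-values and not the direction of effect, so datasets with opposing ORs can still yield a small combined p-value. This is one reason the combined ARID5B results call for cautious interpretation.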

Our Mayo2 collection of case-control series provided a total of 2,634 LOAD cases and 4,201 controls. Combining across series to perform global tests of significance for additive genotypic trend tests gave us 80% power to detect ORs ranging from 1.17 (or 0.855) for variants with a minor allele frequency (MAF) of 0.2 to 1.13 (or 0.883) for variants with a MAF of 0.45 in controls. The study provided approximately 60% power to detect the OR of 1.11 previously reported for CD2AP (MAF = 0.27).
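To give a feel for how power depends on MAF, effect size and sample size, the following is a simplified sketch based on a normal approximation to an unstratified comparison of allele frequencies. It is not the Mantel-Haenszel calculation pooled across six study groups that was used in this study, and the significance threshold (alpha) is an assumption because it is not stated here, so the output should not be expected to reproduce the power figures quoted above. The function name and parameters are hypothetical.

```python
import math
from statistics import NormalDist

def allelic_power(maf_controls, odds_ratio, n_cases, n_controls, alpha=0.05):
    """Approximate power of a two-sided allelic test (normal approximation).

    maf_controls : minor allele frequency in controls
    odds_ratio   : per-allele odds ratio to detect
    alpha        : two-sided significance level (assumed, not taken from the study)
    """
    p0 = maf_controls
    # Allele frequency in cases implied by the allelic odds ratio.
    p1 = odds_ratio * p0 / (1.0 - p0 + odds_ratio * p0)
    alleles_cases, alleles_controls = 2 * n_cases, 2 * n_controls
    se = math.sqrt(p1 * (1.0 - p1) / alleles_cases + p0 * (1.0 - p0) / alleles_controls)
    nd = NormalDist()
    z_crit = nd.inv_cdf(1.0 - alpha / 2.0)
    z = abs(p1 - p0) / se
    # Probability that the test statistic falls in either rejection region.
    return nd.cdf(z - z_crit) + nd.cdf(-z - z_crit)

# Sample sizes from this study; the alpha value here is purely illustrative.
power_example = allelic_power(0.2, 1.17, 2634, 4201, alpha=0.01)
```

A stricter alpha or stratification by series lowers the power relative to this approximation, which is why the stratified calculation used in the study is the appropriate reference.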

Case-control studies such as this are not designed to ascertain whether the variants with reported association with LOAD risk are the functional variants, but they can identify a linkage disequilibrium (LD) block within which a truly functional variant may reside. Our results indicate that the EPHA1 and CD33 variants represent excellent candidates for targeted deep sequencing or high density genotyping in order to define the locus further, followed by subsequent functional studies of nearby genes to elucidate the mechanism behind these associations. With the exception of rs9349407, which lies within intron 1 of CD2AP, all of these variants lie within intergenic regions; for ease of reading, we have thus far referred only to the nearest gene for each variant. This by no means signifies that these variants (or the functional variants in LD with them) are assumed to affect the expression or function of the nearest gene; they may instead affect other nearby genes. Until it is known which gene underlies these associations, all nearby genes should be included in follow-up functional investigation (all genes that reside within 100 kb of these variants are listed in Additional file 1, Table S1).

Taken together with our previous publications [5, 18, 20, 21], we have now performed follow-up association studies of 25 of the top GWAS-identified candidate LOAD genes and successfully replicated the association of eleven variants (in or near ABCA7, BIN1, CD33, CLU, CR1, EPHA1, GAB2, LOC651924, MS4A6A/4E and PICALM), eight of which are currently ranked in the top ten (after APOE) on AlzGene. This recent success in replicating genetic association highlights the utility of multiple, large case-control follow-up studies to confirm the novel associations reported by large GWAS, thus confirming these loci as good candidates for functional follow-up studies.

Ethics statement

Approval was obtained from the ethics committee or institutional review board of each institution responsible for the ascertainment and collection of samples. Written informed consent was obtained for all individuals that participated in this study.

Case-control subjects

The Mayo2 case-control series consisted of Caucasian subjects from the United States ascertained at the Mayo Clinic Jacksonville, Mayo Clinic Rochester, or through the Mayo Clinic Brain Bank. Additional Caucasian subjects from Europe were obtained from Norway [22], Poland [23], and from six research institutes in the United Kingdom that are part of the Alzheimer's Research UK (ARUK) Network. Although the ARUK samples used in this follow-up do not overlap with those employed in the original GWAS publication by Hollingworth et al., the same subject/sample ascertainment methodology was followed. The ARUK series included here are from Bristol, Leeds, Manchester, Nottingham, Oxford and Southampton. Since the Manchester cohort only consisted of LOAD cases, the Manchester cases were combined with subjects in the Nottingham series.

Genotyping

All genotyping was performed at the Mayo Clinic in Jacksonville using TaqMan® SNP Genotyping Assays in an ABI PRISM® 7900HT Sequence Detection System with 384-Well Block Module from Applied Biosystems, California, USA. The genotype data were analyzed using SDS software version 2.2.3 (Applied Biosystems, California, USA).

Statistical Analyses

Meta-analysis of allelic association and Breslow-Day tests were performed using StatsDirect v2.5.8 software. Meta-analyses were performed using the results from each individual case-control series. Summary ORs and 95% CI were calculated using the DerSimonian and Laird (1986) random-effects model [24]. Breslow-Day tests were used to test for heterogeneity between populations. PLINK software [25] (http://pngu.mgh.harvard.edu/purcell/plink/) was used to perform logistic regression analysis under an additive model adjusting for age-at-diagnosis, sex and APOE ε 4 dose as covariates. In our analysis of all series combined, series was included as an additional covariate. Since genotype counts were not reported for series included in the Naj et al. or Hollingworth et al. studies, we employed a Fisher combined test to combine p-values across series. Power calculations, based on a Mantel-Haenszel chi-square test that pooled across six different study groups, were obtained to estimate the detectable odds ratios for an ordinal effect using a range of minor allele frequencies spanning those expected from the candidate variants.
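As an illustration of what "allelic association" in a single series involves, the sketch below converts genotype counts into allele counts and computes an allelic OR with a Woolf-type (log-scale) 95% confidence interval. The resulting log OR and its variance are the kind of per-series inputs a random-effects meta-analysis takes. The function name, the genotype ordering and the example counts are assumptions for illustration and are not taken from Table 2.

```python
import math

def allelic_or(case_genotypes, control_genotypes):
    """Allelic odds ratio for the minor allele from genotype counts.

    Each argument is a tuple of counts ordered as
    (major/major, major/minor, minor/minor).
    Returns (odds_ratio, (ci_low, ci_high), variance_of_log_or).
    """
    def allele_counts(genotypes):
        hom_major, het, hom_minor = genotypes
        return 2 * hom_major + het, het + 2 * hom_minor   # (major, minor)

    case_major, case_minor = allele_counts(case_genotypes)
    ctrl_major, ctrl_minor = allele_counts(control_genotypes)
    odds_ratio = (case_minor * ctrl_major) / (case_major * ctrl_minor)
    # Woolf variance of log(OR): sum of reciprocals of the four allele counts.
    var_log_or = (1.0 / case_minor + 1.0 / case_major
                  + 1.0 / ctrl_minor + 1.0 / ctrl_major)
    se = math.sqrt(var_log_or)
    log_or = math.log(odds_ratio)
    ci = (math.exp(log_or - 1.96 * se), math.exp(log_or + 1.96 * se))
    return odds_ratio, ci, var_log_or

# Illustrative call with made-up genotype counts for one series.
or_example, ci_example, var_example = allelic_or((50, 40, 10), (60, 30, 10))
```

The allelic test treats each chromosome as an independent observation, which is a reasonable approximation when genotypes are in Hardy-Weinberg equilibrium.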

Acknowledgements and Funding

We thank contributors, including the Alzheimer's disease centers who collected samples used in this study, as well as subjects and their families, whose help and participation made this work possible. We thank the members of the Alzheimer's Research UK (ARUK) consortium who contributed samples to the ARUK resource. This work was supported by grants from the US National Institutes of Health, NIA R01 AG18023 (NRG-R, SGY); Mayo Alzheimer's Disease Research Center, P50 AG16574 (RCP, DWD, NRG-R, SGY); Mayo Alzheimer's Disease Patient Registry, U01 AG06576 (RCP); and US National Institute on Aging, AG25711, AG17216, AG03949 (DWD). Samples from the National Cell Repository for Alzheimer's Disease (NCRAD), which receives government support under a cooperative agreement grant (U24AG21886) awarded by the National Institute on Aging (NIA), were used in this study. This project was also generously supported by the Robert and Clarice Smith Postdoctoral Fellowship (MMC); Robert and Clarice Smith and Abigail Van Buren Alzheimer's Disease Research Program (RCP, DWD, NRG-R, SGY) and by the Palumbo Professorship in Alzheimer's Disease Research (SGY). KM is funded by the Alzheimer's Research UK and the Big Lottery Fund. ZKW is partially supported by the NIH/NINDS 1RC2NS070276, NS057567, P50NS072187, Mayo Clinic Florida (MCF) Research Committee CR programs (MCF #90052018 and MCF #90052030), Dystonia Medical Research Foundation, and the gift from Carl Edward Bolch, Jr., and Susan Bass Bolch (MCF #90052031/PAU #90052). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Electronic supplementary material

Additional file 1: Table S1. Genes located within 100 kb of the five variants tested in this study. Chr, chromosome. Base pair positions (bp) are relative to the NCBI Human Genome build 36.1. The position of the variant relative to the gene is given as 5' (upstream from the gene's transcription start site) or 3' (downstream from the gene's last exon). Distance indicates the number of base pairs from the variant position to the gene's nearest exon. (DOC 50 KB)


This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.