Abstract

Type 1 diabetes is a common, multifactorial disease with strong familial clustering (genetic risk ratio [λS] ∼ 15). Approximately 40% of the familial aggregation of type 1 diabetes can be attributed to allelic variation of HLA loci
in the major histocompatibility complex on chromosome 6p21 (locus-specific λS ∼ 3). Three other disease susceptibility loci have been clearly demonstrated based on their direct effect on risk: INS (chromosome 11p15, allelic odds ratio [OR] ∼ 1.9), CTLA4 (chromosome 2q33, allelic OR ∼ 1.2), and PTPN22 (chromosome 1p13, allelic OR ∼ 1.7). However, a large proportion of type 1 diabetes clustering remains unexplained. We report
here on a combined linkage analysis of four datasets (three previously published genome scans and one new genome scan of
254 families), which were consolidated through an international consortium for type 1 diabetes genetic studies (www.t1dgc.org) and provided a total sample of 1,435 families with 1,636 affected sibpairs. In addition to the HLA region (nominal P = 2.0 × 10−52), nine non–HLA-linked regions showed some evidence of linkage to type 1 diabetes (nominal P < 0.01), including three at (or near) genome-wide significance (P < 0.05): 2q31-q33, 10p14-q11, and 16q22-q24. Furthermore, after taking into account the linkage at the 6p21 (HLA) region,
there was evidence supporting linkage for the 6q21 region (empiric P < 10−4). More than 80% of the genome could be excluded as harboring type 1 diabetes susceptibility genes of modest effect (λS ≥ 1.3) that could be detected by linkage. This study represents one of the largest linkage studies ever performed for any
common disease. The results show emerging consistency for the existence of susceptibility loci on chromosomes
2q31-q33, 6q21, 10p14-q11, and 16q22-q24 but diminished support for some previously reported locations.

Type 1 diabetes is the third most prevalent chronic disease of childhood, affecting up to 0.4% of children in some populations
by age 30 years, with an overall lifetime risk of nearly 1% (1,2). It is believed that a large proportion of cases of type 1 diabetes result from the autoimmune destruction of the pancreatic
β cells, leading to complete dependence on exogenous insulin to regulate blood glucose levels (3). The etiology of type 1 diabetes is only partially characterized, but it is recognized that both genetic and environmental
determinants are important in defining disease risk. Population-based twin and family studies show that type 1 diabetes clusters
in families (4), but it does not segregate with a known mode of inheritance (5). The incidence and the age at onset of type 1 diabetes in some populations have changed dramatically since 1950 (6–8). These data, coupled with the incomplete concordance for the phenotype in monozygotic twins (30–70%), suggest that the penetrance
of type 1 diabetes susceptibility alleles is strongly influenced by environmental factors (4).

Type 1 diabetes is strongly clustered in families with an overall genetic risk ratio (λS) of ∼15 (9). At least one locus that contributes strongly to this familial clustering resides within the major histocompatibility complex
(MHC) on chromosome 6p21. Genetic, functional, structural, and model studies all suggest that the HLA class II genes (HLA-DRB1 and -DQB1) are the primary determinants of this locus (IDDM1). The frequency of HLA class II susceptibility alleles also correlates well with the population incidence of type 1 diabetes
(10). These studies suggest that the MHC (IDDM1) may account for nearly 40% of the observed familial clustering of type 1 diabetes, with a locus-specific λS of ∼3 (11).
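Under a multiplicative model across loci, the overall λS is the product of the locus-specific values, so one common way to express a locus's share of familial clustering is ln λS(locus) / ln λS(total). A minimal sketch, assuming that model, shows how a locus-specific λS of ∼3 against an overall λS of ∼15 corresponds to roughly 40%:

```python
import math

def share_of_clustering(lambda_locus: float, lambda_total: float) -> float:
    """Fraction of familial clustering attributable to one locus.

    Assumes a multiplicative model across loci, so that ln(lambda_total)
    is the sum of the ln(lambda) contributions of all loci.
    """
    return math.log(lambda_locus) / math.log(lambda_total)

# HLA (IDDM1): locus-specific lambda_S ~3 against an overall lambda_S ~15
hla_share = share_of_clustering(3.0, 15.0)
print(f"HLA share of familial clustering: {hla_share:.0%}")  # about 41%
```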

Several lines of evidence indicate that HLA alone cannot explain the familial clustering of type 1 diabetes and that several, perhaps many, genes remain to be
identified. First, in the general population, individuals who carry the high-risk haplotypic combination DRB1*04-DQB1*0302/DRB1*03-DQB1*0201
have ∼5% absolute risk of type 1 diabetes, whereas within affected sibpair families, the same genotype carries ∼20% risk (5,12). Second, three non-HLA loci have been identified based on genetic association studies: IDDM2 [INS, 11p15 (13–16)], IDDM12 [CTLA4, 2q33 (17)], and LYP/PTPN22 [1p13 (18–20)]. Finally, the observed risk of type 1 diabetes in first- and second-degree relatives declines in a pattern consistent with
multiplicative effects of multiple loci (11).
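The multiplicative argument can be illustrated with a standard single-locus property: when first-degree risk is measured in parent–offspring pairs, the excess risk ratio (λ − 1) halves with each additional degree of relationship. Under a multiplicative model, per-locus ratios multiply, so risk falls off with relationship degree more steeply than any single locus allows. The sketch below uses a hypothetical first-degree ratio of 15 split evenly across a hypothetical number of loci; it is an illustration, not a fit to the data cited above.

```python
def single_locus_ratios(lambda_first: float, degrees: int = 3) -> list[float]:
    """Risk ratios for 1st..nth-degree relatives at one locus: lambda - 1 halves per degree."""
    return [1 + (lambda_first - 1) / 2 ** (k - 1) for k in range(1, degrees + 1)]

def multiplicative_ratios(lambda_first: float, n_loci: int, degrees: int = 3) -> list[float]:
    """Risk ratios when n_loci equal loci act multiplicatively (hypothetical even split)."""
    per_locus = single_locus_ratios(lambda_first ** (1 / n_loci), degrees)
    return [r ** n_loci for r in per_locus]

one_locus = single_locus_ratios(15.0)              # [15.0, 8.0, 4.5]
ten_loci = multiplicative_ratios(15.0, n_loci=10)
for degree, (a, b) in enumerate(zip(one_locus, ten_loci), start=1):
    print(f"degree {degree}: one locus {a:.2f}, ten multiplicative loci {b:.2f}")
```

With ten equal loci, the second-degree ratio drops to roughly 4.2, versus 8 for a single locus with the same first-degree ratio.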

The first type 1 diabetes genome-wide scans for linkage, using fewer than 100 affected sibpair families, identified chromosome
6p21 (IDDM1) as the major type 1 diabetes risk locus (21,22). Subsequent studies supported non-HLA loci on chromosomes 11q13 (IDDM4) and 6q25 (IDDM5) in families from the U.K., the U.S., and France and on chromosome 15q26 (IDDM3) in families from Canada. Using both linkage and association approaches, other putative type 1 diabetes susceptibility loci
were identified on chromosomes 18q12-q21 (IDDM6), 2q33 (IDDM7), 6q27 (IDDM8), 3q22-q25 (IDDM9), 10p11-q11 (IDDM10), 14q24-q31 (IDDM11), 2q31-q33 (IDDM12), 2q34-q35 (IDDM13), 6q21 (IDDM15), 14q32 (IDDM16), 10q25 (IDDM17), 5q33 (IDDM18), 7p15-p13 (GCK), 1q42, 16q22-q24, Xp11 (conditional on HLA-DR genotype), and 8q22-q24 (8–10,12–17). Even though statistical evidence supporting linkage for some of these regions was strong in the initial reports, most regions
have not been clearly established in multiple populations (9,23).

A major barrier to type 1 diabetes gene identification, given the likely small locus-specific contribution (low λS) for non-HLA genes, is the relatively small number of newly ascertained affected sibpair families with type 1 diabetes. In
addition, previous studies have used available samples from a variety of collections, making the compilation of linkage results
difficult because of apparent overlap in families analyzed and uncertainty in the equivalence of allele coding. To facilitate
the genetic analysis of type 1 diabetes, results from two previous genome scans of type 1 diabetes were merged, comprising
767 families and 831 affected sibpairs from the U.K. and U.S. (24). The combined analyses supported linkage of type 1 diabetes to at least six non-HLA regions, including IDDM2 (INS, nominal P = 6.5 × 10−4), 2q31-q33 (P = 5.1 × 10−4), and 10p11 (P = 3.2 × 10−4). A third genome scan of 424 type 1 diabetic families with at least two affected relative pairs (464 affected pairs) from
Scandinavia (25) found no evidence for linkage at these latter three loci but did support linkage on chromosomes 5q11.2 (nominal P = 8.1 × 10−4) and 16p13 (P = 1.6 × 10−4). The IDDM15 region on chromosome 6q21 supported linkage (P = 7.0 × 10−7) when HLA was taken into consideration in a combined analysis of U.S., French, and Scandinavian families.

We present here a joint linkage analysis of data from these three prior genome-wide scans (U.S., U.K., and Scandinavia) together with
254 new families collected for this study, a total of 1,435 multiplex families. With an average
map information content of 67% (from ∼400 polymorphic microsatellite markers in each scan), this family collection provides
∼95% power to detect a locus with locus-specific λS ≥ 1.3 at P = 10−4. In addition to HLA, there was nominal evidence for linkage of type 1 diabetes to 10 other chromosome regions, including
6q21 (IDDM15) and three that reached genome-wide levels of significance: 2q31-q33 (IDDM12 and IDDM7), 10p14-q11 (IDDM10), and 16q22-q24. These data support the existence of non-HLA susceptibility loci for type 1 diabetes and strengthen support
for a subset of loci previously proposed to contribute to type 1 diabetes risk.
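As a rough check on the power statement, the sketch below applies the mean IBD-sharing test to affected sibpairs (ASPs). It assumes no dominance variance (so the ASP sharing probabilities are z0 = 1/(4λS), z1 = 1/2, z2 = 1/2 − 1/(4λS)), treats the 67% information content as a simple proportional reduction in effective sample size, and uses a one-sided normal approximation. These are simplifying assumptions for illustration, not the power calculation used in the study.

```python
import math
from statistics import NormalDist

def asp_ibd_probs(lambda_s: float) -> tuple[float, float, float]:
    """P(IBD = 0, 1, 2) for an affected sibpair at a locus with effect lambda_s (no dominance)."""
    z0 = 0.25 / lambda_s
    z1 = 0.5
    return z0, z1, 1.0 - z0 - z1

def mean_test_power(lambda_s: float, n_pairs: int, alpha: float, info: float = 1.0) -> float:
    """Approximate one-sided power of the mean IBD-sharing test."""
    z0, z1, z2 = asp_ibd_probs(lambda_s)
    mean_alt = 0.5 * z1 + z2                     # expected IBD proportion under linkage
    var_alt = 0.25 * z1 + z2 - mean_alt ** 2     # IBD proportion takes values 0, 0.5, 1
    sd_null = math.sqrt(0.125)                   # variance is 1/8 under no linkage
    n_eff = info * n_pairs                       # crude information adjustment (assumption)
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)
    shift = (mean_alt - 0.5) * math.sqrt(n_eff) - z_alpha * sd_null
    return NormalDist().cdf(shift / math.sqrt(var_alt))

power = mean_test_power(lambda_s=1.3, n_pairs=1636, alpha=1e-4, info=0.67)
print(f"approximate power: {power:.2f}")
```

Under these crude assumptions the result lands close to the ∼95% quoted; a more faithful calculation would use multipoint information on the actual map.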

RESEARCH DESIGN AND METHODS

Four sets of Caucasian families provided genome scan data for the combined analyses. Three sets of families have been previously
published: U.K. (21,26), U.S. (24,26), and Scandinavia (25). A fourth set of 254 families was newly assembled for this study. The new collection of DNA samples from these 254 families
was obtained from several sources. DNA samples from 47 previously ungenotyped U.K. families were identified in the Diabetes U.K. Warren repository
(28). U.S. families not used in previously published genome scans were contributed
by the Joslin Diabetes Center (121 families) and by the Human Biological Data Interchange (76 families). Ten families from
Australia were collected by investigators at the Walter and Eliza Hall Institute as previously described (29). In total, there were 1,435 families containing 6,899 individuals (6,358 with genome scan data). A total of 3,109 individuals
were affected (with type 1 diabetes), and of these, 3,072 had genotype data (for details of the samples, see table in online
appendix [available at http://diabetes.diabetesjournals.org]).

Genotyping.

Microsatellite marker genotyping technologies and allele scoring conventions varied between the different laboratories providing
the data for previously published type 1 diabetic families. Details of genotyping of the U.K. (21), U.S. (27), and Scandinavian (25) families have been previously described. The 254 new families were genotyped by the Center for the Inheritance of Disease
Research (http://www.cidr.jhmi.edu/) using a panel of 405 microsatellite markers. Because direct merging of genotypes (by standardized allele size) was not possible,
within-family recoded genotype data from all four sources were merged into a single database. Family-naming conventions between
the samples were normalized, and individual marker names were modified to indicate the laboratory of origin for the genotyping.
After elimination of marker inconsistencies (see below), genetic markers were selected to form the analysis panel. Some markers
were genotyped independently by different laboratories in a subset of individuals; in these cases, only one version of the marker
was used for the current analysis. Unless it showed inconsistencies in identity-by-descent (IBD) sharing, the version included
for analysis was the one scored in the most samples. A total of 1,190 markers were included in
the combined analyses. Markers that previously had been added to maps because of association with type 1 diabetes were excluded
to avoid bias in the multipoint linkage results. The excluded markers were located on chromosome 2 (alpha4, ND1, D2S152, CTLA4, and IGFBP5) and chromosome 11 (INS, TH).
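Within-family recoding sidesteps the problem that the same allele can receive different size calls in different laboratories: each family's alleles are relabeled 1, 2, ... so that only the pattern of allele sharing within the family is retained. A minimal sketch, assuming genotypes are stored as a dict from hypothetical individual IDs to pairs of raw allele sizes (None for missing). The helper for choosing among duplicate marker versions, and the lab-suffixed marker names in the example, are also hypothetical, and the IBD-consistency exception described above is not modeled.

```python
from typing import Dict, Optional, Tuple

Genotype = Optional[Tuple[int, int]]

def recode_within_family(family: Dict[str, Genotype]) -> Dict[str, Genotype]:
    """Relabel raw allele sizes as 1, 2, ... in order of first appearance within one family.

    Individuals are visited in sorted-ID order so the recoding is deterministic.
    Which family members share an allele is preserved; cross-family allele identity is not.
    """
    labels: Dict[int, int] = {}
    recoded: Dict[str, Genotype] = {}
    for person in sorted(family):
        genotype = family[person]
        if genotype is None:
            recoded[person] = None
            continue
        new = []
        for allele in genotype:
            if allele not in labels:
                labels[allele] = len(labels) + 1
            new.append(labels[allele])
        recoded[person] = (new[0], new[1])
    return recoded

def pick_marker_version(n_scored: Dict[str, int]) -> str:
    """Among duplicate versions of a marker, keep the one genotyped in the most individuals."""
    return max(n_scored, key=n_scored.get)

family = {"father": (152, 160), "mother": (156, 160), "child": (152, 156), "uncle": None}
print(recode_within_family(family))
print(pick_marker_version({"D2S123_labA": 2900, "D2S123_labB": 1200}))
```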

Statistical analysis.

Before integration of the genetic data, marker errors were detected and pedigree structures verified within each dataset using
PREST software (30). This method uses the genome scan data to determine the likelihood of each specified relationship.
Unlikely marker genotypes were resolved by recoding the specific genotype to “unknown.” Occurrences of nonpaternity were resolved
by changing the pedigree structure to that which was most likely and then repeating the analysis to confirm appropriate relationships.
An integrated marker map was developed by using public databases (Mammalian Genotyping Service, http://research.marshfieldclinic.org/genetics/Genotyping_Service/mgsver2.htm; Southampton, http://cedar.genetics.soton.ac.uk/public_html/; Cooperative Human Linkage Center, http://gai.nci.nih.gov/CHLC/; deCODE, http://www.decode.is) as well as physical map and genome sequence information from the University of California at Santa Cruz (http://genome.ucsc.edu/) using primer sequences in BLAST searches against the genome sequence. All analyses were based on this “consensus” map. Single
and multipoint linkage analyses (based on the consensus map order and distances) were performed using GeneHunter-plus [S(pairs) option (31–33)]. Examination of double recombinants was performed using Merlin software (34). Information content was estimated using Allegro (35).
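PREST works with relationship likelihoods computed from the whole scan. A much simpler check that illustrates the idea of recoding an implausible genotype as unknown is a per-marker Mendelian consistency test for parent–child trios. The sketch below is that simpler check, not PREST's method; it assumes unordered genotypes stored as allele pairs and, as a simplifying choice, blanks only the child's genotype when a trio is inconsistent.

```python
from typing import Dict, List, Optional, Tuple

Genotype = Optional[Tuple[int, int]]

def mendel_consistent(child: Tuple[int, int], father: Tuple[int, int], mother: Tuple[int, int]) -> bool:
    """True if the child's genotype can be formed from one paternal and one maternal allele."""
    a, b = child
    return (a in father and b in mother) or (b in father and a in mother)

def blank_inconsistent(genotypes: Dict[str, Genotype],
                       trios: List[Tuple[str, str, str]]) -> Dict[str, Genotype]:
    """Return a copy in which children incompatible with their parents are set to None (unknown)."""
    cleaned = dict(genotypes)
    for child, father, mother in trios:
        g_child, g_father, g_mother = cleaned.get(child), cleaned.get(father), cleaned.get(mother)
        if g_child is None or g_father is None or g_mother is None:
            continue  # a trio with missing data cannot be checked
        if not mendel_consistent(g_child, g_father, g_mother):
            cleaned[child] = None
    return cleaned

marker = {"dad": (1, 2), "mom": (2, 4), "sib1": (1, 4), "sib2": (3, 4)}
trios = [("sib1", "dad", "mom"), ("sib2", "dad", "mom")]
print(blank_inconsistent(marker, trios))  # sib2 carries allele 3, absent from both parents
```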

IBD statistics and the resulting likelihoods under the null and alternative hypotheses were computed within each
dataset. These dataset-specific likelihoods were then combined for the combined linkage analyses. Multipoint linkage analyses
were performed, and maximized logarithm of odds (LOD) scores were calculated under an exponential model with δ constrained
between 0 and 2 (32). Genome-wide empirical P values were determined by simulating Mendelian transmission through the study families while preserving the patterns of missing data observed
in the sample (36). The genome-wide P value was estimated as the fraction of 10,000 simulated genome-wide replicates that produced a LOD score greater than the observed value. Exclusion mapping was performed using the MapMaker/SIBS program (37).
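Combining datasets at the likelihood level means adding the per-dataset LOD curves at each value of δ and maximizing the sum, rather than adding separately maximized LOD scores, which can overstate the evidence when datasets favor different δ. The sketch below shows that step on a grid over 0 ≤ δ ≤ 2, together with an empirical genome-wide P value taken (as one reading of the procedure described above) to be the fraction of simulated genome-wide replicates whose maximum LOD reaches the observed value. The quadratic LOD curves and simulated maxima are toy inputs, not study data, and the exponential-model likelihood itself (computed by GeneHunter-plus) is not reimplemented.

```python
from typing import Callable, List, Sequence, Tuple

def combined_max_lod(curves: Sequence[Callable[[float], float]], steps: int = 200) -> Tuple[float, float]:
    """Maximize the summed LOD(delta) over a grid on [0, 2]; returns (delta_hat, max_lod)."""
    grid = [2.0 * i / steps for i in range(steps + 1)]
    totals = [sum(curve(d) for curve in curves) for d in grid]
    best = max(range(len(grid)), key=totals.__getitem__)
    return grid[best], totals[best]

def empirical_genomewide_p(observed: float, simulated_max_lods: List[float]) -> float:
    """Fraction of simulated genome-wide replicates whose maximum LOD is at least the observed LOD."""
    return sum(m >= observed for m in simulated_max_lods) / len(simulated_max_lods)

# Toy LOD curves for two datasets; both are 0 at delta = 0, as a LOD must be.
def dataset_a(d: float) -> float:
    return 1.2 * d - 0.5 * d ** 2

def dataset_b(d: float) -> float:
    return 0.4 * d - 0.6 * d ** 2

delta_hat, joint = combined_max_lod([dataset_a, dataset_b])
separate = combined_max_lod([dataset_a])[1] + combined_max_lod([dataset_b])[1]
print(f"joint max LOD {joint:.3f} at delta {delta_hat:.2f}; sum of separate maxima {separate:.3f}")
print(empirical_genomewide_p(3.0, [1.8, 2.4, 3.1, 2.2, 3.6, 2.9, 1.5, 2.0, 2.7, 2.5]))
```

In this toy case the joint maximum (about 0.58) is smaller than the sum of the separate maxima (about 0.79), because the second dataset's LOD curve peaks at a different δ.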

Previous studies have identified a potential type 1 diabetes susceptibility locus near (but not within) the HLA complex (IDDM15) (25,27,38). In the presence of strong support for linkage due to IDDM1, determining the support for IDDM15 is complex because of the positive correlation among the IBD proportions for linked loci. Without accounting for this positive
correlation, statistical tests are biased toward inferring an epistatic relationship. A simple approach when IBD is known
is to compare the observed correlation, r, between the IBD estimates at two loci (here, IDDM1 and IDDM15) with the theoretical correlation, $\rho = (1 - 2\theta)^2$, for two loci separated by recombination fraction $\theta$. The statistic $t = (z - \zeta)\sqrt{n - 3}$, where $z = \tfrac{1}{2}\ln\frac{1 + r}{1 - r}$ and $\zeta = \tfrac{1}{2}\ln\frac{1 + \rho}{1 - \rho}$ are the Fisher z-transforms of r and ρ and n is the number of independent IBD estimates, provides a test of “interaction” that contrasts the observed IBD correlation between IDDM1 and IDDM15 with that expected from their genetic distance alone. Using a sex-averaged map, IBD estimates for each sibpair in the data were computed,
and a single IBD estimate was selected from each pedigree, so the estimates are independent across pedigrees. Under the null
hypothesis of no interaction between these loci, t is approximately standard normal. Observed correlations greater than ρ reflect increased sharing at IDDM15 over that expected from IBD at IDDM1 and the hypothesized genetic distance of IDDM15 from IDDM1. A series of 10,000 simulations was performed as described above (36), with the correlation in IBD computed between IDDM1 and IDDM15. The empirical P value for IDDM15 was the proportion of simulated correlations greater than that observed in the original data.
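The theoretical correlation ρ = (1 − 2θ)² follows because each parental contribution to a sibpair's IBD sharing switches between two loci exactly when one, but not both, of the sibs' meioses recombines, which happens with probability 2θ(1 − θ). A small Monte Carlo sketch checks the formula. It assumes unselected sibpairs, fully informative markers, and a hypothetical θ, unlike the study's simulations, which preserved the observed family structures and missing data.

```python
import random

def simulate_pair_ibd(theta: float, rng: random.Random) -> tuple[float, float]:
    """IBD proportions (0, 0.5 or 1) at two loci for one unselected sibpair."""
    pi1 = pi2 = 0.0
    for _ in range(2):                       # paternal meioses, then maternal meioses
        shared1 = rng.random() < 0.5         # same parental copy transmitted at locus 1?
        # sharing status changes iff exactly one of the two sibs recombines between the loci
        switch = (rng.random() < theta) != (rng.random() < theta)
        shared2 = shared1 != switch
        pi1 += 0.5 * shared1
        pi2 += 0.5 * shared2
    return pi1, pi2

def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(1)
theta = 0.26  # hypothetical recombination fraction
pairs = [simulate_pair_ibd(theta, rng) for _ in range(100_000)]
r = pearson([p[0] for p in pairs], [p[1] for p in pairs])
print(f"simulated r = {r:.3f}, theory (1 - 2*theta)^2 = {(1 - 2 * theta) ** 2:.3f}")
```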

Exclusion mapping.

Each of the four datasets had the equivalent of an ∼9-centiMorgan (cM) map, providing an average marker information content
of ∼66%. Information content ranged from 62% (chromosome 15) to 73% (chromosome 16). In the combined data,
82% of the genome could be excluded at LOD < −2 for loci of effect size λS ≥ 1.3 (supplementary figures, lower curves), and >95% could be excluded for λS ≥ 1.5. Several entire chromosomes could be excluded (chromosomes 7, 8, 18, 20, 21, and 22). The majority of chromosome 6
(due to the strongly linked 6p21/MHC region) and <50% of chromosomes 16, 19, and X could be excluded for λS ≥ 1.3. For effects of λS ≥ 1.1, only 6% of the genome could be excluded. The extent of exclusion for each chromosome is shown in Table 1.
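Exclusion mapping asks whether the data make a locus of a given λS unlikely. For fully informative markers, the expected exclusion LOD per affected sibpair in a truly unlinked region is Σ p_k log10(z_k / p_k), where p = (1/4, 1/2, 1/4) are the null IBD probabilities and z_k are those implied by λS. The sketch below assumes no dominance variance and full information (MapMaker/SIBS itself works from multipoint IBD estimates on the actual map) and illustrates why λS ≥ 1.3 is far easier to exclude than λS ≥ 1.1.

```python
import math

NULL_IBD = (0.25, 0.5, 0.25)   # P(IBD = 0, 1, 2) for a sibpair with no linkage

def asp_ibd_probs(lambda_s: float) -> tuple[float, float, float]:
    """IBD probabilities for affected sibpairs at a locus of effect lambda_s (no dominance)."""
    z0 = 0.25 / lambda_s
    return z0, 0.5, 0.5 - z0

def expected_exclusion_lod(lambda_s: float) -> float:
    """Expected LOD per fully informative ASP when the region is truly unlinked (negative)."""
    z = asp_ibd_probs(lambda_s)
    return sum(p * math.log10(zk / p) for p, zk in zip(NULL_IBD, z))

for lam in (1.1, 1.3, 1.5):
    per_pair = expected_exclusion_lod(lam)
    print(f"lambda_S = {lam}: {per_pair:.5f} per ASP; ~{-2.0 / per_pair:.0f} ASPs to reach LOD -2")
```

On average, roughly 340 fully informative ASPs reach LOD −2 for λS = 1.3, but more than 2,000 are needed for λS = 1.1, in line with the limited exclusion at λS ≥ 1.1 given ∼1,600 ASPs and ∼66% information.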

IDDM15.

The t-statistic was computed to test for excess sharing at IDDM15 beyond that expected from sharing at IDDM1 and the decay in IBD correlation with genetic distance. In the current data, the test statistic was computed for IDDM15 at a position 37 cM from IDDM1 (at 47 cM on chromosome 6), using a sex-averaged map. The test statistic was also computed for a range of map positions within
5 cM of this position (32–42 cM) with similar results. At the 37-cM distance between IDDM1 and IDDM15, the expected correlation coefficient between the IBD estimates under the null hypothesis is ρ = 0.2276. The
observed Pearson’s correlation coefficient between the IBD estimates at IDDM1 and IDDM15, using 1,401 informative pedigrees, was r = 0.3132 (empirical P < 1.0 × 10−4). These results support an HLA-independent effect in the IDDM15 region.
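The reported numbers can be reproduced with a short calculation, assuming the Haldane map function converts the 37-cM distance to θ. Under that assumption ρ = e^(−4d) for d in Morgans, which gives the reported 0.2276. The normal-approximation P value it yields is only asymptotic; the P value reported above comes from the 10,000 simulations.

```python
import math
from statistics import NormalDist

def haldane_theta(d_cm: float) -> float:
    """Recombination fraction from map distance in cM (Haldane map function; assumed)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))

def fisher_z(x: float) -> float:
    """Fisher z-transform, 0.5 * ln((1 + x) / (1 - x))."""
    return 0.5 * math.log((1.0 + x) / (1.0 - x))

def interaction_t(r: float, rho: float, n: int) -> float:
    """t = (z - zeta) * sqrt(n - 3); approximately N(0, 1) under no interaction."""
    return (fisher_z(r) - fisher_z(rho)) * math.sqrt(n - 3)

theta = haldane_theta(37.0)
rho = (1.0 - 2.0 * theta) ** 2               # expected IBD correlation under the null
t = interaction_t(r=0.3132, rho=rho, n=1401)
p_normal = 1.0 - NormalDist().cdf(t)          # one-sided, asymptotic
print(f"theta = {theta:.4f}, rho = {rho:.4f}, t = {t:.2f}, normal-approx P = {p_normal:.1e}")
```

With r = 0.3132 from 1,401 pedigrees, t is about 3.45.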

DISCUSSION

In a previous combined analysis of U.K. and U.S. families (24), it was concluded that an effort to merge and jointly analyze existing families would be required to clarify the role of
non–HLA-linked loci in type 1 diabetes. In the present study, this effort has been achieved under the auspices of the Type
1 Diabetes Genetics Consortium (http://www.t1dgc.org). We have assembled families and merged data from three large genome scans and added new data from 254 families not previously
scanned. This increased sample size has allowed the exclusion of >80% of the human genome for locus-specific, but population-independent,
effects of λS ≥ 1.3. In addition to continued support for type 1 diabetes susceptibility related to the MHC (IDDM1) and INS (IDDM2), we identified eight regions that supported non–HLA-linked susceptibility. Furthermore, three chromosomes (16, 19, and X)
contained extensive regions in which linkage could not be excluded; these regions could benefit from further
genotyping to increase information content and from the analysis of additional families.

Three non–HLA-linked regions provided support for linkage at, or near, the genome-wide level of significance (P < 0.05): chromosome 2q31-q33 (IDDM7 and IDDM12), 10p14-q11 (IDDM10), and 16q22-q24. Based on our simulations, linkage signals of this magnitude at these locations are unlikely to have arisen by chance. Together with the six
other non–HLA-linked regions that exhibited evidence of linkage at nominal P < 0.01 (Table 2), and given the ∼10-cM map density, the data indicate a strong non-HLA genetic effect for type 1 diabetes (39). Notably, none of the three most strongly linked regions exhibited support for linkage in the 408 families from Scandinavia
(25).

Strong support for linkage to the IDDM15 locus (6q21) after taking into account linkage to HLA (nominal P = 7.0 × 10−7) was observed previously in a combined analysis of French, U.S., and Scandinavian families (25,38). The present study also supports IDDM15 (empirical P < 1.0 × 10−4). Further definition of the effects of this locus will be facilitated by increased information content in the HLA
region and in the region surrounding IDDM15 to better estimate the observed IBD sharing and model the residual linkage to the HLA region, including taking into account
sex-specific genetic map differences.

The IDDM12 locus lies within the 2q31-q33 region and has been attributed to single nucleotide polymorphisms (SNPs) in the 3′-untranslated
region of CTLA4 (15); however, the λS predicted from the modest ORs (1.1–1.2) of the disease-associated CTLA4 SNPs (λS ∼1.01) seems unlikely to account fully for the observed evidence for linkage (regional
λS ∼1.19). If this linkage is confirmed in future studies, this result would suggest the presence of other loci in the 2q31-q33 region.
Originally, IDDM7 at chromosome 2q33 was assigned on the basis of allelic association with the D2S152 microsatellite marker, but this association has not been substantiated. The evidence supporting linkage in the current study
does not include the putative IDDM13 locus at chromosome 2q34-q35 (29). Linkage of type 1 diabetes to 10p14-q11 (IDDM10) is well supported by the current and past studies (24,26); however, there has been little follow-up other than association analyses of the functional candidate gene GAD2, which suggest that this gene is not a type 1 diabetes susceptibility locus (40,41). The observed locus-specific effects for 2q31-q33 (λS ∼1.19) and 10p14-q11 (λS ∼1.12) suggest that a single common susceptibility allele would have an allelic OR of ∼3. This effect should be
identifiable in a fine-mapping association study using dense SNP maps across the regions.
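The conversion from an allelic OR to a locus-specific λS can be sketched for a single biallelic locus with multiplicative allelic effects (genotype relative risks 1, γ, γ², with the OR used as an approximation to γ for a rare disease) and risk-allele frequency p. Because paternal and maternal transmissions contribute independently under this model, λS = [½ + ½(1 − p + pγ²)/(1 − p + pγ)²]². The allele frequency used below is a hypothetical value chosen for illustration, not an estimate from this study.

```python
def lambda_s_from_or(p: float, gamma: float) -> float:
    """Sibling risk ratio for one biallelic locus with multiplicative allelic relative risk gamma.

    p is the risk-allele frequency; gamma approximates the allelic OR (rare disease assumed).
    """
    m = 1.0 - p + p * gamma                    # mean risk factor per transmitted allele
    r = (1.0 - p + p * gamma ** 2) / m ** 2    # sibs carry the same parental copy half the time
    return (0.5 + 0.5 * r) ** 2                # paternal and maternal halves multiply

for gamma in (1.1, 1.2):
    print(f"OR {gamma}: lambda_S = {lambda_s_from_or(p=0.5, gamma=gamma):.3f}")  # hypothetical p = 0.5
```

Both values are at or below ∼1.01, consistent with the argument that CTLA4 alone seems too weak to explain a regional λS near 1.19.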

Support for a type 1 diabetes susceptibility locus on chromosome 16p12-q11.1, which was observed independently in both the
combined U.K. and U.S. families (nominal P = 4.5 × 10−3) and in the Scandinavian families (P = 2 × 10−4), remained in the present study (P = 3.3 × 10−3). A recent analysis of four rheumatoid arthritis genome scans (42) reported evidence for linkage at chromosomes 6p21 (HLA; P = 2 × 10−5) and 16p-cen (P = 0.004). Because rheumatoid arthritis, autoimmune thyroid disease, and type 1 diabetes cluster in families more often
than expected by chance (43), evidence for linkage for any one of these autoimmune diseases could be informative for the others. Linkage to
chromosome 16p has previously been reported in U.K. families with early-onset rheumatoid arthritis (44) (P = 3.2 × 10−4). Comparison of linkage scan results for type 1 diabetes and rheumatoid arthritis reveals other similarities.
The largest single combined scan of rheumatoid arthritis families (45) reported significant linkage of rheumatoid arthritis to chromosome 6p21 (HLA; P = 5 × 10−12) and some evidence (P < 0.005) of linkage to six other regions (1q43, 6q21, 10q21, 12q12, 17p13, and 18q21). This overlap with
potential type 1 diabetes susceptibility regions at 6q21, 12q12, and 16p-cen may not be coincidental in the etiology of these autoimmune
diseases.

Recently, evidence for association of type 1 diabetes with alleles in the PTPN22 locus (chromosome 1p13) has been reported (18), and this association has been confirmed (19,20). PTPN22 encodes a lymphoid-specific tyrosine phosphatase (LYP) and is also associated with autoimmune thyroid disease, rheumatoid
arthritis, and systemic lupus erythematosus (SLE) (46). The absence of evidence supporting linkage of type 1 diabetes to chromosome 1p13 (D1S206) in the 1,435 families studied here is not surprising given the small locus-specific effect of PTPN22: although its OR is relatively large (∼1.7), its λS is only ∼1.05. Thus, to detect linkage at P < 0.001 with 50% power, a sample of >8,000 affected sibpair families would be required, even using a fully informative genetic
map. Assuming a multiplicative model, the contribution of PTPN22 to the familial clustering of type 1 diabetes (based on the observed OR) is ∼2%, much lower than that of HLA (40–50%). Nevertheless, the knowledge that PTPN22 may be involved in risk of type 1 diabetes, as well as of other autoimmune diseases, is significant and could provide insight
into modulating T-cell activity for disease prevention.
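The PTPN22 figures can be checked with a simple approximation: the number of fully informative affected sibpairs (ASPs) for which the one-sided mean IBD-sharing test reaches P < 0.001 with 50% power (at 50% power, the expected sharing under linkage sits exactly at the critical value), and the locus's share of familial clustering as ln λS(locus) / ln λS(total) under a multiplicative model. The no-dominance IBD probabilities are a simplifying assumption; the overall λS of ∼15 is taken from the introduction.

```python
import math
from statistics import NormalDist

def asps_for_half_power(lambda_s: float, alpha: float) -> float:
    """Fully informative ASPs needed for 50% power of the one-sided mean IBD-sharing test."""
    z0 = 0.25 / lambda_s                     # P(IBD = 0), assuming no dominance variance
    z2 = 0.5 - z0                            # P(IBD = 2); P(IBD = 1) stays at 1/2
    excess = (0.25 + z2) - 0.5               # expected IBD proportion above the null 0.5
    sd_null = math.sqrt(0.125)               # SD of a sibpair's IBD proportion under the null
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)
    return (z_alpha * sd_null / excess) ** 2

n_needed = asps_for_half_power(lambda_s=1.05, alpha=1e-3)
share = math.log(1.05) / math.log(15.0)      # multiplicative model, overall lambda_S ~15
print(f"ASPs needed: {n_needed:,.0f}; share of familial clustering: {share:.1%}")
```

This gives roughly 8,400 pairs and a share of about 2%, in line with the figures quoted above.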

Several previously supported regions of linkage have diminished support in the current analyses. A type 1 diabetes susceptibility
locus on chromosome 1q42 was strongly supported (nominal P = 9.8 × 10−5) in a study of 679 U.K. and U.S. families (24,47) but exhibited decreased linkage support in a follow-up analysis of 616 families using a denser map in the region (P = 4.0 × 10−4). The evaluation of previously supported regions is difficult, even in the present study with over 1,600 affected sibpairs,
particularly for regions with low λS (e.g., λS ∼1.1). For example, the region on 1q42 was originally supported with MLS = 3.31 and λS ∼1.5 (27). Although the current sample excludes this region at the reported λS ∼1.5, support for linkage in this region has decreased with increasing sample size, from LOD = 2.20 (24) to the current LOD = 0.87 (nominal P = 1.4 × 10−2) with λS ∼1.05. At the current estimated magnitude of genetic effect, the 1q42 region could not be excluded.

In a combined analysis of animal model and human linkage data from a number of autoimmune diseases, chromosome 18q12-q21 demonstrated
evidence of linkage, which has now been supported by the analysis of congenic strains in NOD mice (48). There was no support for loci on chromosome 18 at P < 0.05 in the current study. Three independent studies of type 1 diabetes have reported linkage to chromosome 8q (49–51), but there was no support for 8q in this study. Additional previously reported loci with relatively little support for linkage
in the current study include IDDM4 (11q13), IDDM6 (18q12-q21), IDDM9 (3q22-q25), IDDM11 (14q24-q31), IDDM16 (14q32), IDDM17 (10q25), and IDDM18 (5q33). These data suggest that these putative type 1 diabetes susceptibility loci either are false-positive results
or have very small effects that may be more readily detected in certain populations because of variation in allele frequencies
or other factors, including the possibility of population-specific genetic or environmental effects.

Linkage and fine mapping studies in mouse models of SLE (52) and of type 1 diabetes (53) have demonstrated that a single linkage peak may be composed of several susceptibility loci. In human populations, a linkage
signal may be observed by the chance clustering of several disease loci, each with relatively weak locus-specific effects.
The presence of multiple susceptibility loci may also account, in part, for broad linkage peaks often observed in studies
of complex, common diseases. This underlying complexity would also make it more difficult to obtain convincing results in
future fine-mapping association studies. Both development of novel analytical approaches and increased sample size will be
necessary to resolve this apparent complexity (54,55). Through the efforts of consortia such as ours, it will be possible to increase the number of families
with type 1 diabetes (http://www.t1dgc.org), which would increase power and allow exclusion of loci with λS ≥ 1.2, as well as provide standardized samples and reagents for future fine-mapping studies.

These results suggest two parallel tracks for the identification of type 1 diabetes susceptibility loci. First, systematic
fine mapping of all the variants responsible for the HLA linkage to type 1 diabetes is justified, especially within the 4-Mb
HLA region. Second, further exploration of potential non-HLA regions described here is now justified, including chromosomes
2q31-q33, 6q21, 10p14-q11, and 16q22-q24. Because sample sizes in linkage and association studies have historically been small
and few genes (out of ∼25,000) have been studied in depth, future collaborative efforts and establishment of accessible resources
for study should increase the yield of true disease susceptibility loci.

Acknowledgments

P.C. has received support from the National Institute of Diabetes and Digestive and Kidney Diseases and the Juvenile Diabetes
Research Foundation (JDRF). S.S.R. has received support from the National Institute of Diabetes and Digestive and Kidney Diseases
and the JDRF. J.A.T. has received support from the JDRF and the Wellcome Trust.