Abstract

Background: No GWAS on the risk of cutaneous squamous cell carcinoma (SCC) has been published. We conducted a multistage genome-wide association study (GWAS) to identify novel genetic loci for SCC.

Methods: The study included 745 SCC cases and 12,805 controls of European descent in the discovery stage and 531 SCC cases and 551 controls of European ancestry in the replication stage. We selected 64 independent loci that showed the most significant associations with SCC in the discovery stage (linkage disequilibrium r2 < 0.4) for replication.

Introduction

A number of genome-wide association studies (GWAS) have identified dozens of common genetic variants associated with human pigmentation traits and skin cancers (1, 2). However, the published GWAS on skin cancers exclusively focused on melanoma and basal cell carcinoma (BCC), while no GWAS on squamous cell carcinoma (SCC) has been published to date. The candidate gene–based studies have identified several genes associated with the risk of SCC, including genes regulating pigmentation traits, such as melanocortin 1 receptor (MC1R; ref.3–5), IFN regulatory factor 4 (IRF4; ref.6), and tyrosinase (TYR; ref. 7), genes irrelevant to pigmentation traits, such as X-ray repair cross-complementing protein 1 (XRCC1; refs.8, 9), cytotoxic lymphocyte-associated antigen-4 (CTLA-4; ref.10), as well as ubiquitin-associated domain-containing protein 2 (UBAC2) and exocyst complex component 2 (EXOC2), which were identified in a candidate genetic study of SCC based on the GWAS significant loci of BCC (11). Aiming to identify novel genetic loci for SCC risk, we conducted a multi-stage GWAS on SCC in the populations of European descent.

Materials and Methods

Study population

SCC GWAS.

In the discovery stage, we combined three component studies: the Harvard studies, the Rotterdam Study (RS)-I and the RS-II.

In the Harvard GWAS, we combined data from four case–control studies nested within the Nurses' Health Study (NHS) and the Health Professionals Follow-up Study (HPFS): a type II diabetes case–control study nested within the NHS (T2D_NHS), a type II diabetes case–control study nested within the HPFS (T2D_HPFS), a coronary heart disease case–control study nested within the NHS (CHD_NHS), and a coronary heart disease case–control study nested within the HPFS (CHD_HPFS). Overlapped samples among the component datasets were excluded. Detailed descriptions of the study population were presented in the Supplementary Materials and Methods and were published previously (12, 13).

To define SCC in the Harvard GWAS, the participants in the NHS and HPFS reported newly diagnosed cancers biennially. With their permission, medical records were obtained and reviewed by physicians to confirm their self-reported diagnosis. We included the pathologically confirmed invasive SCC cases with no previously diagnosed cancer as the eligible cases in the SCC GWAS. The controls were defined as those who did not report any type of major cancers. The study protocol was approved by the Institutional Review Board of Brigham and Women's Hospital and the Harvard School of Public Health (Boston, MA).

The Rotterdam study.

The Rotterdam Study (RS) is a prospective population-based follow-up study of the determinants and prognosis of chronic diseases in the elderly including skin diseases and cancer in participants living in Rotterdam, the Netherlands. The design has been prescribed in detail (14). In brief, the RS consists of a major cohort (RS-I) and two extensions (RS-II and RS-III). The cohort is predominantly (95%) Caucasian and the overall response rate for all three cycles at baseline was 72.0%. The Medical Ethics Committee of the Erasmus Medical Center and the review board of the Dutch Ministry of Health, Welfare and Sports have ratified the RS. From each participant, written informed consent was obtained. For this study, we included participants from the RS-I and RS-II cohorts, as the number of SCC cases in RS-III was very small.

To ascertain histologically confirmed SCC cases, all participants from the RS were linked with the nationwide network and registry of histo- and cytopathology in the Netherlands (PALGA; up to 23rd September 2011; ref.15). PALGA was founded in 1971 and achieved complete national coverage in 1991. Every obtained pathology excerpt contains encrypted patient data, a report identifier, a conclusion of the pathologist (often differentiating between biopsy and excision and stating the localization of the SCC), and a PALGA diagnosis line derived from Systematized Nomenclature of Medicine (15). We used a previously published approach to identify unique SCC cases based on date of diagnosis, biopsy/excision and/or tumor localization (16). If the diagnosis or the number of unique SCC remained unsure, the medical files were hand searched and cases that remained dubious were excluded.

Replication study.

In the replication stage, a fast-track replication was conducted among 531 SCC cases and 551 healthy controls in the skin cancer case–control study nested within NHS and HPFS (skin cancer study). All the cases and controls in this study were from the subcohorts of NHS and HPFS who had given a blood specimen. Eligible cases consisted of pathologically confirmed invasive SCC cases diagnosed after the baseline up to 2006 follow-up cycle for both cohorts, who had no previously diagnosed cancer. Controls were randomly selected from participants who were free of diagnosed cancer up to and including the questionnaire cycle in which the case was diagnosed. One or two controls were matched to each case by age (±1 year). Cases and their matched controls were selected in the same cohort.

Information on pigmentation traits were collected from prospective questionnaires in both NHS and HPFS using similar wording. We used categorical variables to indicate the natural hair color (red, blonde, light brown, dark brown, and black), tanning ability (practically none, light tan, average tan and deep tan in NHS; painful burn and peel, burn then tan, tan without burn in HPFS), and the total number of lifetime severe sunburns (none, 1–2, 3–5, 6, and more).

Genotyping, imputation, and quality control

SCC GWAS.

We previously performed genotyping in the component sets of Harvard GWAS using the Affymetrix 6.0 array (12, 13). We used the MACH program (17) to impute 2,543,887 autosomal SNPs based on haplotypes from the HapMap (18) database phase II data build 35 (CEU) in all of the four component sets, including T2D_NHS, T2D_HPFS, CHD_NHS, and CHD_HPFS (19). Samples from the four studies were imputed separately. We observed high imputation quality in each cohort's imputation. SNPs with imputation r2 > 0.95 and minor allele frequency (MAF) > 0.01 in each study were included in meta-analysis. Finally, a total of 1,777,244 SNPs were included in the Harvard GWAS.

Genotyping from participants of the RS has been described before (14). In brief, DNA from whole blood was extracted following standard protocols. The Infinium II HumanHap550K Genotyping BeadChip version 3 was used to genotype both RS-I and RS-II cohorts. Next, the RS-I and RS-II cohorts were imputed separately using the HapMap Phase II CEU reference panel (Build 36) as the reference panel and using a two-step procedure imputation algorithm implemented in the program MACH. SNPs with imputation r2 < 0.03, MAF < 0.02 for RS-I, and MAF < 0.08 for RS-II were excluded. After quality control, a total of 2,356,032 SNPs from RS-I and 1,956,891 SNPs from RS-II were available for GWAS and meta-analysis.

Replication study.

We selected 64 independent SNPs [linkage disequilibrium (LD) r2 < 0.4] showing the strongest associations with SCC in the discovery stage for replication in the skin cancer study. We genotyped these SNPs using TaqMan OpenArray system at the Dana Farber/Harvard Cancer Center Polymorphism Detection Core. Laboratory personnel were blinded to the case–control status, and blinded quality control samples were inserted to validate genotyping procedures; concordance for the blinded samples was 100%. Primers, probes, and conditions for genotyping assays are available upon request. We excluded five SNPs (rs4980694, rs12210050, rs13156707, rs11263585, and rs3099065) that failed genotyping in the replication set. The rest 59 SNPs were successfully genotyped with call rate >85% and Hardy–Weinberg P value > 0.01. Detailed information of these SNP was presented in Supplementary Table S1.

Seven common MC1R variants (Val60Leu, Val92Met, Arg151Cys, Ile155Thr, Arg160Trp, Arg163Gln, and Asp294His) were previously genotyped among a subgroup of the skin cancer case–control study (257 SCC cases and 282 controls). Detailed descriptions of this subgroup study were previously published (5).

Statistical analysis

We used logistic regression to test associations between minor allele counts and SCC risk in the discovery and replication sets. In the discovery stage, we used the imputed genotype data based on HapMap phase II data build 35 (CEU) for the analyses. We analyzed the three component GWASs in the discovery set separately. We adjusted for age, gender, and the three largest principal components of genetic variation of each GWAS in the regression model. These principal components were calculated for all individuals on the basis of ca. 10,000 unlinked markers using the EIGENSTRAT software (20). We additionally adjusted for the component study set in the Harvard GWAS. We used PLINK for the GWAS analyses in this study. Associations in each GWAS were combined in an inverse variance weighted meta-analysis using METAL (21). Age and gender were adjusted in the skin cancer replication set. The same meta-analysis was conducted to combine the discovery set and the replication set.

Results

We combined three component studies in the discovery stage of this SCC GWAS, which included the Harvard GWAS, the RS-I, and the RS-II. We used the skin cancer case–control study nested in the Harvard cohorts in the replication stage. The details of each component study of this multi-stage SCC GWAS were summarized in Table 1. In the discovery stage, we analyzed a total of 2,392,512 autosomal SNPs imputed from HapMap phase II data build 35 (CEU) after quality control for their associations with SCC. The quantile-quantile (Q–Q) plot of the SCC GWAS did not demonstrate a systematic deviation from the expected distribution (Supplementary Fig. S1). The overall genomic control inflation factor (λGC) was 1.00. The Manhattan plot of this SCC GWAS was presented in Supplementary Fig. S2. We selected 64 independent SNPs (LD r2 < 0.4) that showed the strongest associations (by P values) with SCC in the discovery stage for replication. Detailed information of these selected SNP was presented in Supplementary Table S1. Five SNPs (rs4980694, rs12210050, rs13156707, rs11263585, and rs3099065) that failed genotyping in the replication set were excluded.

After combining the discovery set and the replication set, we identified the SNP rs8063761 on chromosome 16 most significantly associated with SCC risk (P = 1.7 × 10−9 in the combined set; P = 1.0 × 10−6 in the discovery set; and P = 4.1 × 10−4 in the replication set). The variant allele of this SNP (T allele; MAF, 0.33) was associated with an increased risk of SCC with odds ratio (OR) of 1.34 [95% confidence interval (CI) 1.22–1.47] compared with the wild allele (A allele; Table 2). This SNP was located in the intron region of the differentially expressed in FDCP 8 homolog (DEF8) gene. We additionally evaluated the association of all the SNPs within 200 kb surrounding this gene with SCC risk and presented the regional plot of DEF8 gene in Fig. 1. As shown in the regional plot, the SNP rs8051733 was in high LD with the identified SNP rs8063761 (LD r2 > 0.8). The SNP rs8051733 was also in the intron region of DEF8 gene, and the variant allele of the SNP rs8051733 (G allele; MAF, 0.30) was associated with an increased risk of SCC with OR of 1.37 (95% CI, 1.16–1.61; P = 2.0 × 10−4 in the discovery set) compared with the wild allele (A allele).

Regional plot of the DEF8 gene. Regional Manhattan Plots for the SCC GWAS within 200 kb surrounding the DEF8 gene on chromosome 6.

We additionally tested the association between the SNP rs8063761 and human pigmentation traits as well as the risk of other skin cancers based on our published GWAS data (2). As summarized in Supplementary Table S2, we found that the variant allele of this SNP (T allele) was associated with lighter hair color (P = 6.9 × 10−9), poorer tanning ability (P = 2.1 × 10−44), and increased number of previous sunburns (P = 3.7 × 10−11). No significant association was found between this SNP and the risk of BCC (P = 0.06) or the number of non-melanoma skin cancer (P = 0.47).

We compared the association between the SNP rs8063761 and SCC risk before and after adjustment of human pigmentation traits in the skin cancer study. As shown in Supplementary Table S3, this association remained significant, but substantially attenuated after additionally adjusted for hair color, tanning ability, and the number of sunburns (P = 4.1 × 10−4 before adjustment and P = 0.01 after adjustment).

Of interest, we found that the SNP rs8063761 was associated with the expression of DEF8 gene based on the published data of expression quantitative trait loci derived from a GWAS of global gene expression in the lymphoblastoid cell lines of 550 individuals of British descent (22). The variant allele of the SNP rs8063761 (T allele) was associated with a decreased expression of DEF8 gene (β = −0.14; SE = 0.03; P = 1.2 × 10−6). These data were available online from the Real-time Engine of eQTLs uploaded by Dr. Liang (23).

However, we noticed that the MC1R gene (a previously known pigmentation gene associated with SCC risk) is located close to the DEF8 gene. The SNP rs1805007 in the MC1R gene is in weak LD with the SNP rs8063761 in the DEF8 gene (LD r2 = 0.3). The variant allele of the SNP rs1805007 (T allele; MAF, 0.07) was associated with an increased risk of SCC with OR of 1.31 (95% CI, 1.07–1.62; P = 9.6 × 10−3 in the discovery set) compared with the common allele (C allele). After mutual adjustment by each SNP in the Harvard GWAS set, the association between the SNP rs8063761 and SCC risk remained significant (P = 9.8 × 10−3; OR, 1.23; 95% CI, 1.05–1.45), while the association of the SNP rs1805007 was no longer significant (P = 0.34). We further adjusted for all 7 common MC1R variants (Val60Leu, Val92Met, Arg151Cys, Ile155Thr, Arg160Trp, Arg163Gln, and Asp294His) among a subgroup of the skin cancer case–control study (257 SCC cases and 282 controls genotyped for all 7 MC1R SNPs) and the association of the DEF8 SNP rs8063761 became nonsignificant (P = 0.85; OR, 1.04; 95% CI, 0.71–1.51 after adjustment vs. P = 4.2 × 10−3; OR, 1.44; 95% CI, 1.12–1.85 before adjustment).

In addition, we validated the association between four other SNPs and SCC risk in the replication set, which were rs9689649 in the intron of parkinson protein 2 (PARK2) gene on chromosome 6 (P = 2.7 × 10−6 in the combined set; P = 3.2 × 10−5 in the discovery set and P = 0.02 in the replication set), rs754626 in the intron of v-src avian sarcoma (Schmidt-Ruppin A-2) viral oncogene homolog (SRC) gene on chromosome 20 (P = 1.1 × 10−6 in the combined set; P = 1.4 × 10−5 in the discovery set and P = 0.02 in the replication set), rs9643297 in the intron of ST3 beta-galactoside alpha-2,3-sialyltransferase 1 (ST3GAL1) gene on chromosome 8 (P = 8.2 × 10−6 in the combined set; P = 3.3 × 10−5 in the discovery set and P = 0.04 in the replication set), and rs17247181 in the intron of erbb2 interacting protein (ERBB2IP) gene on chromosome 5 (P = 4.2 × 10−6 in the combined set; P = 3.1 × 10−5 in the discovery set and P = 0.048 in the replication set; Table 3).

We also checked the associations between the previously reported SCC risk SNPs identified by candidate gene-based approach for their associations in this study and observed significant associations with consistent directions of effect for the SNPs in IRF4 (rs12203592, OR = 1.54 for T allele, P = 2.57 × 10−4), UBAC2 (rs7335046, OR = 1.19 for G allele, P = 0.03), and EXOC2 (rs12210050, OR = 1.34 for T allele, P = 2.33 × 10−5). The SNP in CTLA4 (rs3087243) was not associated with SCC risk in our study population (P = 0.47). The XRCC1 SNPs (rs25487 and rs1799782) were not tested in this study.

Discussion

In this study, we identified several genetic loci associated with SCC risk by a multi-stage SCC GWAS. To the best of our knowledge, this is the first reported GWAS on SCC to date, which sheds new lights on the genetic basis of SCC development.

The most significant locus identified in this SCC GWAS was the SNP rs8063761 in the intron of DEF8 gene. This SNP was also associated with human pigmentation traits, including hair color (P = 6.9 × 10−9), tanning ability (P = 2.1 × 10−44), and the number of sunburns (P = 3.7 × 10−11) in our cohort population. Given that human pigmentation traits are susceptibility factors for skin cancer (1, 24), we compared the association of this SNP with SCC before and after adjustment of pigmentation traits. We found that this association substantially attenuated after adjusted for hair color, tanning ability, and the number of sunburns in the skin cancer study in the Harvard cohorts (P = 4.1 × 10−4 before adjustment and P = 0.01 after adjustment), suggesting a mediated association of this SNP with SCC through pigmentation. Of interest, we found this SNP was also strongly associated with the expression level of DEF8 gene based on the published eQTL database (P = 1.2 × 10−6; ref.23).

However, the genetic function of DEF8 gene was largely unknown. Furthermore, we noticed that the DEF8 gene was close to the MC1R gene, which is a well-known pigmentation gene and has been associated with skin cancer risk (3–5). In this SCC GWAS, only one SNP in the MC1R gene, rs1805007, was in weak LD with the SNP rs8063761 (LD r2 = 0.3; based on HapMap phase II data build 35, CEU). After adjusting for the SNP rs1805007 in the Harvard GWAS set, the association of the SNP rs8063761 with SCC risk remained significant (P = 9.8 × 10−3; OR, 1.23; 95% CI, 1.05–1.45). However, further adjustment of all 7 common MC1R variants (Val60Leu, Val92Met, Arg151Cys, Ile155Thr, Arg160Trp, Arg163Gln, and Asp294His) in a subgroup of the replication study (257 SCC cases and 282 controls) eliminated such an association (P = 0.85 after adjustment vs. P = 4.2 × 10−3 before adjustment), suggesting such an association may be driven by the MC1R SNPs.

Other genetic loci identified in this study for the association with SCC include polymorphisms within the ST3GAL1, SRC, ERBB2IP and PARK2 genes. Although their associations didn't reach GWAS significance, the replication of these findings suggested potential effects of these loci on SCC risk. The ST3GAL1 gene encodes a sialyltransferase, β-galactoside α-2,3-sialyltransferase 1 (ST3Gal I), which has been reported to play a role in the development of multiple tumors (25–28). A previous study found high expression of ST3Gal I in human cutaneous SCC lesions (29), which further supported our findings. However, this is the first study to report an association between a genetic polymorphism in ST3GAL1 gene and skin cancer risk. The SRC gene is as a well-known proto-oncogene. Genetic abnormalities in this gene have been largely reported in multiple cancers (30–32), including melanoma (33). However, no previous studies have found genetic variants in SRC associated with SCC risk. The ERBB2IP gene encodes the Erbin protein, which is a binding partner of the Erb-B2 protein and regulates Erb-B2 function and localization. Erbin has also been shown to affect the Ras signaling pathway by disrupting Ras–Raf interaction (34). Altered expression of Erbin and Erb-B2 has been found in skin BCC tumors in a previous study, but not in SCC (35). The Parkinson disease–associated gene, PARK2, has also been associated with human malignancies in previous studies, including lung cancer (36, 37), glioblastoma (38, 39), and breast cancer (40), but no studies have reported on skin cancer. The precise function of this gene remains largely unknown.

A major strength of this study is the large sample size of pathologically confirmed SCC cases and healthy controls in this study and it is the first reported SCC GWAS so far. With the sample size of 745 SCC cases and 12,805 healthy controls, we estimated the power of 80% to detect the effect size of 1.40, 1.30, 1.20, and 1.15 for the genetic variants with minor allele frequency of 0.05, 0.10, 0.25, and 0.5, respectively, based on a two-sided α of 0.05. Besides, we were able to collect information on human pigmentation traits prospectively in our cohort population for mediation analyses.

In summary, we identified several novel genes and genetic loci associated with SCC risk using a multi-stage GWAS design. The identification of these genetic loci may help us understand the complex mechanisms of developing SCC and suggest new therapeutic targets for this common skin cancer. Further studies are warranted to validate our findings and extend to other populations. Functional studies are needed to elucidate the genetic functions of these identified genes and loci in SCC development.

Disclosure of Potential Conflicts of Interest

A.A. Qureshi is a consultant/advisory board member for Abbvie, Amgen, Centers for Disease Control, Janssen, Merck, Novartis, and Pfizer. No potential conflicts of interest were disclosed by the other authors.

Disclaimer

The authors assume full responsibility for analyses and interpretation of these data and declare that the funding sources had no role in the conduct, analysis, interpretation, or writing of this manuscript.

Grant Support

The Harvard GWAS part is supported by NIH grants CA122838, CA87969, CA055075, CA49449, CA100264, and CA093459. The Rotterdam Study is supported by the Netherlands Organisation of Scientific Research NWO Investments (175.010.2005.011, 911-03-012) and further funded by the Research Institute for Diseases in the Elderly (014-93-015; RIDE2), the Netherlands Genomics Initiative (NGI)/Netherlands Organisation for Scientific Research (NOW, 050-060-810), Erasmus Medical Center and Erasmus University, Rotterdam, Netherlands Organization for the Health Research and Development (ZonMw), the Ministry of Education, Culture, and Science, the Ministry for Health, Welfare, and Sports, the European Commission (DG XII), and the Municipality of Rotterdam. This work was also supported in part by NIH UM1 CA186107 and UM1 CA167552.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

Acknowledgments

Harvard GWAS: The authors thank Dr. Frank B. Hu and Dr. Eric B. Rimm for their work on the component GWAS sets, Constance Chen for programming support, and Pati Soule and Dr. Hardeep Ranu of the Dana Farber/Harvard Cancer Center High-Throughput Polymorphism Detection Core for sample handling and genotyping of the NHS and HPFS samples. The authors are grateful to Merck Research Laboratories for funding of the GWAS of coronary heart disease and are also indebted to the participants in all of these studies. The authors also thank the participants and staff of the NNHS, the HPFS for their valuable contributions as well as the following state cancer registries for their help: AL, AZ, AR, CA, CO, CT, DE, FL, GA, ID, IL, IN, IA, KY, LA, ME, MD, MA, MI, NE, NH, NJ, NY, NC, ND, OH, OK, OR, PA, RI, SC, TN, TX, VA, WA, WY.

Rotterdam Study: The authors thank the department of internal medicine at the ErasmusMC and Karol Estrada and Maksim V. Struchalin for their work on the genetic data. The authors are grateful to the study participants, the staff from the Rotterdam Study, and the participating general practitioners and pharmacists. The authors also thank Esther van den Broek and Lucy Overbeek from the PALGA foundation and dermatopathologist Senada Koljenovic from the Erasmus Medical Center.

Footnotes

Note: Supplementary data for this article are available at Cancer Epidemiology, Biomarkers & Prevention Online (http://cebp.aacrjournals.org/).