Abstract

Prostate cancer risk may be influenced by single genetic variants in the hormone-regulating genes androgen receptor (AR), cytochrome P450 (CYP17), and steroid-5-α-reductase type 2 (SRD5A2). In this study, we comprehensively investigated polymorphisms in these three loci and their joint effect in a large population-based study. We selected 23 haplotype-tagging single-nucleotide polymorphisms (htSNP) that could uniquely describe >95% of the haplotypes (6 in AR, 6 in CYP17, and 11 in SRD5A2). These htSNPs were then genotyped in the Cancer Prostate in Sweden population (2,826 case subjects and 1,705 controls). We observed significant association for several SNPs in the AR gene (P = 0.004-0.02) and CYP17 (P = 0.009-0.05) and one SNP in SRD5A2 (P = 0.02). Carriers of the most common AR haplotype had a significant excess risk to develop prostate cancer [odds ratio (OR), 1.25; 95% confidence interval (95% CI), 1.1-1.5; P = 0.002], yielding an estimated population attributable risk of 16% (95% CI, 0.06-0.25). Combining risk alleles from these genes yielded a 12% risk increase for each additional high-risk allele carried (95% CI, 1.1-1.2; P for trend = 9.2 × 10⁻⁵), with an overall OR of 1.87 (95% CI, 1.0-3.4) for carriers of all five included risk alleles, an OR of 2.13 (P for trend = 8 × 10⁻⁴) for advanced disease, and an OR of 4.35 (P for trend = 7 × 10⁻⁵) for disease onset before age 65 years. Genetic variation in key genes in the androgen pathway is important for development of prostate cancer and may account for a considerable proportion of all prostate cancers. Carriers of five high-risk alleles in the AR, CYP17, and SRD5A2 genes are at ∼2-fold excess risk to develop prostate cancer. (Cancer Res 2006; 66(22): 11077-83)

prostate cancer

association

polymorphism

haplotype

androgen pathway

Introduction

Prostate cancer constitutes a major health problem, being the most common malignancy among men in developed countries. Accumulating evidence supports an important role of genetics in prostate cancer etiology, yet the responsible genes remain largely unidentified (1).

The prostate gland depends on androgens for its growth and maintenance, and they partly control cell proliferation and differentiation (2). Androgens are suggested to be involved also in prostate cancer development (3). Early-stage prostate cancer is androgen dependent, and androgen ablation is widely used to treat advanced disease (4). Accordingly, genetic variation within androgen-related genes has been proposed to influence prostate cancer risk. Specifically, three genes have been the target for intense investigation: the androgen receptor (AR; OMIM 313700), which among other things mediates androgen action, binds the male sex hormones testosterone and dihydrotestosterone (5), and activates expression of several other androgen-responsive genes (6); the cytochrome P450 (CYP17; OMIM 609300), which functions as a catalyst in key reactions involved in the last step of the testosterone biosynthesis (3); and the steroid-5-α-reductase type 2 (SRD5A2; OMIM 607306) responsible for converting testosterone to its biologically more active form dihydrotestosterone in the prostate (3).

Recently, we studied genetic variants reported to be associated with prostate cancer risk in a large Swedish prostate cancer case-control study (7). In total, we assessed 46 genetic variants in 27 genes and we were able to replicate 6 genetic variants in 5 different genes. Four of these replicated variants were located in the AR gene (CAG repeat), the CYP17 gene (rs743572), and the SRD5A2 gene (rs676033, rs523349). Building on these findings, we now comprehensively investigate the effect of common germ-line genetic variants in the AR, CYP17, and SRD5A2 genes in a larger study. We used the approach of haplotype-tagging single-nucleotide polymorphisms (htSNP) to capture the majority of genetic variation and genotyped in total 23 SNPs in 4,531 men from a large Swedish population-based prostate cancer case-control study [Cancer Prostate in Sweden (CAPS)].

Materials and Methods

Case-control study. CAPS is a population-based case-control study described in detail elsewhere (8). Recruitment of participants lasted from March 2001 to October 31, 2003. We included all men living in the northern and central parts of Sweden, under the age of 80 years, and all men living in the Stockholm region and southeastern part of Sweden, under the age of 65 years.

All patients with a newly diagnosed carcinoma of the prostate were eligible to participate in the study. It is compulsory to report newly diagnosed cancer cases to regional cancer registries in Sweden, and by using these registries, we achieved complete case ascertainment. We obtained detailed clinical information, such as Gleason score, prostate-specific antigen (PSA) level at time of diagnosis, and tumor-node-metastasis stage from the Swedish National Prostate Cancer Registry. All cancers were cytologically (<5%) or pathologically verified. In total, 3,648 prostate cancer patients were identified and invited to the study. Of them, 3,039 (83%) agreed to participate, and for those, a blood sample and a questionnaire about risk factors and family history were collected. The clinical characteristics of the study subjects are presented in Table 1. Case subjects were classified as advanced cases if they met at least one of the following criteria: T3/T4, N+, M+, Gleason score of 8 to 10, or PSA level ≥50 ng/mL.
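The classification rule for advanced cases can be written as a short predicate. The following is a minimal sketch only; the function name, argument names, and the choice to treat a missing value as "criterion not met" are assumptions made for illustration and are not taken from the study.

```python
from typing import Optional


def is_advanced_case(t_stage: Optional[int] = None,
                     node_positive: bool = False,
                     metastasis: bool = False,
                     gleason: Optional[int] = None,
                     psa_ng_per_ml: Optional[float] = None) -> bool:
    """Return True if at least one of the stated criteria is met.

    Assumption: a missing value (None) counts as "criterion not met".
    """
    return any([
        t_stage is not None and t_stage >= 3,                 # T3 or T4
        node_positive,                                        # N+
        metastasis,                                           # M+
        gleason is not None and gleason >= 8,                 # Gleason 8 to 10
        psa_ng_per_ml is not None and psa_ng_per_ml >= 50.0,  # PSA >= 50 ng/mL
    ])
```

Under this sketch, a T2 tumor with Gleason score 6 and a PSA level of 50 ng/mL is classified as advanced because the PSA criterion alone is sufficient.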

Control subjects were randomly selected from the Swedish Population Registry, frequency matched according to the expected age distribution of cases (groups of 5-year interval) and geographic region (two regions, representing north and south of Sweden, including Stockholm). Control subjects were recruited concurrently with case subjects. A total of 3,153 controls were invited to the study and 1,895 (60%) agreed to participate. At the time of this study, DNA was available for 2,826 cases and 1,705 controls. Written informed consent was obtained from each participant. The ethics committees at the Karolinska Institutet (Stockholm, Sweden) and Umeå University (Umeå, Sweden) approved the study.

SNP selection. The target region for selection of SNPs in the AR gene included 3 kb of the promoter region, all exons, introns, and 8 kb of the 3′-untranslated region (UTR). Using the public database SNPper, we identified altogether 116 SNPs and selected a subset of 52 SNPs with the criterion of 1 SNP per 4 kb. We prioritized validated SNPs and missense mutations.

For CYP17 and SRD5A2, SNP selection was based on phase II data from the International HapMap Project. We only included SNPs with a minor allele frequency >5%. By including complete haplotype blocks as defined by Gabriel et al. (9), our target region reached both upstream and downstream of the gene as long as linkage disequilibrium was maintained. For CYP17, the region examined reached from 14 kb upstream of the transcription start site to 9 kb downstream of the 3′-UTR. For SRD5A2, we examined a region extending from 54 kb upstream of the transcription start site to 2 kb downstream of the 3′-UTR.

Tagging methodology. We genotyped the selected 52 SNPs in AR in a randomly selected subset of 94 controls from the CAPS study. Consequently, four SNPs were eliminated because of failed assay, three due to low success rate, and 30 due to monomorphic results. From the remaining 15 SNPs, haplotypes were constructed. By haplotype deviation analysis using the htSNP2 software, we selected four SNPs that explained >95% of the haplotype diversity as htSNPs. At this time, the HapMap Project had released data for the AR gene. By comparing our haplotype distribution with theirs, we decided to complement our data with two additional SNPs and extended our target region to include 10 kb of the promoter and 11 kb of the 3′-UTR. In total, six SNPs were genotyped in the whole study population.

For CYP17 and SRD5A2, we selected htSNPs with the tagSNPSv2 software (10). The target region of CYP17 comprised one haplotype block. In total, seven SNPs were selected for genotyping in the whole study population. The target region of SRD5A2 comprised two haplotype blocks, which were tagged individually. htSNPs were selected to capture at least 95% of the haplotype diversity. In addition, we filled the gap between the blocks by tagging the whole region with already selected htSNPs and chose possible additional SNPs needed to ensure a total coverage of at least 95% of the genetic variation. In total, 11 SNPs were selected and genotyped in the whole study population.

Genotyping. The DNA samples were genotyped using matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (Sequenom, Inc., San Diego, CA; ref. 11). PCR assays and associated extension reactions were designed using the SpectroDESIGNER software (Sequenom), and primer sequences are available on request (Metabion GmbH, Planegg-Martinsried, Germany). All amplification reactions were run in the same conditions in a total volume of 5 μL with 2.5 ng of genomic DNA, 1 pmol of each amplification primer, 0.2 mmol/L of each deoxynucleotide triphosphate, 2.5 mmol/L MgCl₂, and 0.2 units of HotStarTaq DNA Polymerase (Qiagen, Inc., Valencia, CA). Reactions were heated at 95°C for 15 minutes and subjected to 45 cycles of amplification (20 seconds at 94°C, 30 seconds at 60°C, 30 seconds at 72°C) before a final extension of 7 minutes at 72°C. Extension reactions were conducted in a total volume of 9 μL using 5 pmol of allele-specific extension primer and the Mass EXTEND Reagents kit before being cleaned using SpectroCLEANER (Sequenom) on a MULTIMEK 96 automated 96-channels robot (Beckman Coulter, Fullerton, CA). Clean primer extension products were loaded onto a 384-element chip with a nanoliter pipetting system (Sequenom) and analyzed by a MassARRAY mass spectrometer (Bruker Daltonik GmbH, Bremen, Germany). The resulting mass spectra were analyzed for peak identification using the SpectroTYPER RT 2.0 software (Sequenom). For each SNP, two independent scorers confirmed all genotypes.

Statistical methods. Hardy-Weinberg equilibrium test for each autosomal SNP was done using a replication method as implemented in the GENETICS package for the R programming language. For each test, 10,000 permutations were run.
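As an illustration of a permutation-based Hardy-Weinberg test, the sketch below shuffles the pooled alleles, re-pairs them into genotypes, and compares a chi-square statistic with its permutation distribution. This is one common way to build such a test and is not claimed to reproduce the algorithm of the GENETICS package; the function names, the default of 1,000 permutations, and the genotype counts in the usage note are hypothetical.

```python
import random


def hwe_chi2(n_aa, n_ab, n_bb):
    """Chi-square distance between observed and HWE-expected genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele "a"
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


def hwe_permutation_p(n_aa, n_ab, n_bb, n_perm=1000, seed=0):
    """Empirical P: share of allele shufflings at least as extreme as observed."""
    rng = random.Random(seed)
    alleles = [0] * (2 * n_aa + n_ab) + [1] * (2 * n_bb + n_ab)
    observed = hwe_chi2(n_aa, n_ab, n_bb)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(alleles)
        counts = [0, 0, 0]            # indexed by number of "b" alleles
        for a, b in zip(alleles[0::2], alleles[1::2]):
            counts[a + b] += 1
        if hwe_chi2(*counts) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate away from an exact zero.
    return (hits + 1) / (n_perm + 1)
```

With hypothetical counts of 25, 50, and 25 (exactly the Hardy-Weinberg proportions), the empirical P is 1; with 50, 0, and 50 (no heterozygotes), it falls to the smallest value the number of permutations allows.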

We tested for association between prostate cancer risk and each SNP using a likelihood ratio test of a covariate equal to number of rare alleles (0, 1, or 2) based on an unconditional logistic regression model as implemented in the STATA software. We adjusted for age and geographic region using indicator variables representing each combination of age of onset (groups of 5-year interval) and geographic region (two regions, representing north and south of Sweden, including Stockholm).
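The additive (per-allele) likelihood ratio test can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions: it omits the adjustment for age and geographic region, it fits the two-parameter logistic model by Newton-Raphson on grouped counts rather than calling STATA, and the function name and all counts used in the usage note are hypothetical.

```python
import math


def additive_lrt(cases, controls, n_iter=50):
    """Per-allele OR and likelihood ratio P for logit P(case) = b0 + b1 * g.

    cases, controls: counts of subjects carrying 0, 1, and 2 rare alleles.
    Unadjusted sketch; covariates such as age and region are not modeled.
    """
    totals = [a + b for a, b in zip(cases, controls)]

    def prob(b0, b1, g):
        return 1.0 / (1.0 + math.exp(-(b0 + b1 * g)))

    def loglik(b0, b1):
        return sum(cases[g] * math.log(prob(b0, b1, g))
                   + controls[g] * math.log(1.0 - prob(b0, b1, g))
                   for g in range(3))

    b0 = b1 = 0.0
    for _ in range(n_iter):                 # Newton-Raphson updates
        g0 = g1 = h00 = h01 = h11 = 0.0
        for g in range(3):
            p = prob(b0, b1, g)
            resid = cases[g] - totals[g] * p
            w = totals[g] * p * (1.0 - p)
            g0 += resid
            g1 += resid * g
            h00 += w
            h01 += w * g
            h11 += w * g * g
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det

    p_null = sum(cases) / sum(totals)       # intercept-only model
    ll_null = (sum(cases) * math.log(p_null)
               + sum(controls) * math.log(1.0 - p_null))
    lr = max(0.0, 2.0 * (loglik(b0, b1) - ll_null))
    # Survival function of a chi-square with 1 df: erfc(sqrt(x / 2)).
    return math.exp(b1), math.erfc(math.sqrt(lr / 2.0))
```

For example, hypothetical counts of (50, 100, 150) cases and (150, 100, 50) controls give a per-allele OR of 3 with a very small P, whereas identical genotype distributions in cases and controls give an OR of 1 and a P of 1.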

We used the HAPLO.STATS package (12) in the software language R for haplotype analysis. This method provides both global and haplotype-specific tests. We adjusted all computations for age and geographic region as described above. Haplotypes with a frequency <0.01 were pooled together. We calculated empirical Ps by randomly permuting the trait and covariates (13). Precision criterion for the Ps was set to a sample SE of one fourth of the estimated P, but at least 1,000 permutations were run for each simulation. We estimated haplotype-specific odds ratios (OR) for haplotype A in AR by calculating the “dosage” using the tagSNPSv2 software (10) and using it as a covariate in a logistic regression. Because men carry only one copy of AR, the only phase ambiguity in AR haplotypes is due to missing genotypes.

We tested for interaction between the genes by selecting the most significant SNP in each gene and including them and their second-order product terms as covariates in a logistic regression. Interaction was then assessed with a likelihood ratio test. In addition, we estimated the joint effect by calculating the number of risk alleles carried (ranging from 0 to 5) for each subject and testing for association with prostate cancer risk. The population attributable risk (PAR) was estimated by maximum likelihood estimation (14). All reported Ps are based on two-sided tests.
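The range of 0 to 5 risk alleles follows from men carrying a single copy of the X-linked AR gene and two copies each of CYP17 and SRD5A2. A minimal sketch of the counting step is given below; the function name, the genotype encoding as two-letter strings, and every allele letter are placeholders chosen for illustration and do not represent the actual risk alleles.

```python
def count_risk_alleles(ar_allele, cyp17_genotype, srd5a2_genotype, risk_alleles):
    """Count risk alleles across one hemizygous and two autosomal SNPs.

    ar_allele:        single allele, e.g. "G" (one X chromosome in men)
    cyp17_genotype:   two alleles, e.g. "GT"
    srd5a2_genotype:  two alleles, e.g. "CT"
    risk_alleles:     mapping from gene to its risk allele (placeholder letters)
    """
    count = int(ar_allele == risk_alleles["AR"])              # 0 or 1
    count += cyp17_genotype.count(risk_alleles["CYP17"])      # 0, 1, or 2
    count += srd5a2_genotype.count(risk_alleles["SRD5A2"])    # 0, 1, or 2
    return count


# Placeholder allele letters, for illustration only.
RISK = {"AR": "G", "CYP17": "G", "SRD5A2": "C"}
```

The resulting count (0 to 5) would then enter the logistic regression as a single covariate to test for a trend.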

Results

Genotyping failed for one selected htSNP in CYP17, rs17724534. Average genotyping success for the other SNPs was 97.7% (range, 90.6-99.0%). The concordance rate between duplicated samples (n = 330) was 99.96%. All autosomal SNPs were in Hardy-Weinberg equilibrium among the controls (P > 0.05).

Association between prostate cancer and SNPs in AR, CYP17, and SRD5A2 among Swedish men

Carriers of one allele of the rs2486758 SNP located in the promoter region of CYP17 were at 15% higher risk to develop prostate cancer than noncarriers. Moreover, three SNPs located at the end of the gene were associated with a decreased risk of prostate cancer (Table 2). For all CYP17 SNPs, the association became more pronounced in cases with early-onset disease. For example, carrying two copies of the rs619824 ‘T’ allele decreased the risk for a diagnosis at an early age by 36% (OR, 0.64; 95% CI, 0.49-0.83; P = 0.0009).

Haplotype frequencies for prostate cancer cases and controls in AR, CYP17, and SRD5A2

An overall difference in CYP17 haplotype frequencies between cases and controls was observed (global P = 0.03; Table 3). Specifically, the TAAAGC haplotype was more common among controls (27%) than cases (26%; P = 0.04; Table 3). These observations were more pronounced in patients with an early-onset cancer (global P = 0.02; P = 0.005 for the TAAAGC haplotype).

We did not observe any differences in haplotype frequencies between cases and controls at the SRD5A2 locus. However, we observed that carriers of the GTA haplotype in block 1 had a decreased risk to develop advanced prostate cancer (P = 0.04).

Combination of SNPs. We estimated the joint effect of multiple SNPs in the same pathway by calculating the combined risk estimate for the rs6152 SNP in AR, rs619824 in CYP17, and rs623419 in SRD5A2. We did not observe any statistical interaction between these SNPs (P = 0.61). Because both rs6152 and rs619824 were associated with decreased risk of prostate cancer, we regarded the common allele as the risk allele. The number of risk alleles ranged between 0 and 5. Each additional high-risk allele was associated with a 12% risk increase (OR, 1.12; 95% CI, 1.06-1.19; P = 0.00009; Table 4). This risk increment was similar in advanced cases (OR, 1.13; 95% CI, 1.05-1.21; P = 0.0008; Table 4) but more pronounced in cases with an early-onset disease (OR, 1.20; 95% CI, 1.09-1.30; P = 0.00007; Table 4). Compared with the reference group, individuals with five risk alleles exhibited significantly higher risk of prostate cancer (OR, 1.87; 95% CI, 1.02-3.42; Table 4). The risk of early-onset prostate cancer was 4.35 times higher (95% CI, 1.67-11.3) for carriers of five risk alleles compared with those carrying none.
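To relate the per-allele estimates to the ORs reported for carriers of all five risk alleles, one can extrapolate the trend model: under a log-additive model, the OR for k risk alleles versus none is the per-allele OR raised to the power k. The sketch below does only this arithmetic; the function name is hypothetical, and the comparison with the reported category-specific ORs (1.87 overall, 2.13 for advanced disease as given in the abstract, 4.35 for early onset) is illustrative rather than a reanalysis.

```python
def extrapolated_or(per_allele_or, n_alleles=5):
    """OR for n_alleles versus none under a log-additive (trend) model."""
    return per_allele_or ** n_alleles


# (label, per-allele OR, reported OR for five versus zero risk alleles)
COMPARISONS = [
    ("all cases", 1.12, 1.87),
    ("advanced disease", 1.13, 2.13),
    ("early onset", 1.20, 4.35),
]

for label, per_allele, reported in COMPARISONS:
    print(f"{label}: trend model {extrapolated_or(per_allele):.2f}, "
          f"reported {reported:.2f}")
```

The extrapolated values are approximately 1.76, 1.84, and 2.49. The reported ORs come from comparing the five-allele group directly with the reference group, so they need not coincide with the trend-model values, and their wide confidence intervals reflect the small number of subjects in the extreme categories.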

The CAG repeat in the androgen receptor. We previously reported that carriers of >22 AR CAG repeats exhibit a significantly higher risk of developing prostate cancer (7). That report was based on the first recruitment phase of CAPS (1,461 case subjects and 796 controls). In the present study, we investigated the relationship between AR SNPs and the number of CAG repeats (long/short, where long is defined as >22 repeats) and found a strong genetic correlation between these (D′ ranging from 0.53 to 1.00).
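Because men carry a single X chromosome, two-marker AR haplotypes (for example, a SNP allele together with long/short repeat status) can be counted directly, and D′ follows from the standard definition. The sketch below is a generic illustration; the function name and the counts in the usage note are hypothetical and are not data from this study.

```python
def d_prime(n11, n10, n01, n00):
    """Absolute D' from counts of the four two-marker haplotypes.

    n11: allele 1 at both markers; n10: allele 1 at marker A only;
    n01: allele 1 at marker B only; n00: allele 1 at neither marker.
    """
    n = n11 + n10 + n01 + n00
    p_a = (n11 + n10) / n            # frequency of allele 1 at marker A
    p_b = (n11 + n01) / n            # frequency of allele 1 at marker B
    d = n11 / n - p_a * p_b          # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return abs(d) / d_max if d_max > 0 else 0.0
```

With hypothetical counts of 40, 10, 0, and 50, one of the four haplotypes is absent and D′ equals 1; with 25 of each haplotype, the markers are independent and D′ equals 0.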

We noticed that carrying a long allele reduces haplotype variability to include only the GGAAGC haplotype or the GGAAGT haplotype, whereas all haplotypes were represented among individuals with short repeats.

Discussion

In this study, we examined if germ-line genetic variation in the androgen metabolic pathway alters risk for prostate cancer development. In total, 23 SNPs located in three hormone-related genes (AR, CYP17, and SRD5A2) were genotyped in 2,826 prostate cancer cases and 1,705 controls. We found that both individual SNPs and haplotypes were associated with prostate cancer risk. Specifically, the most common AR haplotype was carried by 78% of affected men compared with 74% of population controls, yielding a PAR of 16%. This haplotype associated more strongly with aggressive forms of prostate cancer (PAR, 22%) and young age at onset (PAR, 22%). In addition, several AR SNPs were significantly correlated with prostate cancer.
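The reported carrier frequencies, OR, and PAR can be checked against each other with two textbook formulas. The sketch below uses Miettinen's case-based formula and Levin's formula as approximations; the study itself estimated PAR by maximum likelihood with covariate adjustment, so this is a plausibility check under simplifying assumptions, and the function names are hypothetical.

```python
def crude_odds_ratio(p_cases, p_controls):
    """Unadjusted OR from carrier proportions among cases and controls."""
    return (p_cases / (1 - p_cases)) / (p_controls / (1 - p_controls))


def par_miettinen(p_exposed_cases, odds_ratio):
    """PAR from the carrier proportion among cases (OR used in place of RR)."""
    return p_exposed_cases * (odds_ratio - 1) / odds_ratio


def par_levin(p_exposed_controls, odds_ratio):
    """PAR from the carrier proportion in the population, approximated by controls."""
    excess = p_exposed_controls * (odds_ratio - 1)
    return excess / (1 + excess)


print(round(crude_odds_ratio(0.78, 0.74), 2))   # crude OR from 78% vs 74%
print(round(par_miettinen(0.78, 1.25), 3))      # PAR using cases
print(round(par_levin(0.74, 1.25), 3))          # PAR using controls
```

The crude OR from 78% versus 74% carriers is about 1.25, and both formulas give a PAR of about 0.156, in line with the reported OR of 1.25 and PAR of 16%.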

In agreement with our results, a smaller study from Australia (15) recently identified an association between rs6152 and reduced risk of metastatic prostate cancer. A U.S. study investigated AR polymorphisms and prostate cancer risk (16) by genotyping 32 SNPs in 882 cases and 874 controls with a multiethnic background. Although no overall association was found, 11 of 32 SNPs were associated with a reduced risk for an aggressive disease in concordance with our results. However, because their population included five different ethnic backgrounds, the sample size within each ethnicity was limited.

Individuals having a brother with prostate cancer are at higher risk than those with an affected father (17), suggesting an X-linked genetic model. The AR gene is located on Xq11.2-q12 and constitutes a compelling candidate gene for prostate cancer. Accordingly, the most investigated polymorphism in prostate cancer genetics is the CAG repeat located in exon 1 in the AR gene, 401 bp upstream from rs6152. An inverse relationship between CAG repeats and AR transcriptional activity has been found (18). Results from case-control studies have, however, been conflicting and inconclusive, although the majority suggests an increased risk for carriers of few repeats. Recently, we found that carriers of >22 repeats exhibit a higher risk of developing prostate cancer (7). When we included the CAG repeat (long/short) in our haplotype analysis, we noticed that the GGAAGC haplotype was divided into two haplotypes. Interestingly, neither of these haplotypes was associated with risk, as would have been expected if the CAG repeat marker were causal. Our result therefore suggests that the CAG repeat is not a causal variant for prostate cancer development. More likely, this repeat is in strong linkage disequilibrium with a yet unidentified variant located nearby.

Multiple CYP17 SNPs and haplotypes were associated with prostate cancer risk in this study, especially for patients with an early-onset disease. Interestingly, association was more pronounced in the 3′-UTR region (rs619824; Table 2). A recent study among 715 men from 266 prostate cancer families found a protective effect for this SNP, in concordance with our results (19). They also found that this SNP is located within a region showing sequence homology to a CCAAT/enhancer-binding protein, known to be a strong regulator of transcription. Thus, it is possible that this SNP may have direct functional relevance. The lack of functional studies on these genes makes it difficult to elaborate on the consequences for gene function that these polymorphisms convey. However, a low testosterone level has repeatedly been shown to correlate with poorly differentiated cancer and poor prognosis (20–26).

We observed an increasing risk effect with number of risk alleles carried in this pathway; however, we could not detect any interaction effect between these genes. This suggests that these genes individually affect initiation of prostate cancer, and the sum of their main effects produces a substantial risk increase particularly for advanced and early-onset cancers. Patients carrying all five risk alleles have two times higher risk to develop advanced prostate cancer, and they are four times more likely to receive a diagnosis at a young age (Table 4) than noncarriers. Our results suggest that combined analysis of multiple SNPs in the same pathway may strengthen otherwise nominal associations. This is in line with the hypothesis of prostate cancer being a multigenic disease. Indeed, we do not expect that one single genetic variant will have a useful predictive value in risk assessment, and this study shows a possible way to identify subgroups with noteworthy risk increase.

To account for multiple testing, we did a data simulation by randomly permuting case-control status and then reevaluated association for each SNP. Adjusted Ps were then computed based on the empirical distribution of the maximum of the 23 test statistics. In addition, we estimated the probability of observing at least 9 significant associations (at the 5% level) of 23 tests under the null hypothesis of no associations. Based on 10,000 permutations, the only SNPs that remained significant were rs6152, rs7061037, and rs5964607 in the AR; however, the probability of observing at least nine significant associations was estimated to be only 0.8%.
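The maximum-statistic permutation adjustment described here can be sketched as follows. This is an illustration under assumptions: the per-SNP statistic is a simple trend statistic (sample size times the squared Pearson correlation between allele count and case status) rather than the likelihood ratio statistic from the adjusted logistic model, the default of 200 permutations is kept small for speed, and the function names and example data are hypothetical.

```python
import random


def trend_stat(genotypes, status):
    """N * r^2, with r the correlation between allele count and case status."""
    n = len(status)
    mean_g = sum(genotypes) / n
    mean_s = sum(status) / n
    s_gg = sum((g - mean_g) ** 2 for g in genotypes)
    s_ss = sum((s - mean_s) ** 2 for s in status)
    if s_gg == 0 or s_ss == 0:
        return 0.0                    # monomorphic SNP or constant status
    s_gs = sum((g - mean_g) * (s - mean_s) for g, s in zip(genotypes, status))
    return n * s_gs * s_gs / (s_gg * s_ss)


def max_t_adjusted_p(snps, status, n_perm=200, seed=1):
    """Adjusted P per SNP from the permutation distribution of the maximum."""
    rng = random.Random(seed)
    observed = [trend_stat(g, status) for g in snps]
    exceed = [0] * len(snps)
    shuffled = list(status)
    for _ in range(n_perm):
        rng.shuffle(shuffled)         # permute case-control labels
        max_stat = max(trend_stat(g, shuffled) for g in snps)
        for j, obs in enumerate(observed):
            if max_stat >= obs:
                exceed[j] += 1
    return [(e + 1) / (n_perm + 1) for e in exceed]
```

Because every SNP is compared with the maximum over all SNPs in each permutation, correlation among SNPs in the same gene is respected automatically, which a Bonferroni correction would ignore.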

This study is to our knowledge the largest (>4,500 subjects) homogeneous case-control study of prostate cancer. The CAPS study is unique in the context of a well-characterized phenotype and a population-based ascertainment from a homogeneous population. Because prostate cancer diagnosis was little influenced by PSA screening at the time of this study (27), our findings apply predominantly to a clinically significant disease. Indeed, 43% of included cases are classified as having an aggressive cancer with high risk of disease progression. When generalizing our results, it is important to consider the differences in clinical characteristics between North American prostate cancer populations and the CAPS population. Citizens in North America have experienced a dramatic increase in widespread testing of PSA, and a considerable fraction of their diagnosed prostate cancer cases are clinically nonsignificant.

One possible limitation with CAPS is the low participation rate among the controls (60%) because it could introduce selection bias if the willingness to join the study is associated with the assessed genotypes. To address this issue, we compared baseline characteristics (e.g., age, body mass index, smoking status, and level of education) of participants who completed the questionnaire and donated blood with those of participants who only answered the questionnaire. Among both cases and controls, characteristics did not differ significantly between those who did and did not donate a blood specimen. Therefore, we find it unlikely that selection bias is an issue in our study.

In conclusion, this study provides strong support that genetic variation in hormone-related genes is important for prostate cancer development. Specifically, we identified a common haplotype at the AR locus, which seems to have a noteworthy effect on prostate cancer risk at a population level. Moreover, the joint effect of risk alleles in the AR, CYP17, and SRD5A2 genes can identify men with a 1.5- to 2.0-fold increased risk of developing prostate cancer, a finding that warrants confirmation in other large prostate cancer studies.

Acknowledgments

We thank all study participants in the CAPS study; Ulrika Undén for skillfully coordinating the study center at Karolinska Institute; all urologists, including their patients, in the CAPS study and all urologists providing clinical data to the national registry of prostate cancer; Karin Andersson, Susan Lindh, Gabriella Thorén, and Margareta Åswärd (Regional Cancer Registries); CAPS steering committee, including Pär Stattin, Jan-Erik Johansson, and Eberhart Varenhorst; Sören Holmgren and the personnel at the Medical Biobank in Umeå for skillfully handling the blood samples; the personnel at Mutation Analysis Facility at Karolinska Institute; Cecilia Lindgren; and Prof. Jianfeng Xu and Dr. Siqun Zheng (Wake Forest University School of Medicine, Winston-Salem, NC) for their contribution in the second part of CAPS.