Introduction

Breast cancer incidence varies substantially worldwide. Europe, Australia, North America, Argentina, and Uruguay have age-adjusted breast cancer incidence rates of up to 101 per 100,000, whereas most African, Asian, and Latin American countries have incidences of <52 per 100,000 (1). This variation might be the result of differences in reproductive, hormonal, lifestyle, and possibly genetic factors between populations (2-4). Differences in breast cancer incidence have been observed not only between countries but also between populations within countries. For example, Hispanic women in the San Francisco Bay Area had a breast cancer incidence that was 35% lower than that for non-Hispanic White women for the period 1998 to 2002 (5). A large proportion of the difference in incidence disappears in second- and third-generation immigrants, suggesting a major environmental influence on breast cancer risk (2). The degree to which genetic factors play a role is unknown.

With respect to breast cancer susceptibility, U.S. Latinas are a unique group to study because of the diversity of their environmental and cultural exposures and their genetic ancestry. The term Hispanic or Latino describes a heterogeneous population with a shared history of colonization and a common language but does not refer to a fixed biological entity or a single common ancestry. Latinos, in terms of their genetic background, represent a complex mix of indigenous American, European, and African ancestries (6). Understanding cancer susceptibility in this population is crucial because Latinos account for 14% of the total population of the nation and are predicted to comprise 25% of the population by the year 2050 (7).

Unknown environmental factors might be important in explaining some of the differences in breast cancer risk among subsets of Latina women. Among U.S. Latinas, those born in the United States have a higher risk for postmenopausal breast cancer compared with foreign-born Latinas, even after controlling for reproductive and other risk factors (2). Genetic factors might also contribute to differences in breast cancer risk among subgroups of Latinas. We recently showed that European genetic ancestry was associated with increased risk for breast cancer among U.S. Latinas (8). We compared the mean genetic ancestry among U.S. Latina breast cancer cases and U.S. Latina controls from the San Francisco Bay Area and found that cases had significantly greater European ancestry and significantly lower indigenous American ancestry, even after adjusting for known reproductive and other environmental risk factors, including place of birth and age at migration.

We sought to retest the association in women residing in Mexico because of the possibility that the original findings between genetic ancestry and breast cancer risk in U.S. Latinas might be confounded by unmeasured environmental exposures. Specifically, we investigated the relationship between breast cancer risk and genetic ancestry using a new and independent sample of 1,881 women in Mexico. We hypothesized that, if there is a United States–specific environmental component that increases breast cancer risk and that is associated with European genetic ancestry in U.S. Latinas, then there should be no association between genetic ancestry and breast cancer risk in the Mexican women.

Materials and Methods

Source of cases and controls

Mexican samples

Analyses were done using DNA and epidemiological data from a multicenter case-control study designed to examine predictors of breast cancer risk among Mexican women aged 35 to 69 y who had resided for at least 5 y in Mexico City, Monterrey, or Veracruz.

Newly diagnosed cases were identified at 12 hospitals from the major healthcare systems in Mexico: the Mexican Institute of Social Security (IMSS; six hospitals), the Social Security system for state workers (ISSSTE; two hospitals), and the Ministry of Health (SS; four hospitals), which provides health care to those who do not belong to any of the health care systems stated above. Inclusion criteria for cases were (a) histologically confirmed new diagnosis of breast cancer between the years 2004 and 2007, including invasive and in situ tumors; (b) no previous treatment such as radiotherapy, chemotherapy, or antiestrogens in the last 6 mo; and (c) no present treatment with exemestane, letrozole, anastrozole, or megestrol. Pregnant women or cases known to be HIV infected were excluded from the study. Only two cases were using antiestrogens and therefore excluded because we were interested in measuring breast density and antiestrogens could have modified it. Only one case known to be HIV positive was excluded from the study. In this study, cases and controls were derived from the same source populations.

Controls were selected based on a probabilistic multistage sampling design. One or more basic geostatistical areas (Área Geoestadística Básica, in Spanish) within the catchment area of each participating hospital were randomly selected. Women were randomly sampled to obtain specific numbers of women in each 5-y age category (35-39, 40-44, 45-49, 50-54, 55-59, 60-64, and 65-69 y) based on the age distribution of cases reported by the Mexican Tumor Registry in 2002. If more than one woman aged 35 to 69 y was present in the household, only one was selected (25 controls and 1 case within our sample had a sister of similar age living in the same household; ref. 9). Trained personnel visited the selected households and determined willingness to participate in the study.

Data collection included the administration of a structured questionnaire at the participant's home and collection of anthropometric measurements and a blood sample at the hospital. In addition, a mammogram was taken for controls. The response rate of the cases was 95.5% for Mexico City, 94.4% for Monterrey, and 97.4% for Veracruz. The response rate of the controls was 87.4% for Mexico City, 90.1% for Monterrey, and 97.6% for Veracruz. The complete sample included 1,000 cases and 1,074 controls. DNA for the present study was available for a total of 1,938 subjects (880 cases and 1,058 controls). More detailed information about the description of the multicenter population-based case-control study has been recently published (9).

All participants provided a written informed consent before the in-person interview. The study was approved by the Institutional Review Board at each institution participating in this collaborative study.

Data collection

The health questionnaire collected information on sociodemographic characteristics; reproductive factors; use of oral contraceptives and hormone replacement therapy; family and personal history of chronic diseases; personal history of sexually transmitted diseases; histories of body size, smoking, and alcohol consumption; and history of medical X-rays and mammograms. Information on usual dietary intake in the past year was collected using a food frequency questionnaire adapted from the Willett questionnaire (10) and validated in a sample of Mexican women (11).

Study participants were asked about usual physical activity during a 1-wk period that reflected the activity done during the last 12 mo before any breast cancer symptoms were perceived in cases and during the last 12 mo before recruitment in the controls (12). Three different categories of physical activity were defined: strenuous, moderate, and light (13).

Standing height, weight, and hip and waist circumferences were measured by the interviewer at the hospital. Body mass index (BMI) was calculated as measured weight (kilogram) divided by measured height (meter) squared.
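The BMI calculation described above can be written as a one-line function. This is a minimal sketch; the function name and the example values are illustrative and are not taken from the study data.

```python
def body_mass_index(weight_kg: float, height_m: float) -> float:
    """Return BMI in kg/m^2: measured weight divided by measured height squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


# Hypothetical participant: 70 kg and 1.75 m.
example_bmi = body_mass_index(70.0, 1.75)
print(round(example_bmi, 1))
```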

Marker selection and ancestral populations

A set of 106 single nucleotide polymorphisms (SNPs) that can separate indigenous American, African, and European ancestry was used to estimate the proportion of genetic ancestry in the sample of Mexican women with breast cancer and unaffected controls. Simulation studies have shown that ∼100 ancestry informative markers with allele frequency differences similar to the ones we used are required to achieve an r of >0.9 with true ancestry (14). The ancestry informative markers used in this study were biallelic SNPs selected from the Affymetrix 100K SNP chip (Affymetrix). Ancestry informative marker selection was based on calculations of allele frequency differences between Europeans, West Africans, and indigenous Americans. The SNPs chosen maximize information for more than one ancestral population pairing, with a large difference in allele frequency between ancestral populations (>0.5). The ancestry informative markers are widely spaced throughout the genome and have a well-balanced distribution across all 22 autosomal chromosomes. The average distance between markers is about 2.4 × 10^7 bp. The parental population samples that were genotyped on the Affymetrix 100K SNP chip included 42 Europeans (Coriell's North American Caucasian panel), 37 West Africans (nonadmixed Africans living in London, United Kingdom, and South Carolina), and 30 indigenous Americans (15 Mayans and 15 Nahuas; refs. 8, 15).
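The allele frequency criterion above can be illustrated with a short sketch. It assumes a simplified rule, namely keeping a SNP when at least one pair of ancestral populations differs in allele frequency by more than 0.5; the actual selection procedure also weighed information across several population pairings and genomic spacing. The SNP names and frequencies are hypothetical.

```python
from itertools import combinations


def max_pairwise_delta(freqs):
    """Largest absolute allele frequency difference between any two populations."""
    return max(abs(a - b) for a, b in combinations(freqs, 2))


def select_aims(allele_freqs, min_delta=0.5):
    """Keep SNPs whose largest pairwise frequency difference exceeds min_delta."""
    return [snp for snp, freqs in allele_freqs.items()
            if max_pairwise_delta(freqs) > min_delta]


# Hypothetical frequencies of one allele in (European, West African,
# indigenous American) parental samples.
example_freqs = {
    "snp_a": (0.90, 0.20, 0.10),  # informative for European vs. the other two
    "snp_b": (0.50, 0.40, 0.30),  # small differences, not informative
    "snp_c": (0.10, 0.15, 0.80),  # informative for indigenous American ancestry
}
selected = select_aims(example_freqs)
print(selected)
```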

Genotyping

Genotyping of the 106 ancestry informative markers for the Mexican women was done at the Children's Hospital Oakland Research Institute. Genotyping was done using multiplex PCR coupled with single base extension methodology, with allele calls made using a Sequenom analyzer. Primers and reaction conditions have been previously published (8). Laboratory personnel genotyped the samples without knowledge of case/control status.

A total of 1,938 Mexican cases and controls were newly genotyped for this study (880 cases and 1,058 controls). The average call rate for the SNPs was 98.9%. After we removed three SNPs with a call rate <90%, the average call rate for the SNPs was 99.2%. The average sample call rate was 98.7%. After we removed 57 samples (34 cases, 23 controls) with a call rate <85%, the average call rate for the samples was 99.6% (the excluded samples had no significant differences for the variables analyzed). We genotyped 66 duplicate pairs, and of these, three pairs were excluded from the mismatch analysis because the call rate for one of the duplicates was low (7%, 26%, and 25%) compared with the high call rate of most samples in the study. The overall error rate without including the duplicate pairs with low call rates was 0.02%. All the ancestry informative markers were in Hardy-Weinberg equilibrium.
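The two-step call rate filter described above (SNPs first, then samples) can be sketched as follows. This is only an illustration: the data layout (one list of genotypes per sample, with None marking a failed call) and the function names are assumptions, and the toy example uses a looser SNP threshold than the 90% used in the study so that the small matrix produces a visible result.

```python
def call_rate(calls):
    """Fraction of non-missing genotype calls in a sequence."""
    return sum(c is not None for c in calls) / len(calls)


def qc_filter(genotypes, snp_min=0.90, sample_min=0.85):
    """Drop low call rate SNPs, then low call rate samples.

    genotypes: list of per-sample lists, None marks a missing call.
    Returns (kept_sample_indices, kept_snp_indices).
    """
    n_snps = len(genotypes[0])
    kept_snps = [j for j in range(n_snps)
                 if call_rate([row[j] for row in genotypes]) >= snp_min]
    kept_samples = [i for i, row in enumerate(genotypes)
                    if call_rate([row[j] for j in kept_snps]) >= sample_min]
    return kept_samples, kept_snps


# Toy matrix: 4 samples by 3 SNPs, genotypes coded 0/1/2.
toy = [
    [0, 1, None],
    [1, 2, None],
    [2, None, None],
    [0, 0, 1],
]
kept_samples, kept_snps = qc_filter(toy, snp_min=0.70, sample_min=0.85)
print(kept_samples, kept_snps)
```

Sample call rates are computed only over the SNPs that survive the first step, which mirrors the order in which the exclusions are reported in the text.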

Genotype and phenotype information was available for a total of 1,881 women from Mexico (846 cases and 1,035 controls).

Statistical analysis

We used a maximum likelihood approach to estimate genetic ancestry at the individual level (16, 17). To compare the characteristics of cases and controls for Mexican women, we used t tests for continuous variables and Fisher's exact tests for categorical variables. Mean genetic ancestry was estimated as the average of the individual genetic ancestry estimates within a group.
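To make the idea of individual-level maximum likelihood ancestry estimation concrete, the sketch below uses a simple model that is an assumption for illustration and is not necessarily the estimator of refs. 16 and 17: each genotype is treated as two independent draws of an allele whose frequency is the ancestry-weighted average of the parental frequencies, and the likelihood is maximized by a coarse grid search over the three ancestry proportions. All frequencies and genotypes are hypothetical.

```python
import math


def log_likelihood(q, genotypes, anc_freqs, eps=1e-9):
    """Binomial log-likelihood of genotypes (0/1/2 allele counts) given ancestry q."""
    total = 0.0
    for g, freqs in zip(genotypes, anc_freqs):
        if g is None:  # skip missing calls
            continue
        f = sum(qk * pk for qk, pk in zip(q, freqs))  # admixed allele frequency
        f = min(max(f, eps), 1.0 - eps)               # guard against log(0)
        total += (math.log(math.comb(2, g))
                  + g * math.log(f) + (2 - g) * math.log(1.0 - f))
    return total


def estimate_ancestry(genotypes, anc_freqs, steps=20):
    """Grid search over (q1, q2, q3) with q1 + q2 + q3 = 1 in increments of 1/steps."""
    best_q, best_ll = None, -math.inf
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            q = (i / steps, j / steps, (steps - i - j) / steps)
            ll = log_likelihood(q, genotypes, anc_freqs)
            if ll > best_ll:
                best_q, best_ll = q, ll
    return best_q


# Hypothetical parental allele frequencies per SNP:
# (European, African, indigenous American).
anc_freqs = [(0.9, 0.1, 0.1), (0.1, 0.9, 0.1), (0.1, 0.1, 0.9), (0.8, 0.2, 0.1)]
# Hypothetical individual carrying alleles that are common in Europeans.
genotypes = [2, 0, 0, 2]
estimate = estimate_ancestry(genotypes, anc_freqs)
print(estimate)
```

With only four markers the estimate sits at a corner of the grid; with on the order of 100 informative markers, as in the study, intermediate proportions become resolvable. A grid search is used here for transparency, and a numerical optimizer would be the more usual choice in practice.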

To assess the association between breast cancer risk and genetic ancestry in the sample of Mexican women residing in Mexico, we used unadjusted and adjusted logistic regression models. European or indigenous American genetic ancestry was modeled as a categorical variable (0-25%, 1; 26-50%, 2; 51-75%, 3; 76-100%, 4). We also evaluated the models with genetic ancestry represented as a continuous variable (percent genetic ancestry). Both the unadjusted and adjusted models included as covariates the recruitment site (Mexico City, Veracruz, or Monterrey) and the type of health insurance to which the individuals belonged (SSA, ISSSTE, and IMSS). We stratified the unadjusted analysis by recruitment site to evaluate whether the association between genetic ancestry and breast cancer risk was consistent across the three sites. The multivariate models adjusted for European ancestry, age (continuous), family history of breast cancer in first-degree relatives (yes, no), personal history of benign breast disease (yes, no), age at menarche (continuous), number of full-term pregnancies (continuous 0-6, ≥7), age at first full-term pregnancy (1, ≤20 y; 2, between 21 and 30 y; 3, >30 y; 4, no full-term pregnancies), breast-feeding (ever, never), history of hormone replacement therapy use/menopausal status (0, postmenopausal/ever use of hormone replacement therapy; 1, premenopausal/ever use of hormone replacement therapy; 2, postmenopausal/no hormone replacement therapy; 3, premenopausal/no hormone replacement therapy), alcohol intake during the reference year (defined as 12 mo previous to diagnosis in cases and 12 mo previous to recruitment in controls; one or more drinks a month in a year or longer, 1; otherwise, 0), daily caloric intake during the reference year (continuous), education (none, 0; some elementary school, 1; completed elementary school, 2; high school, 3; college, 4; postgraduate studies, 5), moderate physical activity (hours per week; continuous), and socioeconomic status (low, medium, high). For the construction of the socioeconomic status variable, we combined information about different belongings (that is, gas or electric stove, water heating system, radio or cassette recorder, television, videocassette recorder, CD player, refrigerator, washing machine, microwave oven, blender, vacuum cleaner, water pump, motorcycle, car or van, fixed phone, cellular phone, computer, and dish antenna). The polychoric correlation was used for the construction of a socioeconomic status index (18) categorized into tertiles among controls (low, medium, high).
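The four-level coding of genetic ancestry used in the categorical models can be sketched as a small helper. The handling of non-integer percentages is an assumption made here (each category is treated as open on the left and closed on the right, for example values above 25 and up to 50 map to category 2); the text specifies only the integer ranges.

```python
def ancestry_category(percent: float) -> int:
    """Map percent ancestry (0-100) to the codes 1-4 used in the categorical models."""
    if not 0 <= percent <= 100:
        raise ValueError("percent ancestry must be between 0 and 100")
    if percent <= 25:
        return 1  # 0-25%, reference category
    if percent <= 50:
        return 2  # 26-50%
    if percent <= 75:
        return 3  # 51-75%
    return 4      # 76-100%


# Hypothetical individual estimates of percent European ancestry.
codes = [ancestry_category(p) for p in (12, 25, 26, 50, 51, 76, 100)]
print(codes)
```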

Individuals with missing data were dropped from the multivariate analysis (161 cases and 180 controls). We evaluated models including European and African ancestry. We present results based on a model that included European ancestry as the predictor. Native American ancestry is the counterpart of European ancestry, and therefore, the results can also be interpreted in terms of the former.

We evaluated possible interactions between genetic ancestry and other risk factors (e.g., hormone replacement therapy, BMI, parity, age at first full-term pregnancy, breast-feeding, and menopausal status).

All statistical analyses were done using the program STATA (19), and all tests are two sided.

Results

Characteristics of breast cancer cases and controls from Mexico are presented in Table 1. Mexican cases were significantly older at diagnosis, had fewer full-term pregnancies, were less likely to breast-feed, were more likely to report a personal history of benign breast disease, a family history of breast cancer, or history of hormone replacement therapy use, and had higher alcohol intake and higher daily caloric intake. Cases also reported a significantly higher level of education and socioeconomic status and lower BMI than controls (the relationship between BMI and breast cancer risk was only observed among premenopausal women). Finally, women with breast cancer had more European and less indigenous American ancestry than controls. There were no significant differences between cases and controls in age at menarche or African ancestry. The proportion of European and indigenous American ancestry in Mexican controls differed by recruitment site. Monterrey had the largest proportion of European ancestry (40%; SD = 16) compared with Mexico City (28%; SD = 19) and Veracruz (30%; SD = 18). The proportion of indigenous American ancestry was estimated to be 54% (SD = 15) in Monterrey, 69% (SD = 20) in Mexico City, and 64% (SD = 20) in Veracruz. The proportion of African ancestry was estimated to be 6% (SD = 5) in Monterrey, 3% (SD = 4) in Mexico City, and 6% (SD = 8) in Veracruz.

We investigated the association between the different reproductive, demographic, and lifestyle characteristics and genetic ancestry among the controls (Table 2) and observed that socioeconomic status, education, daily kilocalorie intake, and family history of breast cancer differed significantly by ancestry category, with family history being more common among women with higher European ancestry and socioeconomic status, education, and daily kilocalorie intake being higher in women with higher European ancestry. We also explored the relationship between the different characteristics of the controls and socioeconomic status (Table 3). These results show a very strong relationship between socioeconomic status and genetic ancestry, all reproductive variables, education, alcohol consumption, and daily kilocalorie intake. Women with higher socioeconomic status tend to have fewer full-term pregnancies and higher European ancestry, to breast-feed less, to consume more alcohol and kilocalories, and to have more years of education than women with lower socioeconomic status.

Table 3. Characteristics of Mexican controls by socioeconomic status, 2004 to 2007

The association between genetic ancestry and breast cancer risk in the sample of Mexican women is shown in Table 4. The unadjusted model showed a strong association with European genetic ancestry. Using 0% to 25% European ancestry as the reference, odds ratios were 1.16 [95% confidence interval (95% CI), 0.94-1.43; P = 0.171] for the 26% to 50% category, 1.80 (95% CI, 1.35-2.39; P < 0.001) for the 51% to 75% category, and 3.22 (95% CI, 1.46-7.11; P = 0.004) for the 76% to 100% category. When known risk factors were adjusted for (Table 4), the association with European ancestry was attenuated but the trend remained statistically significant (26-50% odds ratio, 1.01; 95% CI, 0.78-1.30; P = 0.962; 51-75% odds ratio, 1.35; 95% CI, 0.96-1.91; P = 0.087; 76-100% odds ratio, 2.44; 95% CI, 0.94-6.35; P = 0.067; P for trend = 0.044). We chose to represent ancestry as a categorical variable because it allowed us to compare the effect size for the extremes of the ancestry distribution. However, the effect is also seen when ancestry is entered as a continuous variable in the model. For the model that included genetic ancestry as a continuous variable, the odds ratio for every 25% increase in European ancestry was 1.20 (95% CI, 1.03-1.41; P = 0.019). To ensure that there was no confounding due to regional differences between cases and controls, the unadjusted model was stratified by recruitment site (Mexico City, Monterrey, Veracruz), with all results showing the same trend as the global analysis (Table 4), with the effect weakening as sample size decreases.
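As a reading aid for the odds ratios and confidence intervals reported above, the sketch below assumes the intervals are standard Wald intervals from logistic regression, which are symmetric on the log scale. Under that assumption the point estimate is the geometric mean of the two limits, and the standard error of the log odds ratio can be recovered from the interval width. The function name is illustrative.

```python
import math


def wald_summary(lower, upper, z=1.96):
    """Recover the odds ratio and SE of the log odds ratio from a Wald 95% CI."""
    log_or = (math.log(lower) + math.log(upper)) / 2   # midpoint on the log scale
    se = (math.log(upper) - math.log(lower)) / (2 * z)  # half-width divided by z
    return math.exp(log_or), se


# Unadjusted estimate for the 51% to 75% European ancestry category.
or_unadj, se_unadj = wald_summary(1.35, 2.39)
# Adjusted estimate for the 76% to 100% category.
or_adj, se_adj = wald_summary(0.94, 6.35)
print(round(or_unadj, 2), round(or_adj, 2))
```

The recovered values agree with the reported odds ratios of 1.80 and 2.44 to two decimals. The larger standard error for the highest category reflects the small number of women in that category.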

Table 4. Association between genetic ancestry and breast cancer risk in women residing in Mexico, 2004 to 2007

In the adjusted model, the associations between breast cancer and alcohol consumption, parity, family history, age at menarche, benign breast disease, kilocalorie intake, and moderate physical activity were in the expected direction (Table 4). When the model testing the association between genetic ancestry and breast cancer risk included socioeconomic status and education as the only covariates, the two covariates showed an effect on breast cancer risk. However, when other covariates were added to the model, the effect of socioeconomic status and education became nonsignificant (Table 4). Number of full-term pregnancies, daily kilocalorie intake, and benign breast disease were the variables that, when added to the model, absorbed most of the effect of socioeconomic status and education.

We found no evidence of significant interaction between genetic ancestry and hormone replacement therapy use, BMI, parity, age at first full-term pregnancy, breast-feeding, and menopausal status.

Discussion

We found an association between European genetic ancestry and breast cancer risk among Mexican women that reside in Mexico. Women with greater European ancestry had an increased risk for breast cancer compared with women with lower European ancestry, which could also be interpreted as indigenous American ancestry being protective against the development of breast cancer. We have previously reported a significant association between genetic ancestry and breast cancer risk in U.S. Latinas, with a 39% increase in breast cancer risk with every 25% increase in European ancestry (8). If the observed association in the U.S. Latinas were solely due to confounding by a specific environmental risk factor associated with genetic ancestry in the United States, then we would expect not to observe an association in the Mexican women. However, given that the association between breast cancer risk and genetic ancestry was also present in Latina women residing outside the United States, it is reasonable to conclude that, if there is an environmental component that is driving the association between genetic ancestry and breast cancer risk, it must be common to U.S. Latinas and Mexican women. The association is weaker in the Mexicans than in the U.S. Latinas, which could be due to differences in the available covariates. It may also be due to the possibility that unmeasured confounding was greater in the U.S. samples.

Among Mexican women, the association between genetic ancestry and breast cancer risk was attenuated after adjustment for known risk factors. Number of full-term pregnancies, daily caloric intake, benign breast disease, and socioeconomic status/education were the main variables responsible for this attenuation. Notably, socioeconomic status and education, which were associated with genetic ancestry and with breast cancer risk in the univariate models, did not have a significant effect on breast cancer risk once other variables, such as number of full-term pregnancies and kilocalorie intake, were included in the model.

The results of this study confirmed our previous finding in an independent sample of Mexican women living outside the United States. The effect of ancestry on breast cancer risk could be due to the effect of unmeasured environmental confounders present in Mexico and the United States. However, our results also provide support to the hypothesis of a genetic component underlying the association between genetic ancestry and breast cancer risk.

A limitation of the present study is the possibility that our measure of socioeconomic status might not be optimal. Our models included adjustment for socioeconomic status and education. However, our data included only three categories for socioeconomic status, and this may not capture the true variation in the population. The observed correlation between socioeconomic status and genetic ancestry in the Mexican sample is modest (r = 0.23). If the true socioeconomic status were more strongly correlated with genetic ancestry, then it would be difficult to separate the effects of the two.

Another limitation of our analysis is that we treated each ancestral population, Europeans, indigenous Americans, and Africans, as one population despite the fact that numerous populations contributed to the gene pool of contemporary Latinos, and, in particular, there were multiple indigenous American populations who were genetically (20) and culturally diverse. Therefore, our results in Mexican women may not be generalizable to populations from different parts of the Americas.

In summary, we replicated our previous results, although with weaker effects, showing an association between European genetic ancestry and breast cancer risk in an independent sample of Latina women living in three different regions of Mexico. Future studies in other populations in the Americas will allow us to further explore the effect of genetic ancestry on breast cancer risk in diverse environments, populations with different breast cancer incidence, and in groups with different proportions of European, indigenous American, and African ancestry.


Patterns of cancer incidence, mortality, and prevalence across five continents: defining priorities to reduce cancer disparities in different geographic regions of the world. J Clin Oncol 2006;24:2137–50.