Abstract

Use of race and ethnicity terms in genetic research continues to generate controversy. Despite differing opinions about their basis or relevance, there is some agreement that investigators using these terms should: explain why the terms or categories were used, define them carefully, and apply them consistently. An important question is whether these recommendations are reflected in practice. Here we addressed this question based on 330 randomly selected articles published between 2001 and 2004 that reported on genetic research and used one or more words from a defined list of race, ethnicity, or population terms. The recommendation that authors using race or ethnicity terms explain the basis for assigning them to study populations was met infrequently (9.1%), and articles that used race and ethnicity as variables were no more likely than those that used them only to label a sample to provide these details. No article defined or discussed the concepts of race or ethnicity. With limited exceptions, current practice does not reflect repeated recommendations for using race or ethnicity terms in genetic research. This study provides a baseline against which to measure future trends.

Concerns about the consequences of leaving these particular terms ambiguous in genetic research extend beyond the standard arguments for precision in science. Using a term without defining it can suggest that its meaning is self-evident and therefore requires no definition. Readers who encounter a race or ethnicity term used without explanation will not draw a blank, as they might when confronted with an obscure technical phrase. Instead, because race and, to a lesser extent, ethnicity have widely used popular meanings, readers are likely to supply their own definitions. Research has shown that these definitions are often informed by unexamined stereotypes [LaViest, 1996; Bhopal, 2002; Cooper et al., 2003], which are typically wrong [Terracciano et al., 2005] and rarely positive [Blair et al., 2004; Burgess et al., 2004; Maddox, 2004]. Research has also shown that unconscious reliance on such stereotypes by health care practitioners may contribute to racial and ethnic disparities in medical treatment [Rathore et al., 2000; van Ryn, 2002; Geiger, 2003; Sankar, 2007].

Terms that represent common, recognizable categories seem to need little explanation; by not explaining what they mean by race or ethnicity terms, authors reinforce the impression that race or ethnicity categories can be consistently distinguished and readily assigned. The impression that these categories are self-evident can be especially troublesome in genetic research because it can “reinforce the simplistic view that race/ethnicity is the sole and/or direct cause” [Kaplan and Bennett, 2003] of a condition, particularly conditions associated with health disparities [Braun, 2002; Kaplan and Bennett, 2003]. Failing to specify a causal mechanism when reporting findings that link health outcomes to racial or ethnic identity can exacerbate this problem [Anonymous, 2001; Kaplan and Bennett, 2003; Goldstein and Hirschhorn, 2004].

These suggestions are potentially valuable but warrant scrutiny. First, their basis is often unclear. They rely not on systematic analysis of how authors actually use race or ethnicity terms but on anecdotal evidence and on what commentators consider the logical corollaries of using these terms in genetic research. Second, they closely resemble suggestions made several times over the past 50 years [UNESCO, 1952; Littlefield et al., 1999] that have never been evaluated. Before embracing this round of suggestions, it would be useful to know how past ones have fared. Systematic studies of how race and ethnicity terms have been used in genetic research are, however, scarce, and such studies are necessary to assess the nature and extent of the problem and the effectiveness of proposals to address it.

With this need in mind, we conducted a study to examine how race and ethnicity terms are used in publications on genetic research. Specifically, we sought to assess the extent to which proposals to improve clarity accurately reflect and address the current state of research, and to provide a baseline against which to judge the effect of future proposals.

We describe how articles in the biomedical research literature use race and ethnicity terms, and we test the hypothesis that articles using these terms justify or explain their use. These findings represent the first stage of a three-part study of practices and trends in the genetics research literature. The next two stages focus on patterns within more narrowly defined subcategories of genetic research and on a detailed analysis of the function and meaning of race and ethnicity terms within, rather than across, articles.

MATERIALS AND METHODS

In this analysis we examine how scientific articles describing genetic research define and use race and ethnicity terms. We avoid defining race or ethnicity, and do not seek to propose such definitions. Furthermore, we do not distinguish conceptually between race and ethnicity. We use both terms together as a phrase, as in “race and ethnicity,” unless specifically discussing one of them. Although many have argued for replacing race with ethnicity [Harrison, 1995; Wilson, 2000; Kalow, 2001; Schwartz, 2001; Wood, 2001] and have provided distinct definitions of the two terms, proponents themselves have not agreed on these definitions. In practice, the definitions and the conceptual distinctions they are intended to embody are not upheld, and the two terms are often used as synonyms.

Sample Selection

We obtained samples of journal articles indexed in MEDLINE by searching for race and ethnicity terms, population terms, and genetics terms (details below). We based the search on three sets of journals. The first two samples were drawn from journals with the highest impact factors (based on ISI Journal Citation Reports [Thomson ISI, 2005b]) in the fields of (1) clinical research (which included a subset of cardiology journals) and (2) genetics. Many high impact factor genetics journals do not address the human population research of interest here. As a result, and in order to keep impact factor ratings roughly equivalent in the clinical and genetics journals, the high impact genetics sample is based on only two journals, while the high impact clinical sample draws on five (as well as the cardiology journals). Table I lists the journals selected by impact factor. The third sample, referred to here as the general journal sample, was drawn from MEDLINE-indexed journals, excluding all journals included in the high impact samples.

We relied on journal impact factors to choose our sample for three reasons. First, because the impact factor is a “measure of the frequency with which the ‘average article’ in a journal has been cited in a particular year or period” [Thomson ISI, 2005a], a high impact factor suggests that a journal’s articles are read by more people. Analyzing articles from these journals thus provides an account of the most common models available to researchers who use genetic research findings but who may not themselves have conducted, or be expert in, this kind of research. Second, because impact factor rankings also approximate a journal’s prestige [Thomson ISI, 2005a], and prestige often translates into greater resources, journals with high impact factors are more likely to have paid editorial staff and the capacity to develop and apply strict editorial and peer review policies. The practices identified in these journals might therefore be more likely to reflect preferred practices. Third, articles in high impact journals are the most likely to be reported in the lay press, so clarity in the use and definition of socially charged terminology is especially important to assess in these journals.

We added a specific sub-sample from high impact factor cardiology journals to the clinical sample to ensure that it included a sufficient number of in-depth, condition-specific studies. For analysis purposes, the cardiology sample was combined with the clinical sample.

To collect study articles, we used a four-step sampling process, applied first to citations from the high impact journals. In step one, we searched MEDLINE within the selected high impact factor journals using the following limits: abstract, humans, English, and publication dates between 2001 and 2004. To restrict the results to articles reporting research, we also excluded articles indexed to the following publication types: clinical conference; comment; consensus development conference, NIH; duplicate publication; editorial; letter; news; newspaper article; review; review, academic; review, multicase; and review literature. Full citations for articles that met these criteria were then downloaded.

Step two began with a search of the titles, abstracts, and keywords of the downloaded citations for terms from three lists: (1) race and ethnicity terms, including race, racial, ethnic, ethnicity, and a set of terms based on pre-2003 MEDLINE MeSH terms: black, white, Caucasian, European-American, Asian, Hispanic American, Mexican American, Native American, American Indian, Alaskan American, African American, Inuit, Gypsy (or Gypsies), Arab, and Jew; (2) population terms, including population, family, and kindred; and (3) genetic terms, including genetic and pharmacogenomics. (The MeSH term “genetic” returns the following terms: cytogenetics, genetic research, genetics behavioral, genetics medical, genetics microbial, genetics population, immunogenetics, molecular biology, pharmacogenetics, and radiation genetics.) MEDLINE has revised its MeSH terminology for the list 1 race and ethnicity terms [Sankar, 2003]. However, list 1 terms were accepted MeSH terms during most of the period covered by our sample, and their use by MEDLINE suggests sufficient overlap with popular usage to warrant their adoption here.

To be eligible for random selection, an article had to contain in its title, abstract, or keywords at least one term from list 3 (genetic terms) together with at least one term from either list 1 (race or ethnicity terms) or list 2 (population terms). Applying these criteria to article titles, abstracts, and keywords returned 2,151 articles from high impact clinical journals, 178 from cardiology journals, and 1,295 from genetics journals.
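The eligibility rule above is a Boolean combination: (list 1 OR list 2) AND list 3. The sketch below restates it as a plain-text filter over a citation's title, abstract, and keywords. It is only an approximation under stated assumptions: the actual search ran against MEDLINE and its MeSH indexing, the term lists here are abbreviated, and the function names are hypothetical.

```python
import re

# Abbreviated, illustrative term lists (the full lists are given in the text).
RACE_ETHNICITY_TERMS = ["race", "racial", "ethnic", "ethnicity", "black", "white",
                        "Caucasian", "Asian", "Hispanic American",
                        "African American", "Arab", "Jew"]
POPULATION_TERMS = ["population", "family", "kindred"]
GENETIC_TERMS = ["genetic", "genetics", "pharmacogenomics"]


def contains_any(text, terms):
    """True if any term occurs as a whole word or phrase (case-insensitive)."""
    return any(re.search(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE)
               for term in terms)


def is_eligible(title, abstract, keywords):
    """Hypothetical filter: (list 1 OR list 2) AND list 3."""
    text = " ".join([title, abstract, " ".join(keywords)])
    has_race_or_population = (contains_any(text, RACE_ETHNICITY_TERMS)
                              or contains_any(text, POPULATION_TERMS))
    return has_race_or_population and contains_any(text, GENETIC_TERMS)
```

Whole-word matching keeps “Arabidopsis” from matching “Arab”, but a phrase such as “white wine” still passes the filter, which is consistent with the manual screening the authors describe for step four.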

In step three, we randomly selected 120 articles from the clinical set, 120 from the genetics set, and 45 from the cardiology set.
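The text does not say how the random draws were made. As one reproducible possibility, the sketch below takes a simple random sample without replacement from each eligible pool using a seeded generator; the article IDs, pool construction, and seed are hypothetical placeholders sized to match the counts reported in step two.

```python
import random


def draw_samples(candidates, sizes, seed):
    """Simple random sample without replacement from each journal set.

    candidates: dict mapping set name -> list of eligible article IDs.
    sizes: dict mapping set name -> number of articles to draw.
    A fixed seed makes the draw reproducible.
    """
    rng = random.Random(seed)
    return {name: rng.sample(candidates[name], k) for name, k in sizes.items()}


# Hypothetical article IDs standing in for the eligible pools.
candidates = {
    "clinical": [f"clin-{i}" for i in range(2151)],
    "genetics": [f"gen-{i}" for i in range(1295)],
    "cardiology": [f"card-{i}" for i in range(178)],
}
sizes = {"clinical": 120, "genetics": 120, "cardiology": 45}
samples = draw_samples(candidates, sizes, seed=2005)
```

Recording the seed alongside the eligible pools would let another team regenerate exactly the same selection.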

Step four subjected each selected article to detailed review to ensure that it (1) reported original research, (2) used human subjects or human tissue samples, and (3) concerned human genetics (and not bacterial or viral DNA). Articles that failed any of these criteria were excluded. Articles were also checked to ensure that common words such as “white” or “race” were used to describe the race or ethnicity of subjects; articles that used these terms for other purposes or as part of another word were eliminated (e.g., “white wine” [Mukamal et al., 2003] or “Arabidopsis” [Housworth and Stahl, 2003]). After this review, 100 articles remained in the clinical set, 31 in the cardiology set, and 102 in the genetics set. At this point, the cardiology articles were merged into the clinical set, yielding a clinical sample of 131 articles.

To create the general article sample, we followed the same four steps with one difference. We ran the same MeSH and MeSH-based term searches and selected articles using the same combinations (a race or ethnicity term and a genetic term, or a population term and a genetic term), but from this set we then eliminated all articles from journals included in the high impact samples. The search of the remaining articles returned 1,575 citations, from which we randomly selected 120. Step four reduced this general article sample to 97. Together with the high impact clinical and genetics articles, these 97 articles produced a total sample of 330 articles for analysis.
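The sample sizes reported for steps three and four can be checked with simple arithmetic. The snippet below uses only counts stated in the text; the retention rates it prints are derived here and are not reported by the authors.

```python
# Counts stated in the text: articles drawn at random (step three) and
# articles retained after full-text review (step four).
drawn = {"clinical": 120, "cardiology": 45, "genetics": 120, "general": 120}
retained = {"clinical": 100, "cardiology": 31, "genetics": 102, "general": 97}

for name in drawn:
    rate = retained[name] / drawn[name]
    print(f"{name:<11} {retained[name]:>3}/{drawn[name]:<3} retained ({rate:.1%})")

# Cardiology articles were merged into the clinical sample for analysis.
analysis_samples = {
    "clinical": retained["clinical"] + retained["cardiology"],
    "genetics": retained["genetics"],
    "general": retained["general"],
}
total = sum(analysis_samples.values())
print(analysis_samples, "total:", total)
```

Run as written, it reports retention of roughly 69% for the cardiology draw and between 81% and 85% for the other three, and a combined total of 330 articles.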

Coding and Inter-Rater Reliability

Once each sample was finalized, full-text PDFs were downloaded for each article, converted to Rich Text Format documents, and imported into Atlas.ti version 5 [Muhr, 2004], a qualitative analysis software package. Content codes were developed to capture how the research population was discussed, as well as the structure and main components of each article. The initial set of codes was defined and then tested by a team of four researchers, including PS and MKC. These codes underwent multiple evaluations, including several rounds of consensus coding [Jenkins et al., 2005] and discussion, and review by two advisory groups that included science journal editors and clinical and research geneticists. When the codes were judged to capture the relevant article features adequately, a codebook was created containing coding rules, definitions, and examples.

Three coders were trained using articles that had been coded during codebook development. Coder reliability was assessed on each article, and training continued on additional previously coded articles until trainee coding largely matched the approved coding. To evaluate inter-rater reliability, these coders then coded a sample of 72 new articles. Each article was coded by a pair of coders, and assignments were made such that each pair coded approximately one-half of the articles. Coding was then compared between coders, and the rate of agreement was calculated. Inter-rater reliability on these 72 articles was 90.6% agreement. Thereafter, each article was coded by a single coder; 10% of the remaining sample was subjected to the same inter-rater reliability test, and agreement remained above 90%.
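The text reports agreement as a percentage but does not specify the unit of comparison. The sketch below assumes one plausible unit, a per-article, per-code presence/absence decision; the function name and data layout are hypothetical.

```python
def percent_agreement(coder_a, coder_b, codes):
    """Percentage of (article, code) decisions on which two coders agree.

    coder_a, coder_b: dict mapping article ID -> set of codes that coder applied.
    codes: every code in the codebook. Applying a code and withholding it
    both count as decisions, so joint absence counts as agreement.
    """
    shared = sorted(coder_a.keys() & coder_b.keys())
    if not shared or not codes:
        raise ValueError("need at least one shared article and one code")
    agreements = sum(
        (code in coder_a[article]) == (code in coder_b[article])
        for article in shared
        for code in codes
    )
    return 100.0 * agreements / (len(shared) * len(codes))


codes = ["hypothesis", "limitations", "sample_origin", "sample_population"]
coder_a = {"a1": {"hypothesis", "sample_origin"}, "a2": {"limitations"}}
coder_b = {"a1": {"hypothesis"}, "a2": {"limitations", "sample_population"}}
print(percent_agreement(coder_a, coder_b, codes))  # 6 of 8 decisions agree
```

Because most codes are absent from most articles, counting joint absences as agreement, as this sketch does, can push the percentage upward.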

Coding

Analysis codes are divided into three categories: (1) basic article features; (2) reasons researchers gave for how and why they used particular populations; and (3) the role of race or ethnicity in the research design or in the account of the research.

Basic features

Each article was classified on the basis of three basic features providing essential information about the study it reported. Each code was applied as a dichotomous variable. (1) Hypothesis was defined as the presence of a founding idea or assumption stated as the basis for investigation. Text identified for this code included formally stated hypotheses and more general research questions. In either case, the text had to state or imply that the idea provided the basis for the study; (2) Limitations were statements that explained the factors restricting the generalizability of study findings. To be coded as limitations, statements had to be explicit and related to study design. Hypothesis and limitations are standard features of scientific research articles that impose a structure on argument and inference [Gambaro et al., 2000; Ioannidis et al., 2004]. Including a specific hypothesis is important because this is where readers might expect to find an explanation of how the choice to characterize study populations as members of particular racial or ethnic groups relates to the study premise or hypothesis. A limitations statement provides the opportunity to explain how widely findings can be applied to populations beyond the study sample that might be associated with the race and ethnicity terms used in the study; and (3) We defined sample origin not in geographic terms but as where and how researchers acquired the tissue samples: for example, whether the samples were already in the researchers’ possession, collected at a clinic, or obtained from a tissue bank.

The role of race or ethnicity

To examine the different ways that articles used race or ethnicity, we applied the following seven codes: (1) Reference population was applied to text where race or ethnicity terms were used to name populations that were discussed in the article but not studied as part of the reported research (these statements typically occurred in the introduction or discussion sections); (2) Sample population was applied to text where race or ethnicity terms were used to label the study population; (3) Dependent and (4) Independent were applied, respectively, to text that characterized race or ethnicity as a dependent or an independent variable in the research being reported; (5) DNA with label indicated where authors had labeled DNA, alleles, chromosomes, or mutations with a race or ethnicity term, as in “For the three variants combined, the difference between African and white chromosomes is significant” [Thye et al., 2003]. The first and second codes (reference population and sample population) and the fifth (DNA with label) could co-occur with the third and fourth, but the third and fourth were mutually exclusive; (6) Defines race or ethnicity: if an article had text coded to any of the first five codes, coders assessed whether to apply this code, which was applied to text that defined race or ethnicity as a genetic, social, or biological concept; and (7) No race term: by default, all articles that received none of the first six codes were classified as no race term. These articles contained one of the population terms used to select the sample (e.g., population, family, kindred).
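The co-occurrence rules for these seven codes can be expressed as validation checks. The sketch below models one article's codes as Boolean fields; the class and field names are hypothetical, and it assumes codes are summarized per article rather than per passage as they would be in Atlas.ti.

```python
from dataclasses import dataclass


@dataclass
class RaceCoding:
    """Hypothetical per-article record of the race/ethnicity codes (True = applied)."""
    reference_population: bool = False   # code 1
    sample_population: bool = False      # code 2
    dependent: bool = False              # code 3
    independent: bool = False            # code 4
    dna_with_label: bool = False         # code 5
    defines_race_or_ethnicity: bool = False  # code 6

    def __post_init__(self):
        # Codes 3 and 4 are mutually exclusive.
        if self.dependent and self.independent:
            raise ValueError("dependent and independent cannot both apply")
        # Code 6 is considered only when at least one of codes 1-5 applies.
        if self.defines_race_or_ethnicity and not self._any_of_first_five():
            raise ValueError("defines_race_or_ethnicity requires one of codes 1-5")

    def _any_of_first_five(self):
        return any([self.reference_population, self.sample_population,
                    self.dependent, self.independent, self.dna_with_label])

    @property
    def no_race_term(self):
        # Code 7 is assigned by default when none of codes 1-6 applies.
        return not (self._any_of_first_five() or self.defines_race_or_ethnicity)
```

Deriving no race term as a property, rather than storing it, keeps it from ever disagreeing with the other six codes.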

Some articles referred readers to previously published articles for details about research procedures or the sample population. Such statements, if relevant to study codes, were double-coded: first to the primary code to which the statement referred, such as sample origin or basis for assigning population label, and then to a secondary code, “previously described.” After coding was complete, coders reviewed articles containing passages coded to previously described, noted the references cited as containing the relevant information, and then located and downloaded those articles. The statements in the referenced articles were then used as the basis for assigning codes to the original article.
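The resolution step for double-coded passages can be sketched as follows, assuming a primary code is credited to the original article only when at least one cited article actually supplies the detail. The function, its parameters, and the code labels are hypothetical.

```python
def resolve_previously_described(coded_passages, cited_support):
    """Credit primary codes to an article using the articles it cites.

    coded_passages: dict mapping a primary code (e.g. "sample_origin") to the
        list of cited article IDs for a passage also coded "previously
        described"; an empty list means the article states the detail itself.
    cited_support: dict mapping cited article ID -> set of codes its text supports.
    Returns the set of primary codes credited to the original article.
    """
    credited = set()
    for code, cited_ids in coded_passages.items():
        if not cited_ids:
            credited.add(code)  # detail given in the article itself
        elif any(code in cited_support.get(ref, set()) for ref in cited_ids):
            credited.add(code)  # detail found in at least one cited article
    return credited
```

A cited article that could not be located (absent from `cited_support`) simply contributes no codes in this sketch.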

Statistical Analyses

We subdivided the data in two ways: by journal category (clinical, genetics, and general) and by time period (2001–2002 vs. 2003–2004). We then compared the frequencies of individual codes across the resulting subsets of the data, assessing the significance of any differences with the chi-square statistic. Statistical tests were performed using Stata statistical software (version 7.0 for Macintosh).
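For readers who want to reproduce comparisons of this kind without Stata, the sketch below implements a Pearson chi-square test of independence on a table of observed counts. We assume the uncorrected Pearson statistic; the text does not say whether a continuity correction was applied. The p-value uses the closed-form chi-square tail for integer degrees of freedom, so only the standard library is needed.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df."""
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * sum_{r=1}^{m} x^(r-1) / (1*3*...*(2r-1))
    p = math.erfc(math.sqrt(half))
    m = (df - 1) // 2
    if m > 0:
        term = total = 1.0
        for r in range(2, m + 1):
            term *= x / (2 * r - 1)
            total += term
        p += math.sqrt(2.0 * x / math.pi) * math.exp(-half) * total
    return p


def chi_square_test(table):
    """Pearson chi-square test of independence, without continuity correction.

    table: rows of observed counts (e.g., journal category x code present/absent);
    every row and column total must be positive.
    Returns (statistic, degrees_of_freedom, p_value).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df, chi2_sf(statistic, df)
```

To compare, say, the presence of one code across the clinical, genetics, and general samples, one would pass a 3 × 2 table of present/absent counts, which has 2 degrees of freedom.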

RESULTS

We reviewed and coded 330 articles describing genetic research in the context of human groups. Slightly more than half used a race or ethnicity term as a variable (independent or dependent variable). The rest divided almost evenly between articles that did not use such a term at all for the reported study (no term or reference population only) and those that used it only to name the study sample and not as a variable (sample population only) (Table II). Comparing articles from the first 2 years with those from the last 2 years, we found no significant difference (P=0.664).

Basic Article Features

Most articles did not have clearly identifiable hypotheses or limitations. Coders were able to identify a hypothesis (a passage describing an idea or premise as the basis for the investigation) in 30% (n=99) of articles. Many articles in the sample followed a style of inquiry best described as inductive rather than deductive, in that the justification for the research seemed to emerge from its findings rather than from the question it posed. Other articles simply reported interesting findings with few details about why the research had been conducted. Neither type of article was coded to hypothesis. The proportion of articles providing a limitations statement was similarly low, 22.7% (n=75). Statements noting the origin of study samples appeared in 62.4% of articles (n=206).

Reason for Using Populations

The majority of articles neither justified the use of a labeled population nor explained the basis of a population label. Just over 10% (n=36) of articles explained why the research had been conducted using a labeled population, while slightly more explained why a particular population had been chosen for study. An explanation of the basis on which a label had been assigned to a population was identified in only 9.1% of articles (n=30). This last finding is particularly important because this feature most closely approximates what some guidelines have outlined as a basic procedure for describing the identity of populations used in genetic research, and it might be considered a basic, easily fulfilled requirement.

Authors who did explain the basis for assigning a label were also more likely to explain why the research question they were pursuing was best studied using a population identified by a race or ethnicity term (P<0.0005). This group was also more likely to explain why the particular population they used was chosen, although the relationship was not statistically significant (P=0.110). The weak relationship between “explaining the basis for assigning a label” and “explaining why a particular population was chosen” may reflect that in some studies the choice of a particular sample was driven more by convenience (access to samples) than by theory.

Ways of Using Race and Ethnicity

Approximately half of the articles (51.5%, n=170) used race or ethnicity terms as either a dependent or an independent variable. Another quarter (n=84) either did not use race or ethnicity terms at all or used them only to refer to populations discussed in related research rather than to label their own study populations. The remaining articles (23%, n=76) used a race or ethnicity term to label members of the study population but neither reported results using this term nor tested any hypothesis related to it. Just over 10% of the articles used phrases coded as DNA with label, such as “Puerto Rican mutations” [Mykytyn et al., 2003] and “African and Asian haplotypes” [Herrnstadt et al., 2002], or more general phrases such as an “allele’s ethnic identity” [Collins-Schramm et al., 2002]. Most of these (22 of 35) appeared in genetics journals.
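The percentages for the three ways of using race or ethnicity terms follow directly from the counts. The short check below recomputes them; the share for the middle category (about 25.5%) is derived here rather than reported in the text.

```python
# Article counts for the three ways of using race or ethnicity terms (from the text).
counts = {
    "variable (dependent or independent)": 170,
    "no term or reference population only": 84,
    "sample population only": 76,
}
total = sum(counts.values())
shares = {use: round(100 * n / total, 1) for use, n in counts.items()}

for use, pct in shares.items():
    print(f"{use:<38} n={counts[use]:>3}  {pct:>4.1f}%")
```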

No article in this sample defined race or ethnicity. On the basis of preliminary research, we had anticipated that few articles would provide such a definition but deemed it important to track nonetheless. We had not anticipated that no article would offer one. This finding is, however, consistent with the low frequency of articles coded to basis for assigning population label: a researcher who neglects to explain how a population label was assigned is unlikely to define or discuss the related concepts of race and ethnicity.

Basic Article Features Associated With Use of Race and Ethnicity Terms

If authors who used race or ethnicity terms as variables in their studies followed published recommendations, they would be more likely than other authors to state study hypotheses and limitations clearly and to explain the origin of study samples. To test these relationships, we created a new three-category variable, use of race or ethnicity term, by combining no use of terms (no term) with use of terms only to refer to a population other than the study population (reference population only), and by combining use of a term as a dependent variable with use of a term as an independent variable. Use of a term to describe the study sample (sample population only) remained a separate category.
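The recoding described above collapses five ways of using race or ethnicity terms into three analysis categories. The sketch below expresses that mapping and tabulates articles by the collapsed variable; the category labels are hypothetical stand-ins for the codes, and an unrecognized label raises a `KeyError` rather than being silently dropped.

```python
from collections import Counter

# Hypothetical labels for the original five-way classification, mapped to
# the three categories of the new "use of race or ethnicity term" variable.
RECODE = {
    "no_term": "no term or reference population only",
    "reference_population_only": "no term or reference population only",
    "sample_population_only": "sample population only",
    "dependent_variable": "variable",
    "independent_variable": "variable",
}


def tabulate_use(article_categories):
    """Count articles in each of the three collapsed categories."""
    return Counter(RECODE[category] for category in article_categories)


example = ["no_term", "reference_population_only", "dependent_variable",
           "independent_variable", "independent_variable", "sample_population_only"]
print(tabulate_use(example))
```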

Table III reports comparisons between the resulting three-category variable and the basic article features. Articles that use race or ethnicity terms as variables are no more likely to state the study’s hypothesis or limitations than articles that use these terms only to label a sample or not at all. They did, however, differ significantly from other articles in being more likely to explain the origin of study samples (P=0.004).

Presence of Coded Article Features by Ways of Using Race or Ethnicity Terms

A second set of hypotheses was relevant only to articles that used a race or ethnicity term either to label a study population (n=76) or as a variable (n=170). On the basis of recommendations about such use, we tested the hypothesis that articles using race or ethnicity as a variable would be more likely than articles using the terms only to label a sample to explain why they had chosen to conduct their studies using racially or ethnically identified populations, to explain why they had chosen the particular populations they studied, and to explain the basis on which they had assigned race or ethnicity terms to samples or subjects. We also tested the hypothesis that they would be less likely to use phrases coded to DNA with label.

Results in Table IV show that the articles that used race and ethnicity as variables were no more likely than those that used the terms only to label a sample to explain why they chose to study particular populations or to provide the basis on which they assigned race and ethnicity labels to subjects. However, these articles were more likely to use phrases coded to DNA with label (P<0.0001).

Articles That Use Race or Ethnicity Terms to Name a Study Sample Compared to Articles that Use Such Terms as Dependent or Independent Variables

Differences Across Journal Types (Clinical or Genetic) and by Journal Impact Factor

To explore possible differences associated with the type of journal in which an article appeared, we looked only at articles from high impact factor journals and compared those from clinical journals to those from genetics journals. These analyses were conducted only on those articles that used race or ethnicity terms and are reported in Table V.

Variation Between Types of Journals and Between Articles From High Impact Factor Versus Non-High Impact Factor Journals

Findings suggest that articles published in clinical journals differ from those published in genetics journals on several measures. Clinical journal articles are significantly more likely to report study limitations and sample origins, while articles in genetics journals are significantly more likely to use phrases coded to DNA with label. The two sets of journals do not differ on any of the measures associated with providing reasons for using populations labeled with race or ethnicity terms.

To look for possible differences associated with journal impact factor, we combined articles from the high impact clinical and genetics journals and compared them with the non-high impact journal articles (i.e., the general article sample). In this pooled comparison, the differences apparent between clinical and genetics journals when only high impact factor journal articles were analyzed disappeared: the high impact and non-high impact journal sets do not differ significantly on any measure.

DISCUSSION

Our inquiry sought to contribute to the debate about the use of race and ethnicity in genetic research by examining actual research practices, as reflected in published studies.

Our focus on research practices reflects the view, expressed in commentaries on this issue, that improved study design and more accurate reporting of study design and results might resolve at least some of the controversy concerning the use of race and ethnicity terms in genetic research.

While there are several excellent studies of how other fields use race and ethnicity [Hahn, 1992; Cooper, 1994; Senior and Bhopal, 1994; Hahn et al., 1996; Drevdahl et al., 2001; Thomas, 2002], this is, to our knowledge, the first study to quantify the use of race or ethnicity as variables in genetic research and to document that, although such use is widespread, explanation of it is rare. In our sample, no article that used race or ethnicity as a general term or concept defined it, and fewer than 10% explained the basis for the population terms assigned to subjects, an easily fulfilled requirement that provides valuable information. Only a minority included clear hypotheses and limitations statements, and few explained how specifying the racial or ethnic identity of the study population helped to answer the research question at hand.

These findings lend credence to concerns that genetics researchers neither address what they mean by race and ethnicity terms nor explain the relevance of race or ethnicity to their research. Precisely why authors persist in practices that do not reflect existing recommendations for using race and ethnicity terms in genetics research is unclear. It may be that some disagree with the recommendations while others lack information about them. Researchers using data sets that were created and labeled before such issues became prominent might hesitate to change assigned labels. Alternatively, reasons might lie with journal editors who either do not endorse such policies or do not have the resources to enforce them. These are questions in need of study.

Whether using race and ethnicity terms without explaining them actually contributes to unintended social consequences, such as stereotyping or reinforcing incorrect understandings of race as a natural human feature, as some have postulated [Anonymous, 2001; Cooper et al., 2003; Kahn, 2003], is unknown and is itself a complex question in need of study. Nonetheless, even if these practices are found not to contribute to unintended social consequences, they warrant attention. For example, as others have noted, failing to explain accurately how labels are assigned to subjects impedes constructive use of study findings [Hirschhorn et al., 2002], and shorthand phrases such as “Puerto Rican mutations” [Mykytyn et al., 2003] or an “allele’s ethnic identity” [Collins-Schramm et al., 2002] are likely to confuse rather than clarify putative relationships between genetic variation and geographic origins.

We note a few limitations of this study. First, our journal sample was not representative of all clinical or genetics journals. However, we selected those with the highest impact factors in order to examine articles that presumably would be the most influential in their respective fields, serve as models for the research community, and have the most exposure outside it. Second, our analysis did not distinguish between different types of genetic research, in which race or ethnicity might be used in different ways. It was not possible to create mutually exclusive categories for this variable that coders could apply reliably, because the research was heterogeneous and because articles occasionally reported more than one type of research (more or less as the stages of a single study). Such a variable might have accounted for some of the differences we report here and attribute to other factors.

Despite these limitations, this study provides important and useful information about how investigators use race and ethnicity terms in conducting and reporting genetic research. Our results suggest that reason for concern about this use persists and that, with the partial exception of high impact clinical journals, inadequate explanation of the meaning and purpose of race and ethnicity terms is widespread across journals. Commentators have suggested that, for genetic research that uses race and ethnicity terms to contribute fully and positively to future research and clinical practice, articles need to explain completely their reasons for using these terms and their methods of applying them. Authors, peer reviewers, and journal editors can play an important role in encouraging changes in practice. Our analysis codes could serve as a basis for assessing the reporting and use of race and ethnicity in the biomedical literature.

Acknowledgments

This work was supported by a grant from the U.S. Public Health Service, R01 HG003191. The funding source played no role in study design, data collection, analysis, or interpretation, nor in the writing of the manuscript and the decision to submit this paper for publication.

LaViest T. Why we should continue to study race, but do a better job: An essay on race, racism and health. Ethn Dis. 1996;6:21–29.

Lee SS-J. “Incidental findings” of race in pharmacogenomics and the infrastructure for finding differences in biomedical research. In: Singer E, Antonucci TC, editors. Proceedings of the Conference on Health Disparities. Ann Arbor, MI: Survey Research Center, Institute for Social Research, University of Michigan; 2004. pp. 81–94.

Littlefield A, Lieberman L, Reynolds LT. The debate over race: Thirty years and two centuries later; Part Two: Thirty years after the debate over race: The passing of the great consensus. In: Montagu A, editor. Race and IQ. New York: Oxford University Press; 1999. pp. 70–84.