Approaches used by the US Environmental Protection Agency (EPA) for testing chemicals and for establishing health reference values have been modified over the past two decades in response to a growing awareness that some chemicals interact with endocrine hormone systems. Some changes made by EPA have been mandated legislatively. For example, as discussed in Chapter 1, the 1996 Food Quality Protection Act (FQPA) requires EPA to screen pesticides and other chemicals for their potential to produce estrogenic and other endocrine effects. Passage of the FQPA led to EPA’s Endocrine Disruptor Screening Program and development of a series of in vitro and in vivo screening tests to identify chemicals that interact with the estrogen, androgen, or thyroid hormone systems (74 Fed. Reg. 54416 [2009]). Since the 1990s, EPA, the Organisation for Economic Cooperation and Development (OECD), and the National Toxicology Program (NTP) have modified toxicity-testing guidelines to improve their ability to detect effects that occur later in life after exposure to endocrine active chemicals (EACs) during sensitive windows of development. Toxicity-testing methods developed by OECD that can detect endocrine toxicity in mammals include the rodent two-generation reproduction study (TG 416), the extended one-generation reproductive toxicity study (TG 443), the rodent reproduction/developmental toxicity screening test (TG 421), the rodent chronic toxicity and oncogenicity studies (TG 451, TG 452, and TG 453), and the enhanced 28-day toxicity study (TG 407) (Bars et al. 2012). NTP has also updated its testing protocols: for example, by adding early life exposure to some cancer bioassays, adding endocrine outcomes, and extending follow-up when studying adverse effects on reproduction and development (Foster 2014).

Several aspects of EACs have prompted the need to assess the adequacy of traditional toxicity-testing strategies (see Box 2-1). Some EACs can mimic natural hormones, which affect the endocrine systems at low concentrations. Questions have been raised about how to use information about endocrine activity to understand potential health risks. For example, should a change in hormone concentrations be considered an adverse health effect? If such a change is not necessarily adverse, can it be used to predict an adverse outcome? If so, can one estimate the probability of an adverse outcome given an exposure?

EACs also have the potential to cause long-lasting developmental effects. Organisms can be especially sensitive to EACs because hormones play a critical role during normal development. Dose and timing also can dramatically influence not only the magnitude of an effect but also the type of outcome observed or, in some cases, the direction of the effect (Ankley and Villeneuve 2015). For example, the synthetic estrogen diethylstilbestrol (DES) produces tumors in different tissues depending on what dose is administered and whether the exposure occurs prenatally or neonatally (Newbold and McLachlan 1982; Newbold 2004; Newbold et al. 2004). Thus, evaluation of EAC effects throughout an organism’s lifespan is increasingly recognized as important, and some study designs that assess fertility, reproductive-tract malformations, or tumor incidence in animals have been modified. The modifications—such as dosing of pregnant animals throughout gestation, longer follow-up periods (including evaluation of several generations), assessment of multiple hormone-sensitive end points, and examination of multiple pups per litter—have improved sensitivity to detect endocrine effects (Blystone et al. 2010; Foster 2014). Awareness that EAC exposure during development can program tissues to respond differently to endogenous hormones or exogenous chemical challenges later in life (Newbold et al. 2004; Jenkins et al. 2007) or produce heritable modifications via epigenetic changes (Jefferson et al. 2013) is also growing.

Another concerning aspect of EACs is that effects from EAC exposure have been reported in humans and wildlife (Bernanke and Köhler 2009). Effects have been identified across a range of human exposures, from environmental exposure to deliberate administration of pharmaceutical agents. DES administration to pregnant women is a prime example of unexpected adverse effects, such as increased risk of breast cancer in mothers and various adverse outcomes in their offspring (Hoover et al. 2011; Reed and Fenton 2013).

One additional factor that has prompted increased scrutiny of EACs is the debate about whether thresholds exist for EAC effects. As reviewed by Hass et al. (2013), arguments in support of a threshold cite homeostatic mechanisms that are involved in endocrine regulation and the resiliency of higher order systems to adapt. Arguments against the presence of a threshold note that small fluctuations in endogenous hormones can affect regulation of a variety of biological processes (Hass et al. 2013). Thus, questions have been raised about whether the dose-selection practices used in traditional toxicity testing should be revised.

To protect public health and the environment, EPA and other agencies will need to work proactively to update testing methods as new science emerges. This chapter describes the committee’s proposed strategy to assist EPA with the tasks of developing and revising testing practices in response to expanded knowledge about the potential for low-dose effects of EACs.

The overall strategy envisioned by the committee for evaluating evidence of low-dose adverse human effects consists of three broad phases—surveillance, investigation and analysis, and actions (see Figure 2-1). The strategy recognizes that a toxicity testing program, no matter how sophisticated, cannot provide 100% assurance that all adverse effects will be identified and can be prevented. Even with pharmaceuticals, which are tested in human clinical trials, adverse effects are often not identified until after the drug has been marketed. Therefore, environmental chemicals, which are tested (if at all) in experimental systems only before use, require continued surveillance to protect public health given the expectation that false negatives will occasionally occur in testing.

Once a topic has been identified for additional investigation, the specific details of the investigation and analysis need to be planned so that they will support future agency actions. The results from the investigations and analyses are then used to select specific actions. In some cases, the only further action would be continued surveillance. In other cases where key uncertainties exist, further action could entail the generation of new data or models to address the uncertainties. If the results of an investigation suggest that adverse outcomes in humans are expected or might be occurring at low doses, the conclusions of previous toxicity testing or toxicity assessments for the chemicals that are under investigation might need to be updated to reflect the new evidence. Additionally, such evidence might support updates to specific toxicity-testing or assessment practices to reduce the false-negative rate in the future.

The first two phases, surveillance and investigation and analysis, are described in more detail below. As noted in Figure 2-1, completion of each phase could involve one or more approaches. Although the descriptions of the components are presented sequentially in the report, there is no requirement that each approach be used or that a specific order be followed. The last phase, actions, involves policy decisions, which are outside of the committee’s charge and therefore are not discussed in detail in this report.

In the context of this report, surveillance refers to the process for detecting signals (indications that an adverse outcome in a human population or animal model might be related to exposure to an EAC at low doses) by searching, retrieving, and evaluating existing data. Surveillance also refers to the process for monitoring the literature for methods that could be used for toxicity testing of EACs. Types of data that could be considered in an active surveillance program include human, animal, or mechanistic data.

Actively Monitor for New Data

A surveillance program for identifying low-dose effects should have a process for actively monitoring for new data to help ensure that effects will be identified and analyzed on a regular basis. Three broad categories—chemical-specific data, information that could lead to modifications of toxicity-testing methods and best practices for EACs, and information on endocrine-related effects in animals and humans—should be considered in the surveillance program. Relevant information to monitor might include scientific literature, various databases, nontraditional information sources, stakeholder input, and human exposure information. Although monitoring scientific literature and databases and human exposure information might provide the most valuable information, nontraditional information sources and stakeholder input have recently been highlighted as potential sources that could lead to valuable insights. Which sources are selected for surveillance will clearly depend on the problem under consideration and the resources available to the agency. Each source is discussed in more detail in the following sections.

Monitoring the Scientific Literature

EPA has many ongoing literature-review activities for evaluating chemical hazards, such as the chemical-specific evaluations of the Office of Research and Development, the Office of Pesticide Programs, the Office of Water, and the Office of Air and Radiation. Chemical-induced effects on endocrine function might be considered in those assessments. Although the assessments typically rely heavily on animal data, human data also contribute to these assessments. Whether human studies are concordant or discordant with the animal data is an important consideration.

In addition to chemical-specific data, the scientific literature should be monitored to identify relevant end points that might need to be included in toxicity testing or risk assessment or to identify other changes to toxicity-testing practices that might be needed to improve assessment of EAC effects. For example, several assessments conducted by EPA, NTP, and international institutions have investigated various methodological issues, including addition of mammary gland assessment in regulatory guideline protocols (Makris 2011); adequacy of rodent models for detecting hormonally related cancers (Thayer and Foster 2007); study design issues for developmental reproductive toxicity (Blystone et al. 2010); and the question of biological thresholds for EACs (Hass et al. 2013). Furthermore, literature reviews of diabetes, breast cancer, and other diseases have identified biological processes and specific chemicals that appear to be involved in their etiology (Rudel et al. 2007, 2011, 2014; Macon and Fenton 2013; Schwarzman et al. 2015; Auerbach et al. 2016; Smith et al. 2016; Bruner-Tran et al. 2017), and these reviews might reveal effects that need to be integrated into toxicity-testing methods.

The study of endocrine-related human diseases, such as prostate cancer and endometriosis, might also identify adverse effects that are not readily detected in rodent studies. For example, endometriosis is a human disease that is difficult to assess in animal models because most nonprimate mammals do not menstruate and consequently do not develop ectopic lesions, which are the pathological hallmarks of endometriosis (Bellofiore et al. 2017; Bruner-Tran et al. 2017). Concerns about the possible role of chemical exposure in the development of endometriosis might therefore be difficult to assess in an animal model (see Box 2-2). Agencies that are involved in health surveillance and that have health registries (e.g., Centers for Disease Control and Prevention, National Center for Health Statistics) could also be a source of information about relevant human diseases. Thus, by reviewing endocrine-related human diseases, one could ensure that signals are not being missed because they are not recapitulated in animal models. Given that there will always be species differences in response to chemical exposure, a surveillance system should always include monitoring epidemiologic literature or research.

The committee notes that automated methods are being developed for monitoring the literature. For example, methods or tools have been developed to extract drug-safety information from the published literature and electronic medical records (Shetty and Dalal 2011; Wang et al. 2011; Duke et al. 2012; Gurulingappa et al. 2012; Avillach et al. 2013; Pontes et al. 2014; Winnenburg et al. 2015), and these approaches might be relevant for monitoring literature on EACs. Other publicly available tools have been developed for supporting systematic reviews (ICASR 2015), and these might assist with searching and retrieving data related to new toxicity-testing methods, outcomes, exposure assessment, and biomonitoring.

Assessing Scientific Data

In addition to the published peer-reviewed scientific literature, data are being generated and made available to the scientific community through various other venues. For example, as described in Chapter 1, EPA’s ToxCast program has generated substantial data that can potentially provide toxicity and mechanistic information on a variety of chemicals. The testing is typically conducted in in vitro assays and uses a broad range of testing concentrations, including ones found in human biological samples; thus the testing could have relevance to environmental or low-dose exposure. The data are made available through EPA’s website.1 Other information could come from databases developed to track mechanistic pathways for adverse outcomes.2 These types of data could potentially help identify EACs, and their review might be considered important for a surveillance program.

Tracking Nontraditional Sources of Information

Some investigators have explored whether information extracted from such informal media sources as blogs and other various forms of social media could provide useful drug-safety surveillance data (Harpaz et al. 2012; Lardon et al. 2015; Nikfarjam et al. 2015; Sarker and Gonzalez 2015). Those efforts have proven challenging for drug surveillance programs because the language used to describe medical information in social media is often informal or descriptive (Nikfarjam et al. 2015). Applying those methods to surveillance of environmental chemical exposures could prove even more challenging because, unlike with drug surveillance, people are generally unaware of their environmental exposures. Despite the limitations, approaches for evaluating social media could be explored as a means to provide additional surveillance of potential health effects associated with chemical exposure.

Obtaining Stakeholder Input

A surveillance program could also include components from stakeholder input. EPA could evaluate recommendations or policy statements made by scientific societies or organizations or other academic groups. For example, several position papers have been published on the effect of EACs on human health (Diamanti-Kandarakis et al. 2009; Skakkebaek et al. 2011; Gore et al. 2015; Bennett et al. 2016). Another means of identifying a potential public-health problem is to consider input from advocacy groups, individual scientists, political leaders, chemical manufacturers, and other stakeholders. For example, NTP has a long-standing process that encourages nominations of chemicals of concern regarding human health (Heindel 1988). Similarly, EPA’s Integrated Risk Information System (IRIS) program provides opportunities for EPA program and regional offices and other stakeholders to nominate chemicals for consideration. Nomination processes provide opportunities for stakeholder input and can help focus attention on societal concerns. Nomination processes are not without some limitations, however, including the possibility of reporting bias by the media, community action groups, or industry. Those groups might also be influenced by social activism or corporate product defense and might not be focused on longer-term public-health issues (Mihaylov and Perkins 2015; Zoller 2017). Guidance for engaging stakeholders has been provided in other National Academies reports (e.g., NRC 2009, 2014).

Although other animal models do not naturally develop endometriosis, rabbit and rodent models of endometriosis have been established by transplanting endometrium or uterine fragments from the same species (homologous models) or from humans (heterologous models) to ectopic sites (King et al. 2016). These models have not been used widely in toxicology.

Role of chemicals in endometriosis

A nonstatistically significant doubling of risk for endometriosis was reported in women exposed to 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD) following a factory explosion in Seveso, Italy, in 1976 (Eskenazi et al. 2002).

Monitoring Human Exposure Information

Biomonitoring data are an important information source that can help identify whether human exposure to EACs has occurred or changed over time (NRC 2006a, 2007, 2009, 2012) and are useful in defining low-dose exposures. They can also identify potential exposure sources and demographics of highly exposed groups. A previous NRC committee provided the following recommendations regarding the need to use biomonitoring data in surveillance programs: “Develop biomonitoring-based epidemiologic, toxicologic, and exposure-assessment investigations and public-health surveillance to interpret the risks posed by low-level exposure to environmental chemicals. Where possible, enhance existing exposure assessment, epidemiologic, and toxicologic studies with biomonitoring to improve the interpretation of results of such studies” (NRC 2006a, p. 9).

Box 2-3 illustrates how biomonitoring data from the US Centers for Disease Control and Prevention (CDC) National Health and Nutrition Examination Survey (NHANES) can help inform evaluations of EACs. Occupational exposure data might also provide information that could help define the range of human exposures to some chemicals.

In the absence of biomonitoring data, external exposure data can be used to estimate the range of potential human exposure to chemicals. Active air sampling devices have been used for several years to assess personal exposure to air pollutants, such as particulates, ozone, and polycyclic aromatic hydrocarbons (Geyh et al. 1999; Perera et al. 2003; Tsai et al. 2012; Oliveira et al. 2016). More recently, passive sampling devices have been developed to eliminate the need for cumbersome equipment that decreases compliance in human studies. Passive samplers are favored for personal monitoring because they are lighter and smaller and less likely to interfere with daily activities (NRC 1991). For example, silicone wristbands have been developed as personal passive samplers capable of sequestering polycyclic aromatic hydrocarbons, flame retardants, and pesticides as measures of an individual’s external exposure (O’Connell et al. 2014; Donald et al. 2016; Hammel et al. 2016). Several funding agencies are supporting the development of sensor technologies for the 21st century, including wearable monitors that can be used in population studies to measure personal exposure in real time with high sensitivity and specificity and at low cost.3

Another type of exposure data comes from efforts to measure chemicals in air, water, soil, and other environmental media (such as house dust), food, and consumer products, and extrapolate those measurements to human exposure. For example, EPA’s Particulate Matter (PM) Supersites Program was established to obtain atmospheric measurements to address the research questions and scientific uncertainties about PM relationships between sources, receptors, exposures, and effects (Solomon and Sioutas 2008). A similar EPA program—the Clean Air Status and Trends Network (CASTNET)—is a national monitoring network established to assess trends in pollutant concentrations, atmospheric deposition, and ecological effects due to changes in air-pollutant emissions (Puchalski et al. 2015). Other air-monitoring programs have evaluated changes in urban ozone concentrations that have occurred over several decades (Sather and Cavender 2016).

Since the National Health and Nutrition Examination Survey (NHANES) III (1988-1994) was conducted, biomonitoring has been expanded to include biomarkers of selected pesticides, phthalates, and volatile organic compounds (CDC 2009; Sobus et al. 2015). As of 2015, 265 chemical biomarkers—including ones for some brominated flame retardants, dioxins and furans, pesticides, metals, perfluorinated compounds, phthalates, and polychlorinated biphenyls—are assessed (CDC 2015). The survey also collects data on health end points and demographics and is designed to be representative of the US population.

NHANES biomonitoring data have been used to identify temporal trends in chemical exposures and can help define low-dose ranges. For example, Hartle et al. (2016) used 24-h dietary recall data and urinary samples to assess the association between consumption of canned foods and beverages and biomarkers of exposure to bisphenol A in a subset of the NHANES population that was 6 years of age and older to understand human exposure. NHANES studies can also support hypothesis generation related to health outcomes. For example, some studies have evaluated associations between urinary or blood biomarkers and hormone function, such as research that investigated links between urinary organophosphate insecticide concentrations and serum testosterone and estradiol concentrations in adult men (Omoike et al. 2015). Other studies have investigated links between serum perfluoroalkyl concentrations and serum testosterone, thyroid stimulating hormone, free and total triiodothyronine, and thyroxine levels in 12- to 80-year-old males and females (Lewis et al. 2015). Data derived from NHANES studies have also supported the development of computational dosimetry models, such as reverse toxicokinetic models, that can link chemical biomarker measurements to exposure levels (Tan et al. 2012; Sobus et al. 2015). NHANES data are also used to compare biomarker measurements to model-predicted biomarker estimates.
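The idea of linking a biomarker measurement back to an exposure level can be illustrated with the simplest possible case. The following is a minimal sketch, not the models used in the cited studies: it assumes steady state and a urinary biomarker, and back-calculates daily intake from a mass balance. The function name and all parameter values are hypothetical and chosen only for illustration.

```python
def estimated_daily_intake(urine_conc_ug_per_l, urine_volume_l_per_day,
                           body_weight_kg, urinary_excretion_fraction):
    """Back-calculate intake (ug/kg body weight/day) from a urinary biomarker.

    Assumes steady state: the mass excreted per day equals the fraction of the
    daily intake that leaves the body in urine as the measured biomarker.
    """
    if not 0.0 < urinary_excretion_fraction <= 1.0:
        raise ValueError("urinary_excretion_fraction must be in (0, 1]")
    if body_weight_kg <= 0.0:
        raise ValueError("body_weight_kg must be positive")
    excreted_ug_per_day = urine_conc_ug_per_l * urine_volume_l_per_day
    return excreted_ug_per_day / (urinary_excretion_fraction * body_weight_kg)


# Hypothetical inputs: 2.0 ug/L in urine, 1.5 L urine/day, 70 kg adult,
# and complete urinary excretion of the dose as the biomarker.
intake = estimated_daily_intake(2.0, 1.5, 70.0, 1.0)
print(f"Estimated intake: {intake:.4f} ug/kg/day")
```

Because the calculation relies on a single concentration value, it inherits the limitation of spot samples: a one-time measurement might not represent average or peak exposure.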

Biomonitoring data are not without limitations, however. NHANES biomarker data are determined for blood or urine samples that have been collected from volunteers at a single point in time. Single time-point measurement of a chemical biomarker might not accurately predict average or peak exposures (Aylward et al. 2013, 2014; Bradman et al. 2013) and might miss exposures that occur during pregnancy, fetal development, or other life stages.

The US Food and Drug Administration (FDA) has developed pharmacovigilance4 programs to monitor for adverse drug reactions (ADRs). Such programs are needed because human clinical trials use relatively small sample sizes; have shorter durations of exposure than usually occur; often lack diversity among study participants; and do not include pediatric and other susceptible subpopulations (McMahon et al. 2015). Those design features limit the ability of clinical trials to detect rare (1:1,000 to 1:10,000) ADRs that occur at therapeutic drug doses (Schotland et al. 2016).
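The effect of sample size on detecting rare events can be made concrete with a short calculation. The sketch below assumes each participant independently experiences the ADR with the same fixed probability, which is a simplification; the trial size of 3,000 is a hypothetical value chosen for illustration.

```python
import math


def prob_at_least_one(event_rate, n_participants):
    """Probability of observing one or more events among n independent people."""
    return 1.0 - (1.0 - event_rate) ** n_participants


def participants_needed(event_rate, target_prob=0.95):
    """Smallest n for which at least one event is seen with probability >= target."""
    return math.ceil(math.log(1.0 - target_prob) / math.log(1.0 - event_rate))


hypothetical_trial_size = 3000
for rate in (1 / 1000, 1 / 10000):
    p_detect = prob_at_least_one(rate, hypothetical_trial_size)
    n_req = participants_needed(rate)
    print(f"rate 1:{round(1 / rate)}: P(>=1 event in {hypothetical_trial_size}) "
          f"= {p_detect:.2f}; n for 95% detection = {n_req}")
```

Under these assumptions, the required sample size is roughly three divided by the event rate, so an ADR at the rarer end of the range would likely go unobserved in a trial of a few thousand participants.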

Pharmacovigilance programs often consist of premarketing surveillance that identifies ADRs during preclinical screening and clinical trials and postmarketing surveillance in which data are accumulated throughout a drug’s market life (Ibrahim et al. 2016). Voluntary reporting systems have historically served as the primary data collection system for postmarketing pharmacovigilance. Such passive surveillance systems primarily rely on the collection of reports of suspected ADRs from health care professionals, consumers, and pharmaceutical companies. Today, FDA has established the Office of Surveillance and Epidemiology to help coordinate its efforts in postmarket drug safety surveillance (see Box 2-4).

___________________

4 Pharmacovigilance is defined by the World Health Organization (WHO) as “the science and activities relating to the detection, assessment, understanding and prevention of adverse effects or any other drug-related problem” (WHO 2002, p. 7). Pharmacovigilance includes postmarketing safety surveillance activities to detect events that were not seen in a clinical trial (“safety signal generation”).

There are several important limitations of the FDA Adverse Event Reporting System (FAERS) database, including under-reporting of ADRs that results in underestimates of the prevalence of drug-ADR associations and uncertainty that a given ADR is causally related to the drug exposure (Ibrahim et al. 2016). Such limitations have led to the increased use of electronic health records as an additional source of pharmacovigilance data (Trifirò et al. 2009; Li et al. 2014). FDA has recently completed its Mini-Sentinel pilot project that developed methods, tools, resources, policies, and procedures to facilitate the use of routinely collected electronic health care data to perform active surveillance of the safety of marketed medical products (Platt et al. 2012; Gagne et al. 2016).

Some drugs have been shown to demonstrate endocrine activity (see, for example, Friedman et al. 2009). In fact, the Institute of Medicine recommended that FDA “engage the pharmaceutical industry and scientific community in postmarketing studies or clinical trials for hormonally active prescription drugs for which the potential impact on breast cancer risk has not been well characterized” (IOM 2012, p. 21). Such an approach is also relevant for other endocrine-related effects. The committee concludes that methods that have been developed for pharmacovigilance programs might be adapted or help inform an EAC surveillance program. The committee notes, however, that EAC surveillance programs that are based solely on voluntary reporting might be limited because people often cannot self-report environmental exposures.

Periodically Identify, Scope, and Prioritize Topics

Active surveillance will likely result in the identification of information on chemical exposures, outcomes, and advances in toxicity testing that will need to be reviewed. Accordingly, the next step in the committee’s strategy (see Figure 2-1) is periodic review of information and identification of topics that might need to be pursued further. That effort could involve a scoping exercise in which EPA would survey the literature and other information to determine the extent, range, and nature of information available on the topic, to identify data gaps, and to consider whether additional research might be needed. The scoping step would assist EPA in setting priorities for topics that deserve further study. Decisions to pursue a topic could be influenced by a number of factors, including the size of the population at risk, public-health significance, and available resources.

Formulate Questions to Address and Develop an Approach for the Investigation

Once a topic is selected for further analysis, the next step is to formulate the questions to address and develop an approach to the investigation, which will involve consideration of the scientific evidence, expert judgment, and relevant stakeholder perspectives. Questions and approaches are often targeted depending on the topic under investigation and potential actions that could be taken by the agency. For example, EPA might consider whether a new outcome measure should be included in regulatory toxicity tests. In that case, questions that might be posed include the following:

Is the outcome a more sensitive measure of adverse toxicity than currently used outcomes?

Is the outcome of interest a reproducible effect after chemical exposure?

Will including the new outcome in regulatory testing improve hazard identification—that is, improve sensitivity?

Box 2-5 provides a retrospective example of how the answers to these questions helped establish the measurement of anogenital distance as an outcome measure in regulatory toxicity tests.

Once the questions have been identified, the next step is to formulate the approach to the investigation and determine the types of data and analyses that are needed to answer the questions and provide the basis for agency actions. Figure 2-1 shows four types of investigation and analysis that could be considered: generation of new data or models to fill data gaps, targeted analysis of existing data, systematic review, and integration of the available evidence. The types of investigations or analyses might not be mutually exclusive because several related investigations or analyses might be needed to address the questions adequately. And, as noted above, they could be influenced by a number of factors, including the size of the population at potential risk, public-health significance, and available resources.

This example illustrates the types of questions that might be asked to address whether an outcome measure—anogenital distance (AGD)—should be included in regulatory toxicity tests. This example does not include all questions or issues that might have been considered in this particular decision.

Is the outcome an emerging concern? AGD is sexually dimorphic in many mammals; males have longer AGD than females do. Reduced AGD is considered a sensitive indicator of reduced fetal androgen (Liu et al. 2014) during the male programming window. Studies that explore chemical effects on AGD in animals date back at least 50 years (Revesz et al. 1960). Swan et al. (2005) were among the first to examine whether exposure to an EAC could result in altered AGD in humans.

Are appropriate assays available? Measurement of AGD is relatively straightforward in animals and is defined as the distance from the genital tubercle to the anus. Similar methods for the measurement of AGD in humans have been developed (Salazar-Martinez et al. 2004; Sathyanarayana et al. 2015).

Is the outcome a more sensitive measure of adverse toxicity than currently used outcomes? AGD is a more sensitive measure for reduced (fetal) androgen signaling than traditional outcomes such as hypospadias and cryptorchidism (Saillenfait et al. 2008; Kim et al. 2010).

Is the outcome of interest a reproducible effect after chemical exposure? Like other anthropometric measurements, errors in the measurement of the AGD can occur. Accurate AGD measurements depend on the identification of distinct anatomical landmarks. Measurement of AGD by a single trained examiner is preferred because inter-rater variability is often larger than intra-rater variability. A retrospective analysis of 43 multigeneration studies (16 in Wistar rats and 27 in Sprague-Dawley rats) conducted according to the latest version of the test guidelines indicated that measurement of AGD had a coefficient of variation of 25-50% (Marty et al. 2009). In humans, inter-rater and intra-rater reliability and thus reproducibility of AGD measurements can be very good if appropriate methods are followed that include standardized training and monitoring of measurements (Sathyanarayana et al. 2015).

Will including the new outcome in regulatory testing improve hazard identification? Scientific consensus was reached that AGD should be added to testing methods. For example, EPA revised OPPTS 870.3800/OECD 416 (Reproduction and Fertility Effects Test) to include AGD measurement in F2 offspring if triggered by a change in sex ratio or age at puberty onset (Marty et al. 2009). The measurement of AGD between postnatal days (PND) 0 and PND 4 was also added as a required outcome measure in OECD 421 (Reproduction/Developmental Toxicity Screening Test), OECD 422 (Combined Repeated Dose Toxicity Study with the Reproduction/Developmental Toxicity Screening Test), and OECD 443 (Extended One-Generation Reproductive Toxicity Study) tests (Beekhuijzen et al. 2016).

Targeted Analysis of Existing Data

In some cases, a targeted analysis (or reanalysis) of existing data might support agency actions. A targeted analysis might be used when test systems, methods, analytical approaches, and other experimental design features differ between studies and make the results difficult to compare. For example, outcome measures that are analyzed as continuous variables (such as changes in mean response) in some studies are not directly comparable to outcome measures that are analyzed as dichotomous variables (such as numbers of individuals beyond a specified cut point) in other studies. Another example is the summary of dose-response data in terms of pair-wise significance. In that case, the need for additional analyses could arise when there is a finding of a no-observed-adverse-effect level (NOAEL) in one study and a lowest-observed-adverse-effect level (LOAEL) in another study at the same dose. A targeted analysis of these seemingly discordant data might be able to strengthen the interpretation of the evidence. Specifically, an analysis of those data should consider whether the results of the two studies are statistically significantly different from each other. Trend analyses might also be useful as an alternative to pair-wise comparisons, because trend tests might have greater statistical power. Other contextual factors (including experimental design and conduct, mechanistic data, and prior evidence) should also be considered (Goodman 2016). Consideration of those factors will help reduce the tendency of some regulators to incorrectly perceive a NOAEL as a threshold (Scholze and Kortenkamp 2007) and reduce reliance on a p-value (most often set at 0.05) as a bright line in evaluating whether an effect has occurred (Goodman 2016).
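The two statistical points in the preceding paragraph can be sketched in a few lines. The example below is a simplified illustration, not a method prescribed by the committee or the cited authors: it uses a normal-approximation test for the difference between two effect estimates and a Cochran-Armitage test for trend in quantal data. All effect sizes, standard errors, doses, and counts are hypothetical.

```python
import math


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def difference_test(est_a, se_a, est_b, se_b):
    """Normal-approximation test of whether two independent estimates differ."""
    z = (est_a - est_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    return z, two_sided_p(z)


def cochran_armitage(doses, responders, group_sizes):
    """Cochran-Armitage test for a linear trend in proportions across dose groups."""
    total_n = sum(group_sizes)
    p_bar = sum(responders) / total_n
    t = sum(d * (r - n * p_bar) for d, r, n in zip(doses, responders, group_sizes))
    sum_nd = sum(n * d for d, n in zip(doses, group_sizes))
    sum_nd2 = sum(n * d * d for d, n in zip(doses, group_sizes))
    variance = p_bar * (1.0 - p_bar) * (sum_nd2 - sum_nd ** 2 / total_n)
    if variance <= 0:
        raise ValueError("trend variance is zero; the test is undefined")
    z = t / math.sqrt(variance)
    return z, two_sided_p(z)


# Hypothetical effect estimates (same dose, two studies) with standard errors.
effect_a, se_a = 10.0, 4.0   # study A: would be reported as a LOAEL
effect_b, se_b = 6.0, 4.0    # study B: would be reported as a NOAEL

p_a = two_sided_p(effect_a / se_a)
p_b = two_sided_p(effect_b / se_b)
z_diff, p_diff = difference_test(effect_a, se_a, effect_b, se_b)

# Hypothetical quantal data: dose scores, responders, and animals per group.
z_trend, p_trend = cochran_armitage([0, 1, 2, 3], [2, 4, 6, 8], [20, 20, 20, 20])

print(f"study A p = {p_a:.3f}, study B p = {p_b:.3f}, difference p = {p_diff:.3f}")
print(f"trend z = {z_trend:.2f}, p = {p_trend:.3f}")
```

In this invented case, one study crosses the 0.05 line and the other does not, yet the two estimates are not significantly different from each other, which is why a NOAEL and a LOAEL at the same dose need not be discordant.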

Studies from different investigators are often not performed at the same doses, which makes it difficult to compare studies and evaluate consistency. To interpolate between dose groups, one can typically use parametric or nonparametric curve-fitting approaches. For epidemiological data, replacing pair-wise comparisons with regression analyses allows comparison of regression coefficients, which can address this issue. The benchmark dose (BMD) approach can be well suited for comparing evidence of adverse effects among both toxicological and epidemiological studies. The BMD is the estimated dose associated with a specific level of response, called the benchmark response (BMR), and is reported along with its confidence interval. The use of a common BMR forces a transparent definition of the size of a biologically significant effect that is common among the studies being compared, and the resulting confidence intervals can be compared to evaluate study consistency or inconsistency. That approach has been extended to end points that are not strictly identical by using categorical regression, in which disparate end points are grouped into “bins” of severity categories (EPA 2000). The approach then estimates the BMD associated with a specific severity. Categorical regression, however, requires judgment to determine which end points and magnitudes of effect are to be grouped together at each level of severity.
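The idea of comparing studies through a common BMR can be illustrated with a deliberately simple sketch. It assumes a straight-line dose-response model fitted by least squares and a crude normal-approximation interval for the slope; regulatory BMD analyses typically use richer models and likelihood-based limits, so this is an illustration of the concept only. The two data sets, the BMR value, and all function names are hypothetical.

```python
import math


def fit_line(doses, responses):
    """Ordinary least-squares fit; returns intercept, slope, and slope standard error."""
    n = len(doses)
    if n < 3:
        raise ValueError("at least three dose groups are needed")
    mean_x = sum(doses) / n
    mean_y = sum(responses) / n
    sxx = sum((x - mean_x) ** 2 for x in doses)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(doses, responses))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(doses, responses))
    slope_se = math.sqrt(rss / (n - 2) / sxx)
    return intercept, slope, slope_se


def benchmark_dose(doses, responses, bmr, z=1.645):
    """Return (BMDL, BMD, BMDU) for an absolute change `bmr` under a linear model.

    The limits use a normal approximation for the slope (an assumption made
    for brevity); BMDU is infinite if the lower slope bound is not positive.
    """
    _, slope, slope_se = fit_line(doses, responses)
    if slope <= 0:
        raise ValueError("no increasing dose-response trend to invert")
    bmd = bmr / slope
    bmdl = bmr / (slope + z * slope_se)
    lower_slope = slope - z * slope_se
    bmdu = bmr / lower_slope if lower_slope > 0 else math.inf
    return bmdl, bmd, bmdu


# Two hypothetical studies that used different dose spacing (arbitrary units).
study_a = ([0, 10, 30, 100], [0.2, 1.1, 2.8, 10.3])
study_b = ([0, 5, 50, 200], [-0.1, 0.8, 4.6, 19.5])
COMMON_BMR = 5.0  # hypothetical change in response judged biologically significant

a_low, a_bmd, a_high = benchmark_dose(*study_a, COMMON_BMR)
b_low, b_bmd, b_high = benchmark_dose(*study_b, COMMON_BMR)
intervals_overlap = a_low <= b_high and b_low <= a_high

print(f"study A BMD {a_bmd:.1f} ({a_low:.1f} to {a_high:.1f})")
print(f"study B BMD {b_bmd:.1f} ({b_low:.1f} to {b_high:.1f})")
print("intervals overlap:", intervals_overlap)
```

Because both studies are summarized at the same BMR, their intervals can be compared directly even though no dose group is shared between them.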

There are cases in which data might be reanalyzed to better account for uncertainties. For example, in its report Health Risks from Dioxin and Related Compounds: Evaluation of the EPA Reassessment, NRC (2006b) recommended conducting a quantitative uncertainty analysis as part of EPA’s dose-response assessment. There have been numerous other examples in the literature in which data have been reanalyzed to address previously unaddressed uncertainties, particularly at low doses (see, for example, Subramaniam et al. 2007; Crump et al. 2016).

Systematic Review

Systematic reviews provide a method for evaluating evidence in a transparent and consistent manner that reduces bias. As described in several National Academies reports (NRC 2011, 2014), systematic review provides a rigorous approach to evaluating evidence that, although not removing the role of expert judgment, aims to make such judgments more transparent and less susceptible to biases. Guidelines for performing systematic reviews relevant to environmental or public-health assessments include the Navi-

A systematic review is guided by a study question that should be carefully crafted to address the problem under consideration. If the question is too broad in scope, the studies included in the analysis might be too heterogeneous for effectively integrating and drawing a conclusion. On the other hand, if it is too narrow in scope, the results are less generalizable and might not be relevant to the underlying public-health concern. The methods for identifying, screening, and analyzing the scientific literature are planned in advance to ensure that the evidence is selected and evaluated in an objective and consistent manner. The committee notes that the systematic review method alone does not lend itself to answering a question about whether a specific EAC has low-dose adverse effects. At a minimum, answering that question requires the completion of hazard identification and dose-response assessment for the effects of interest.

Generate New Data or Models

A focused research question could be addressed by generating new data or models. In some cases, that activity could involve conducting animal toxicity studies, pharmacokinetic studies, or epidemiologic investigations. At other times, new in vitro data could shed light on the mechanisms involved in an observed response. Other types of research questions might be answered through computational model development. For example, physiologically based pharmacokinetic (PBPK) or other quantitative dosimetry models for a chemical of interest could be developed to compare dose-response relationships in human studies with those in rodent studies. Quantitative dosimetry models could also help define the relationship between external dose and internal tissue concentrations and could be used to support cross-species and route-to-route extrapolations. Other types of dosimetry models might be needed to facilitate interpretation of in vitro data. For example, reverse-toxicokinetic modeling and in vitro–in vivo extrapolations are used to compare in vitro data with estimated or measured human exposure data (Wetmore et al. 2012; Yoon et al. 2012).
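A simplified version of the reverse-toxicokinetic calculation can be sketched as follows. It assumes linear kinetics, so that the steady-state plasma concentration scales proportionally with dose; the function names and all numerical values are hypothetical and chosen only to show the arithmetic of comparing an in vitro activity concentration with an exposure estimate.

```python
def oral_equivalent_dose(ac50_uM, css_uM_per_mg_kg_day):
    """Convert an in vitro activity concentration to an oral equivalent dose.

    ac50_uM: in vitro concentration producing half-maximal activity (micromolar).
    css_uM_per_mg_kg_day: predicted steady-state plasma concentration
        (micromolar) for a constant dose of 1 mg/kg-day.
    Returns the dose (mg/kg-day) predicted to give a plasma concentration
    equal to the in vitro concentration, assuming linear kinetics.
    """
    if css_uM_per_mg_kg_day <= 0:
        raise ValueError("steady-state concentration must be positive")
    return ac50_uM / css_uM_per_mg_kg_day


def activity_to_exposure_ratio(oral_equivalent, exposure_mg_kg_day):
    """Ratio of the oral equivalent dose to an estimated human exposure."""
    if exposure_mg_kg_day <= 0:
        raise ValueError("exposure must be positive")
    return oral_equivalent / exposure_mg_kg_day


# Hypothetical inputs for illustration only.
oed = oral_equivalent_dose(ac50_uM=5.0, css_uM_per_mg_kg_day=2.0)
ratio = activity_to_exposure_ratio(oed, exposure_mg_kg_day=0.001)

print(f"oral equivalent dose: {oed} mg/kg-day")
print(f"activity-to-exposure ratio: {ratio:.0f}")
```

A large ratio suggests that the in vitro activity occurs at concentrations well above those expected from estimated exposure, whereas a ratio near or below 1 would flag the in vitro finding as potentially relevant at current exposure levels.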

Questions about the coherence of findings in rodents and humans might be addressed through evaluation of mechanistic data. One way of improving our understanding of mechanistic data involves the development of conceptual models that facilitate the organization of information about biological interactions and toxicity mechanisms. A conceptual model can reflect the initial interactions of a chemical with the biological system and the resulting events that can lead to a specific adverse outcome (Ankley et al. 2010). Tools are being developed that can predict associations between key events and thus lead to the development of quantitative or computationally predicted adverse outcome pathways (qAOPs or cpAOPs) (Bell et al. 2016; Connolly et al. 2017). The committee notes that endocrine-related effects probably do not result solely from one isolated or linear pathway but involve multiple pathways or networks, and disease manifestation likely involves multiple components or stressors (NASEM 2017).

Determining the types of new data and models that are needed should be tailored to the research questions, and specific recommendations are beyond the scope of this report. Nevertheless, Box 2-6 provides an example of generating data to address uncertainty.

Integrate Available Evidence

This committee—as have other National Academies committees before it—emphasizes the need for evidence integration to be both transparent and standardized in its approach. Thus far, evidence integration has focused on the purpose of hazard identification: that is, determining whether a causal relationship exists. Causal frameworks, such as those developed by the International Agency for Research on Cancer, NTP, and EPA, can be adopted or adapted to provide transparency and consistency in conducting causal evaluations. See previous National Academies reports for additional guidance on such approaches (NRC 2014; NASEM 2015, 2017).

Box 2-6. This example illustrates how uncertainty or data gaps can be addressed by generating new data and models.

Data gap: Uncertainties in the human pharmacokinetics of bisphenol A (BPA) after oral exposure. Specifically, some biomonitoring studies have reported serum concentrations of unconjugated BPA in the 1-10 nM range, which is similar to the range where some in vitro and in vivo studies have reported significant biological effects (Vandenberg et al. 2013).

Approach: To address this uncertainty, the National Institutes of Health conducted a human pharmacokinetic study using a single oral administration of deuterated BPA, which can be distinguished from background, and more sensitive analytical methods (LOD <10 pM) (Thayer et al. 2015). The study concluded that unconjugated BPA constituted less than 1% of total BPA in the serum, with elimination largely complete 24 hours after oral administration. Analysis of those data with a new pharmacokinetic model (Yang et al. 2015) suggested that peak serum concentrations in the general population are likely to be about 5-20 pM for daily dietary intakes of up to 0.5 µg/kg-day, a range that is consistent with estimates based on other methods (Teeguarden et al. 2013, 2015).

Impact: Such concentrations are well below those studied in most in vitro and in vivo experimental studies, so most studies reporting “low dose” effects of BPA do not directly inform whether BPA can cause effects at current human exposure levels.
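The size of the gap described in this example can be checked with a short unit conversion. The sketch below uses only the two ranges quoted in the example (1-10 nM reported in some biomonitoring and experimental studies, and 5-20 pM predicted by the pharmacokinetic model); the variable names are illustrative.

```python
PM_PER_NM = 1000.0  # 1 nanomolar = 1000 picomolar

reported_range_nM = (1.0, 10.0)   # unconjugated BPA range quoted in the example
modeled_peak_pM = (5.0, 20.0)     # modeled peak serum range quoted in the example

reported_range_pM = tuple(value * PM_PER_NM for value in reported_range_nM)

# Smallest gap: low end of the reported range versus high end of the modeled range.
fold_low = reported_range_pM[0] / modeled_peak_pM[1]
# Largest gap: high end of the reported range versus low end of the modeled range.
fold_high = reported_range_pM[1] / modeled_peak_pM[0]

print(f"reported range is {fold_low:.0f}- to {fold_high:.0f}-fold above modeled peaks")
```

On these figures, the concentrations at which effects have been reported are roughly 50 to 2,000 times the modeled peak serum concentrations.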

The committee recognizes, however, that for addressing low-dose adverse effects of chemical exposure, the question will often be more explicitly quantitative: that is, it specifically concerns the nature of the dose-response relationship at low doses. Although the evidence of causality is still important, the problem is that the causal evidence might include studies that involve only high exposures, such as experimental doses near the maximum tolerated dose. High-dose data alone are usually not useful for making inferences about response to exposures at low doses because of uncertainties in the shape of dose-response curves below the range of observation.

Therefore, it might be necessary to integrate the subset of evidence that includes low-dose toxicity data separately. Although not very informative for causal inferences, environmental exposure data, such as biomonitoring data, might nevertheless be useful for defining what subset of the data can be considered as low dose (see discussion earlier in this chapter). Some additional considerations include the following:

In vitro–in vivo extrapolation or reverse toxicokinetics can help to determine what in vitro mechanistic data could be considered low-dose data.

Addressing toxicokinetic and toxicodynamic differences between species and populations.

Biological plausibility given mechanistic data.

Co-exposures that might act on the same end point.

Data integration can also be used to consider questions that are not about specific EACs but are broader, such as whether a new end point or new exposure or assessment window is relevant to determining low-dose effects. As noted earlier, some end points have been added to regulatory testing protocols in response to growing evidence that they are indicators of toxicity, and the duration of some tests has been extended to capture effects that might occur later in life. Signals identified during the surveillance step that have those types of implications about toxicity testing could be evaluated by integrating the available evidence. One example of such a signal is the growing concern about evaluations of mammary gland toxicity. Makris (2011) evaluated how the effects of environmental chemicals on the mammary gland are assessed in guideline studies of EPA, OECD, and NTP and made a number of recommendations for enhancing how the end point is assessed. She identified data gaps, issues, and challenges and noted that “to address these issues, a paradigm shift would be needed for the evaluation of [mammary gland] in guideline studies” (Makris 2011, p. 1050). Challenges identified with implementing such a shift were issues of species and strain sensitivity, the timing of exposure, and when the end point is evaluated. Evidence integration could help address such issues.

The remaining step in the committee’s strategy is to select the types of actions that are needed. As shown in Figure 2-1, several types of actions could result, including updating chemical assessments, continuing to monitor for new data, requiring new data or models to reduce uncertainties, or updating toxicity-testing designs and practices. The types of actions that EPA takes could be influenced by a number of factors, including the size of the population at risk, the public-health significance of the investigation, and available resources. Specific recommendations on exactly what actions to take are beyond the scope of this report.

To ensure adequate understanding of hazards and to inform regulatory decision making, EPA needs a general strategy for ongoing evaluation of evidence of low-dose effects from exposure to EACs. The committee proposes a strategy involving three phases: surveillance, investigation and analysis, and actions. EPA is already conducting many activities consistent with the proposed strategy, though not necessarily in the specific context of assessing low-dose exposure to EACs.

Recommendation: EPA should develop an active surveillance program focused specifically on low-dose exposures to EACs. This program could include regularly monitoring published research and other information sources, gathering input from stakeholders, and collecting human exposure information. It might also involve data collection in collaboration with other agencies and outside parties. The surveillance program should periodically identify, scope, and prioritize potential areas of focus related to low-dose effects, such as particular chemicals and end points. Some approaches discussed in this chapter will require methods and tool development, such as automated methods for monitoring the literature.

Recommendation: After a topic is selected for further evaluation, the agency should plan its investigation by identifying key questions to be addressed and determining the types of data and analyses needed to answer the questions and to support future agency actions.

The four main approaches for investigation and analysis are targeted analysis of existing data, systematic review, generation of new data or models, and integration of evidence. The types of analyses used to investigate the questions are not mutually exclusive, and several approaches might be needed to address the questions adequately. Integration of evidence for low-dose adverse human effects of EACs involves consideration of both hazard identification and dose response.

Recommendation: Environmental exposure data should be used, if available, to define what subset of the data should be considered as low dose.

A robust strategy will provide the agency with a range of options to address questions of concern.

Recommendation: The specific approaches and tools used to implement the strategy to address issues related to low-dose endocrine effects will need to be considered on a case-by-case basis and should be guided by the specific questions under study.

The proposed strategy in this chapter will facilitate more regular consideration of the adequacy of toxicity testing. However, the agency will also be faced with questions about the amount of evidence needed to change traditional test methods, and these questions might be more appropriately addressed through policy decisions.

CDC (Centers for Disease Control and Prevention). 2009. Fourth National Report on Human Exposure to Environmental Chemicals. U.S. Department of Health and Human Services, Centers for Disease Control and Prevention, Atlanta, GA [online]. Available: https://www.cdc.gov/exposurereport/pdf/fourthreport.pdf/ [accessed July 21, 2016].

Macon, M.B., and S.E. Fenton. 2013. Endocrine disruptors and the breast: Early life effects and later life disease. J. Mammary Gland Biol. Neoplasia 18(1):43-61.

Makris, S.L. 2011. Current assessment of the effects of environmental chemicals on the mammary gland in guideline rodent studies by the U.S. Environmental Protection Agency (U.S. EPA), Organisation for Economic Co-operation and Development (OECD), and National Toxicology Program (NTP). Environ. Health Perspect. 119(8):1047-1052.

