Abstract

Whether children, especially girls, are entering and progressing through puberty earlier today than in the mid-1900s has been debated. Secular trend analysis, based on available data, is limited by data comparability among studies in different populations, in different periods of time, and using different methods. As a result, conclusions from data comparisons have not been consistent. An expert panel was asked to evaluate the weight of evidence for whether the data, collected from 1940 to 1994, are sufficient to suggest or establish a secular trend in the timing of puberty markers in US boys or girls. A majority of the panelists agreed that data are sufficient to suggest a trend toward an earlier breast development onset and menarche in girls but not for other female pubertal markers. A minority of panelists concluded that the current data on girls' puberty timing for any marker are insufficient. Almost all panelists concluded, on the basis of few studies and reliability issues of some male puberty markers, that current data for boys are insufficient to evaluate secular trends in male pubertal development. The panel agreed that altered puberty timing should be considered an adverse effect, although the magnitude of change considered adverse was not assessed. The panel recommended (1) additional analyses of existing puberty-timing data to examine secular trends and trends in the temporal sequence of pubertal events; (2) the development of biomarkers for pubertal timing and methods to discriminate fat versus breast tissue; and (3) establishment of cohorts to examine pubertal markers longitudinally within the same individuals.

Changing trends in the timing of pubertal development may identify public health concerns. Before the mid-1900s, secular trend analyses of age at female puberty relied on studies of age at menarche among specific populations, and these studies suggest that the average age of menarche declined in the United States between the late 1800s and the mid-1900s.1,2 This decline was attributed to improvements in general health, nutrition, and other living conditions during this time frame. From the mid-1900s to the present, some studies have reported that the age at menarche and breast and pubic hair development have declined in US girls3-5; however, individual study methods and study comparisons have been challenged, and different studies have drawn different conclusions, leading to an unresolved debate regarding a secular trend in female puberty timing during this time frame. Although there are fewer published studies of puberty timing for US boys, some studies have reported an earlier puberty onset in boys as well.6,7

An expert panel was asked to resolve the debate by performing a thorough weight-of-the-evidence evaluation of the data during and after “The Role of Environmental Factors on the Onset and Progression of Puberty” Workshop (see introduction article8 for details). This is the first effort, to our knowledge, to develop consensus on whether there has been a secular trend in the timing of puberty in the United States from the mid-1900s to the present. This article presents the panel conclusions on the question, “Are data sufficient to suggest or establish a secular trend in the timing of puberty onset and/or progression (as measured by puberty markers) in boys and girls from US data over the time period 1940 to 1994.” This time frame was selected because (1) it is at the center of the debate about a secular trend in puberty timing and (2) many studies that were performed during this interval assessed breast and pubic hair staging as well as menarche.

Puberty-Timing Markers

Human puberty is a complex temporal sequence of biological events leading to the maturation of secondary sex characteristics, accelerated linear growth, and attainment of reproductive capacity. During normal puberty, the sequence of events requires gonadal and adrenal activation. Breast development, pubic hair development, and age at menarche have been most frequently assessed in female puberty-timing studies in humans. The male puberty markers included in this discussion are genital and pubic hair development. Breast, genital, and pubic hair development have been assessed using a progressive scale, usually the Tanner staging. Tanner staging is based on a scale of 5 progressive stages described and depicted in photographs by Marshall and Tanner in which stage 1 is prepubertal, stage 2 is onset, and stage 5 is adult.9-12 Earlier studies that assessed breast and pubic hair staging used similar definitions.13,14 They are based on visual observation alone, either from photographs or in person by a trained professional or a child (ie, self-assessment).

The first sign of puberty in girls is either the initiation of breast development, designated as Tanner stage 2 (B2; also called breast budding), or pubic hair development (PH2).10,15 Their relative order of appearance is subject to interindividual and interracial variability.5,16 Black girls often begin pubic hair development before or close to the time of breast development onset, whereas white girls typically begin breast development before pubic hair development.5 B2 can be difficult to assess using visual inspection alone, either in person or from photographs. Discerning breast from fat tissue is a key concern, especially in overweight girls with excessive subcutaneous fat in the chest. Palpation under the areolar area by a trained professional can distinguish breast (glandular) from fat tissue, improving data quality, and is used in clinical diagnosis. The Lee17 study and a subset of the Pediatric Research in Office Settings (PROS) study were the only US studies to use this technique.18,19 Comparing data from the US National Health and Nutrition Examination Surveys (NHANES), NHANES III20 and NHANES 1999-2000, Jolliffe21 found an increased prevalence and extent of overweight among US children aged 2 to 19 years between 1971 and 2000. With an increased percentage of overweight children, palpation is particularly useful in identifying B2. Visual inspection remains the only method to stage breast development progression (Tanner stages 3-5).15

PH1 to PH5 in boys and girls have been evaluated by visual inspection by a clinician or self-assessment. Male pubic hair development occurs in response to dihydrotestosterone, produced by the testes, binding to androgen receptors and activating androgen-dependent gene transcription in genital skin regions. Female pubic hair development is thought to occur in response to androgens produced from the activation of the hypothalamic-pituitary-adrenal (HPA) axis. Racial differences in patterns and progressive stages of pubic hair development increase the variability in pubic hair staging measures that use a scale (Tanner staging) developed for white individuals.

Age at menarche can be determined by asking a girl or her parent her “current status” (also called “status quo”), whether she has had her first menses by the time of assessment and her birth date, and/or by asking post-menarcheal females (or their mothers) to “recall” their age at first menses. Recall may be less valid and decreases in accuracy with greater time elapsed between menarche and asking for the date.22,23 Height and weight were measured in a number of studies, some longitudinal. In girls, the onset of the pubertal growth spurt usually precedes or is concomitant with the onset of breast development, at least in white girls.10

In boys, the first external signs of puberty are an increase in testicular volume above 3 mL24,25 and Tanner's genital growth stage 2 (G2).11 The increase in testicular volume is attributed primarily to the growth of testicular seminiferous tubules stimulated by follicle-stimulating hormone. A volume of >3 mL is considered the most reliable and valid male pubertal onset marker developed to date.26-28 Testicular volume can be measured by an examiner, using an orchidometer (first developed by Prader29) or a ruler or by ultrasonography. The most common method is using an orchidometer, either by comparing the testis size by palpation to the orchidometer beads or by using a punched-out (ie, card with holes) orchidometer. Measurements of testicular volume using either orchidometer method are well correlated to those using ultrasound.30,31 Although testicular volume is often measured in the clinical setting, it has not been used in human studies of puberty timing in the US population. Instead, most studies used visual assessment of genital development using Tanner staging. Stage 2 corresponds to the first visible signs of testicular and scrotal growth and changes in the scrotal skin.9,11 Additional male pubertal markers include axillary hair development, age at first ejaculation, acne, and voice break, all relatively late male pubertal events. The timing of these markers is subject to higher interindividual variability, and the relationship to the timing of hypothalamic-pituitary-gonadal (HPG) axis and HPA axis activation is not as well understood. Age at spermarche can be determined by the presence of spermatozoa in the first morning urine void (spermaturia) but is not considered a reliable marker because of a high number of false negative results.32,33 In boys, the onset of the pubertal growth spurt is a relatively late pubertal event, usually corresponding to the time of late G3 or early G4.11

Hormone and receptor measurements would be very useful for determining the timing of the activation of the HPG and HPA axes underlying the physical manifestations of puberty. Endogenous hormone levels and other biochemical assays to evaluate the physiologic pubertal status are costly for puberty studies with a large participant number, and, although the technology is improving, they have validity issues such as detection limits and interindividual hormone level variability34 (reviewed by Rockett et al35). This discussion focuses on the physically manifested puberty markers because they were assessed in the studies from 1940 to 1994.

Issues in Secular Trend Analysis for Pubertal Timing Studies

A secular trend is a change in the distribution of an outcome in a population during a specified time frame, usually “a long period of time, generally years or decades.”36 Secular trend analysis of health measures is frequently used to track public health concerns or initiatives. If an adverse trend is detected, then studies are often performed to understand better the possible causes or determinants, and interventions may result. Secular trend analysis is, by its nature, historic, and projections to the future are inherently uncertain. Confidence in predicting future trends is greater when the past trend is stable or represents a steady change, but even smooth trends can change. Understanding the secular trend and the trend of its determinants should improve prediction of future trends.

The ability to compare adequately 2 studies, performed at different points in time, is affected by the similarities in the study designs. A number of study design choices, including general study design (eg, longitudinal), population characteristics (eg, race), age(s) of measurement, puberty measures assessed, and method(s) of puberty measurement affect the quality of the data comparisons. Study differences among selected studies for the age of menarche are illustrated in Table 1. Some reports are based on the same study (eg, NHANES III, Bogalusa Heart Study) but with a different subset of participants, age definition, or analytic approach. Although no 2 studies performed at different times can replicate definitions and analyses precisely, some studies are judged to be similar enough to allow for comparison. Comparability is somewhat limited by the methodologic choices of the historic studies. For example, newer studies may have used more reliable methods (eg, palpation for breast development onset) that are less similar to the methods used in an older study (eg, photographs for breast development onset).

Table 1. Examples of Study Design Choices That Can Affect Data Comparability Among Selected US Studies of Menarche

Most of the published studies on puberty timing were either cross-sectional or longitudinal. In a cross-sectional study, all measurements are made at a single point in time; pubertal development is assessed only once per individual. For Tanner stage (breast, pubic hair, and genital development), current developmental stage and current age are recorded. For menarche, both current status (menstruating: yes or no) and age are obtained, and recalled age at menarche can be asked of those who report menstruating. The age at menarche may have greater measurement error than for Tanner staging for reports that provided menarche in whole year of age. Conversely, determining menarche by the current status method is more reliable compared with Tanner staging measurements, which are somewhat subjective. In a longitudinal study, each boy or girl is examined at several points in time, with age and status recorded at each examination. Both cross-sectional and longitudinal approaches give information about the distribution of age at onset. In a cross-sectional study, onset for any individual is known to have occurred either before or after the current age at the time of assessment. In a longitudinal study, onset is known to have occurred before the first age of assessment, between the ages at 2 different assessments, or after the age at the last assessment.10 From cross-sectional study data, mean age of onset of successive stages can be estimated; however, this approach, based on single observations of an individual, does not provide the highest quality information on individual variability in the progression through the stages. Longitudinal studies can observe progression through Tanner stages within the same individuals. A longitudinal study provides information on the time each individual spends in intermediate stages, with the quality of the information improving as the number of examinations increases and the time between examinations decreases.

Timing of breast, genital, and pubic hair stages in a population may be described by several characteristics: (1) the average age of onset or attainment of the milestone; (2) the proportion of children who have attained the milestone by a given age; and (3) the average age of all children in a specific stage. Estimates of the first 2 characteristics may be obtained using censored data (from “current status”) techniques. The common approach in puberty studies is to smooth the distribution by assuming that it is either normal or logistic (using probit or logistic regression, respectively). Both distributions are determined by 2 parameters, a mean and a variance; estimates that are based on the 2 assumptions will be similar, because the shapes of the normal and logistic distributions differ little.37 For longitudinal studies, techniques similar to these standard methods can be used to study progression, producing estimates of the distribution of duration of time within a stage.
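The current-status approach described above can be illustrated with a small simulation. The following is a minimal sketch, not the method of any study cited here: it assumes a hypothetical cross-sectional sample in which each child contributes only an age at examination and a yes/no indicator of having attained the milestone, and it fits a 2-parameter logistic curve by Newton-Raphson. The function names, the simulated mean and SD, and the age range are all illustrative assumptions.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_current_status(ages, attained, iterations=30):
    """Fit P(attained | age) = sigmoid(b0 + b1 * (age - centre)).

    Uses Newton-Raphson on the logistic log-likelihood and returns
    (median_age, sd) of the fitted logistic distribution of age at
    attainment.
    """
    centre = sum(ages) / len(ages)  # centring keeps the iteration stable
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for age, y in zip(ages, attained):
            x = age - centre
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p            # gradient with respect to b0
            g1 += (y - p) * x      # gradient with respect to b1
            h00 += w               # information matrix entries
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    median_age = centre - b0 / b1               # age at which P = 0.5
    sd = math.pi / (b1 * math.sqrt(3.0))        # SD of a logistic with scale 1/b1
    return median_age, sd


def simulate_cross_section(n, true_mean, true_sd, age_min, age_max, seed=0):
    """Hypothetical data: one examination per child, status only."""
    rng = random.Random(seed)
    ages, attained = [], []
    for _ in range(n):
        age_at_exam = rng.uniform(age_min, age_max)
        age_at_onset = rng.gauss(true_mean, true_sd)  # never observed directly
        ages.append(age_at_exam)
        attained.append(1 if age_at_onset <= age_at_exam else 0)
    return ages, attained
```

In this sketch the true ages at onset are drawn from a normal distribution while the fitted curve is logistic, so the estimated median should still fall close to the simulated mean, which is consistent with the statement that the 2 distributional assumptions give similar estimates. A probit fit would replace the logistic function with the normal cumulative distribution function.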

The age range of the children studied is another important factor. To get the best estimates for age at attainment, the ages at which children are examined should span the range during which most children attain the milestone. If few of the oldest children included have attained the milestone at the end of the study (or, alternatively, most of the youngest included have attained it), then estimates of mean age of attainment will be less precise. A related problem is the possibility of biased estimates of mean age in a stage in a study with a restricted age range. Children should not be included or excluded in analyses on the basis of attainment of the milestone. For example, in a longitudinal study, elimination of children who have not attained the milestone will bias the mean age at attainment downward. Because children change rapidly in the pubertal period, information improves with more precise age measurements. Creating broad groupings by current age (eg, quarter year, whole years) is less desirable than use of exact age. In a longitudinal study, reporting the age of first observation of a stage as the “age of attainment” overestimates the true age of attainment. Finally, rounding of recalled age at attainment, which can occur if age is reported in whole year only or date is reported as a calendar year only, most likely decreases the accuracy of the estimate. It is important to cope with these problems by applying statistical tools that account for the structure of the data (eg, left truncation, right censoring, interval censoring). Such tools are available (eg, life time analysis or analysis of time to event), and models that incorporate continuous traits that vary over time may be applied. Rather than simply estimating possibly biased observed mean ages, such analyses may provide age distributions and allow for estimation of median or quartiles of ages.
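Two of the biases described above can be made concrete with a simulated longitudinal cohort. This is a rough sketch under stated assumptions, not a reanalysis of any data set discussed here: ages at onset are drawn from a hypothetical normal distribution, children are examined at a fixed interval starting at a random phase, and only children with an observed transition (not yet in the stage at one visit, in the stage at the next) are kept. The function name and all parameter values are illustrative.

```python
import random


def summarise_cohort(n, true_mean, true_sd, first_age, last_age, interval, seed=0):
    """Simulate repeated examinations and compare naive summaries of onset age."""
    rng = random.Random(seed)
    all_onsets = []        # true onset ages of every child
    attainer_onsets = []   # true onset ages of children with an observed transition
    first_seen = []        # age at the first visit at which the stage was observed
    midpoints = []         # midpoint of the interval that brackets onset
    for _ in range(n):
        onset = rng.gauss(true_mean, true_sd)
        all_onsets.append(onset)
        # Random phase so that visits are not aligned with onset.
        visit = first_age + rng.uniform(0.0, interval)
        previous = None
        seen_at = None
        while visit <= last_age:
            if onset <= visit:
                seen_at = visit
                break
            previous = visit
            visit += interval
        if seen_at is None or previous is None:
            continue  # right-censored or left-censored: no observed transition
        attainer_onsets.append(onset)
        first_seen.append(seen_at)
        midpoints.append((previous + seen_at) / 2.0)

    def mean(values):
        return sum(values) / len(values)

    return {
        "true_mean_all": mean(all_onsets),
        "true_mean_attainers": mean(attainer_onsets),
        "first_seen_mean": mean(first_seen),
        "midpoint_mean": mean(midpoints),
    }
```

With a wide follow-up window, the mean age at first observation of the stage should exceed the true mean by roughly half the visit interval, whereas the interval midpoint should be close to unbiased. With follow-up that ends while many children have not yet attained the stage, the mean among attainers alone falls below the mean for the whole cohort, which is the downward bias described in the text. Methods for interval-censored and right-censored data avoid both problems by using the children who are dropped here.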

Differences in methods to measure puberty markers can affect findings and, thus, comparability of findings between studies. For breast stage, studies may use palpation and/or observation to determine breast stage in girls. In addition, the rater (eg, self, trained professional) and his or her training as well as the setting where the measurement takes place can affect comparability. Intrarater and interrater variability is important information for comparing across studies. Self-assessment is based on showing drawings or photographs to a child and having him or her mark the stage most like himself or herself.38,39 The quality of self-assessment compared with that of a trained examiner has been examined in a number of studies in varying populations, using a variety of instruments in different settings (reviewed by Rockett et al35). Self-assessment in boys and girls for Tanner staging tended to overestimate stage during the early puberty stages and underestimate in the later stages compared with the staging by a physician.39 In a study of visual self-assessment by children aged 6 to 12 years, assessment of breast development by girls and pubic hair development by boys was often different from that of the trained examiner, and obese girls were more likely to overestimate breast stage than nonobese girls.40

Population Variables that Affect Puberty Timing

Studies of higher quality examine a defined group of people with known characteristics (eg, gender, age, socioeconomic status [SES], geographic location) during the time period of interest. Information about genetic and environmental factors that affect puberty timing improves the quality of a study of puberty timing.

Knowledge of genetic factors that could affect puberty timing would be useful. Polymorphisms in genes that regulate puberty timing have been identified, but these genetic markers have not been available until recently.41-44 Racial and ethnic differences reflect a combination of differing genetic, social, and environmental factors. Race/ethnicity, defined as racial and/or ethnic subpopulations (eg, Hispanic black, non-Hispanic black), was not often recorded in the historic studies but has been recorded in more recent studies. Analyses of the US PROS and NHANES III data revealed racial differences in the timing of female puberty; black girls were younger than white girls at the same stage of breast development, pubic hair development,5,45 and menarche.5,46 Ages for Mexican American girls were between those for the 2 other groups.45-47 For boys, the onset of genital and pubic hair development occurs earlier in black than white boys.6,7

Information on environmental factors including previous nutrition, body fat or size, and race is important in studies of puberty timing. Early severe protein-energy malnutrition presenting as marasmus (severe wasting and growth retardation as a result of insufficient intake) but not kwashiorkor (edematous protein-deficiency malnutrition) delays timing of puberty, at least in girls,48-50 and chronic malnutrition during childhood through to puberty retards pubertal development in boys and girls.51 Among chronically undernourished children, age at menarche was delayed by ∼2 years among the girls and sexual maturity measures were delayed by ∼3 years in boys.

Higher subcutaneous fat levels and BMI at prepubertal ages (5-9 years) are associated with increased likelihood of relatively early (<11 years) menarche.4 A dose-response association between a higher BMI and increased relative hazards for earlier menarche in girls (8-13 years at baseline) has been observed.52 The relationship of body fat and puberty timing is reviewed by Kaplowitz.53

Some studies found that children who had a low birth weight (defined as <2500 g) or were small for gestational age (typically defined as birth weight <10th percentile for gestational age and gender) are more likely to have advanced timing of puberty (girls and boys54; girls only55); however, other results are mixed. For example, 1 study found an association between small size at birth and early puberty in girls but not boys,56 and another found no differences between pubertal timing in children who were born at very low birth weight (<1251 g) and control subjects of both genders.57 An additional issue is that definitions for small birth size and puberty timing outcomes have not been consistent across studies. One hypothesis is that children who are born small and subsequently go through very rapid catchup growth may start puberty earlier.58-60

Examination of US Puberty-Timing Data (1940-1994) for Secular Trends

Girls' Data

The earliest published US data on female puberty timing that included breast and pubic hair development as well as menarche were collected in the 1930s and 1940s (Tables 2 and 3). These 2 small, longitudinal studies included mainly high-SES, white populations.13,14 Although the 2 studies used different Tanner stage definitions, they reported similar mean ages of breast development onset and menarche, but pubic hair development onset ages were more divergent. Studying a small, largely white population in the early 1970s, Lee17 reported older ages of breast development onset and menarche than were observed in either the 2 earlier studies or the Bogalusa Heart Study, a study of a similar time period.61 Analyzing the Bogalusa Heart Study sequential cohort data, Freedman et al4 found that the mean age of menarche in white girls declined 0.2 years and in black girls declined 0.8 years from 1973-1974 to 1992-1994.

The first pubertal timing data on a representative US sample were from the National Health Examination Survey (NHES) cycles I to III conducted by the Centers for Disease Control (later the Centers for Disease Control and Prevention) from 1963 to 1970 (Tables 2 and 3).62,63 For NHES cycles II and III, the mean age at menarche was calculated.63 Tanner stage data were collected only in NHES III, and Harlan et al62 reported the percentage of children of specific ages who were in each stage. Comparisons with other studies were difficult because mean age at attainment for Tanner stages was not calculated. From 1982 through 1984, pubertal data on menarche, breast development, and pubic hair development were collected in the Hispanic Health and Nutrition Examination Survey (HHANES). Villarreal et al64 analyzed the Mexican American data only from HHANES and reported mean ages of B2 and PH2. Because neither NHES III nor HHANES collected pubic hair and breast development stage data on children who were younger than 12 and 10 years, respectively, the results are less useful for examining the age of onset (Tanner stage 2) than studies that assessed earlier ages (eg, 37% of 10-year-old Mexican American girls in HHANES had already begun breast development64; in NHES III, few participants who were 12-17 years of age were in stages 1 or 2). Advantages of the NHES II and III, HHANES, and NHANES III data for assessing secular trends are the similar sampling strategies, which include oversampling of black and other racial/ethnic groups, and statistical weighting designed to provide a representative US sample. A possible disadvantage is the inclusion of multiple participants from the same household (ie, nonindependent participants who may be genetically related and have similar environmental exposures).

The PROS study was a cross-sectional study on the timing of US female pubic hair and breast development and menarche on a large, racially diverse population of 3- to 12-year-old girls conducted from 1992 to 1993 by Herman-Giddens et al.5 It is the largest study (n = 17 077) using Tanner staging and menarche, with a much larger sample size than NHES II and III (combined, n = 3000) or NHANES III data sets (n = 2000-2500; Table 2). The youngest age assessed was 3 years, which is optimal for estimating the onset of breast and pubic hair development; however, the age truncation at 12 years excludes observation of puberty completion (Tanner stage 5 and menarche) for many girls. For example, 64.8% of the white girls and 37.9% of the black girls had not achieved menarche by the end of their 12th year (up to 12.99 years of age). The PROS study group is a convenience sample and not representative of a larger population; however, all of the earlier study populations (1930-1970), including Marshall and Tanner (1969),10 on which US clinical standards for puberty timing were derived, were also convenience samples. The participants included girls who were examined during pediatric visits for either a (1) well-child visit at offices that assessed Tanner staging and menarcheal status routinely in the well-child visit (95.74%) or (2) a problem visit that required a complete examination (4.26%). The study by Herman-Giddens et al5 reported younger median ages than any preceding study (Table 3). For example, the median B2 reported was ∼0.6 to 1.2 years earlier and the median PH2 was ∼0.5 to 1.4 years earlier than the values from Reynolds and Wines,13 Nicolson and Hanley,14 or Lee.17

Several articles presented analyses of puberty-timing data of children in 3 racial/ethnic groups (non-Hispanic white, non-Hispanic black, and Mexican American) from NHANES III, conducted between 1988 and 1994 (Table 3).45-47,65 Comparing ages of attainment of pubertal stages between PROS and NHANES III, conducted during similar time intervals (1980-1990s), the median ages from NHANES III are slightly higher than values from PROS (Table 3); however, the different study designs complicate any comparison (Tables 1 and 2) and may explain the differences in findings. An alternative interpretation is that the NHANES III and PROS findings corroborate one another because both reported an earlier age of onset of breast and pubic hair development compared with the ages from data collected in the 1930s and 1940s. Additional comparisons of the NHANES III with data from the 1960s and 1970s were conducted. Comparing NHANES III (1988-1994) and NHES II/III data,63 Chumlea et al46 reported a decrease of 4 months in the median age of menarche (combining racial/ethnic groups) between 1963-1970 and 1988-1994; however, the authors considered the change too small to be meaningful, but statistical results were not presented. In another analysis that compared the NHANES III with reanalyzed NHES II/III data, Anderson et al3 calculated a decrease of 2.5 months in the median age of menarche that the authors interpreted as a meaningful change; nonoverlapping confidence intervals for the median ages were for whites only, suggesting statistical significance. Furthermore, Herman-Giddens et al18 argued that there has been a statistically significant decline in the median age of menarche from 1963-1970 to 1988-1994 because confidence intervals do not overlap for most calculations from NHANES III and NHES II/III data.

To assess whether Tanner stage age attainment has changed between the late 1960s and early 1990s, Sun et al65 performed a reanalysis of NHES III, HHANES, and NHANES III data (reviewed at the workshop in unpublished form). For making better comparisons among the data sets, the analysis was limited to stages 3 to 5 (because NHES III included children 12 to 17, considered too old to estimate stage 2 reliably) and Mexican American children from HHANES. The authors concluded that there is no evidence of an earlier puberty (as measured by median ages of Tanner stages 3-5) during the time spanning the 3 surveys (1960s through 1990s) for either non-Hispanic black or white girls but “some evidence” for Mexican American girls between 1982 and 1994. An advantage of this study is the consistent reanalysis of data from 3 national surveys. Disadvantages of the analysis are the exclusion of information on onset (Tanner stage 2), which many consider to be the key puberty-timing issue of concern, and decreased study power as a result of limiting the sample sizes for comparisons.

A subset of the female puberty-timing studies could be used to assess whether the duration of puberty, defined as the interval from onset (as measured by B2 or PH2, whichever occurs first) to completion (as measured by menarche), has changed. Because the earlier studies13,14,17 (Table 3) all calculated approximately a 2.1- to 2.2-year interval for largely white populations, the duration for white girls only could be compared. Foster et al61 and Wu et al47 observed an increase to ∼2.3 years in the duration of puberty for white girls; however, Herman-Giddens et al5 observed a longer duration of ∼3.1 years for white girls. One interpretation that explains the reported increased duration is an earlier puberty onset with a relatively unchanged or less-changed age at menarche. When comparing studies that allowed for racial comparisons, black girls were found to have a longer duration than white girls.5,47,61

Boys' Data

Far fewer data are available on the ages and patterns of pubertal development in US boys than girls. This difference may indicate a lower interest in studying boys' puberty that may reflect less cultural awareness of male pubertal development compared with girls (eg, breast development, menstruation). In addition, the greater number of published studies assessing girls' puberty timing may reflect the concern about the relationship between early female puberty and later disease, which has been less investigated for boys66,67 (reviewed by Golub et al68).

All of the US analyses of male puberty timing have assessed Tanner stages for genital and pubic hair development staging (Table 4), albeit with study design differences. Testicular volume was measured in only 1 US study, but Tanner genital development was not also assessed.25 The history of US boys' studies is similar to the girls' because most of the US puberty studies collected data on both boys and girls (eg, Fels, Guidance, NHES III, HHANES, NHANES, Bogalusa). For example, the earliest data on the age of puberty in US boys were from small, mostly high-SES white populations.14,17,69 The first boys' data representative of the US population were collected from 1963 to 1970 in NHES III.70 The earliest ages included in NHES III or HHANES data are not optimal for estimating the mean ages of the onset (eg, 14.5% of 10-year-old Mexican American boys in HHANES already had begun genital development) compared with the earliest ages assessed in the Bogalusa and NHANES III studies.

Comparison of the timing of genital development for white boys between the studies conducted in the 1940s to 1970s14,17,61,69 and the 1980s to 1990s6,7,45 suggests that the mean or median age of G2 declined by 1 to 2 years (Table 5). For the age of G3, the difference is less dramatic but is also earlier in 1988-1994. Comparing NHES III and NHANES III data, Sun et al65 calculated a decrease in the median ages for G3 and G4 for non-Hispanic white boys but not for non-Hispanic black boys during this time span. Comparing HHANES and NHANES III data for Mexican Americans, Sun et al65 reported a decrease in the ages of G3 and G4 as well. For the age of G4, differences are not consistently found when studies are compared (eg, G4 is much later according to Lee17 than to Reynolds and Wines69). The age of G5 varies greatly, even among studies conducted within the same decade, calling rater or marker reliability into question (eg, G5 is earlier or later when comparing the 1940s studies with various 1970-1990 studies).

Comparison of the age of PH2 onset for white boys between the earlier studies17,61,69 and NHANES III (1988-1994)6,7,45 suggests that the mean or median age for pubic hair development onset has declined by ∼0.2 to 0.5 years; however, Foster et al61 reported a higher PH2 mean age for white boys than previous or subsequent studies (from the 1940s to 1950s and the 1988-1994 NHANES III; Table 5), suggesting that pubic hair staging is not a reliable marker. Sun et al65 reported that PH3 and PH4 were earlier for white boys in NHANES III compared with NHES III. Mean ages for PH4 and PH5 exhibit variability among studies, even among those performed in the same decade. Looking at the percentages of 12-year-old white boys in PH2 or higher between NHES III and NHANES III data, similar percentages (73.5% for NHES III; 71.5% for NHANES III) were reported, suggesting no change in the timing of progression of pubic hair development overall between 1966-1970 and 1988–1994.

For white boys' data, comparisons of the duration of male puberty are limited to the studies of Reynolds and Wines,69 Lee,17 and NHANES III, from which the pubertal interval for white boys can be calculated as ∼4.6 years (1940s), ∼3.4 years (1960s/1970s), and ∼5.5 to 5.6 years (1980s/1990s). Excluding the study by Lee, which found a shorter pubertal interval than the earlier study (perhaps reflecting pubertal measure variability issues), the other studies suggest that the pubertal interval has increased between the 1940s and the 1990s. Given the caveats of low reliability of male puberty markers and the small number of studies, the reported increased pubertal duration may reflect an earlier age of onset (stage 2) with no significant change in the age of completion (stage 5), as with the girls' study findings.
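
The interval arithmetic described here can be sketched in a few lines. The example below is only an illustration, not the panel's calculation: the function name and all ages are hypothetical placeholders rather than values from Table 5. It shows how an earlier stage 2 age combined with an unchanged stage 5 age lengthens the computed interval.

```python
def pubertal_interval(age_stage2, age_stage5):
    """Interval in years between the mean age at stage 2 and the mean age at stage 5."""
    return age_stage5 - age_stage2


# Hypothetical mean ages in years (illustrative only, not taken from any study).
first_survey = {"stage2": 11.5, "stage5": 15.0}
second_survey = {"stage2": 10.5, "stage5": 15.0}  # onset 1 year earlier, completion unchanged

interval_first = pubertal_interval(first_survey["stage2"], first_survey["stage5"])
interval_second = pubertal_interval(second_survey["stage2"], second_survey["stage5"])
print(interval_first, interval_second)
```

Under these assumed inputs the interval grows by exactly the shift in onset age, which is why a longer interval alone cannot distinguish a change in tempo from a change in onset.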

Discussion

Panel Consensus: Identified Areas of Agreement and Disagreement

The panel agreed that these data indicate racial/ethnic differences in the timing and order of pubertal events for boys and girls and recommended that studies present results by race/ethnic group. Racial/ethnic distinctions may reflect genetic or environmental factors or a combination of both. For example, an individual identified as “non-Hispanic black” in NHANES III might have more in common, genetically and environmentally, with others identified differently. Even with the problems of defining race/ethnicity, the panel preferred and recommended that studies present results by race/ethnic group. Stratification by body size, particularly body fat, was also considered appropriate because body fat and growth have a profound impact on puberty timing.

Girls' Data

Breast Development

Onset (B2)

The majority of panelists concluded that there are sufficient data to suggest (most) or establish (fewer) a secular trend toward an earlier age at onset of breast development on the basis of B2 as a fairly reliable marker of breast development onset and the opinion that an early onset of breast development, even if isolated, was a public health concern (Table 6). B2 is considered a more reliable external puberty-timing marker than pubic hair development because breast development requires estrogen action from the HPG axis. The basis for this opinion was a weight-of-evidence approach from comparing either the recent PROS or NHANES III data for white girls with the earlier studies that assessed B213,14,17 (Tables 2 and 3). Additional corroboration comes from comparing PROS and Bogalusa Heart Study B2 ages for black and white girls separately.61 Comparisons of B2 ages for Hispanic girls were not possible because earlier studies before HHANES did not report data for Hispanic girls and the comparison by Sun et al65 of HHANES and NHANES III did not report values for B2; however, some panelists thought that the data for Mexican American girls from HHANES64 could be reliably compared with those from NHANES III.

Panel Conclusions About Evidence of Secular Trends in the Timing of Puberty for 1940-1994 Data

A minority of panelists concluded that there are insufficient data to suggest a secular trend on the basis of the limited quantity and quality of the data, particularly the lack of studies that assessed girls young enough to determine reliably the timing of puberty onset markers, such as B2. As mentioned, girls 10 or 12 years and older were assessed in studies before the 1970s. There was concern about the lack of palpation to determine B2 as well as the difference in Tanner stage assessment method (photographs versus visual examination) across studies. The only study to use palpation in all breast assessments was the small study by Lee et al17; however, the PROS study performed palpation and visual assessment for 39% of the participants,19 and comparisons of findings for those participants indicated no evidence of biased staging when visual assessment alone was performed (ie, mistaking fat tissue for breast tissue).19

Some studies did not describe race/ethnicity of the participants, include multiple racial/ethnic groups (eg, all white population), or adjust for race/ethnicity, yielding 1 summary result for all race/ethnic groups, which limited comparisons for separate racial/ethnic groups between the studies. For this reason, some panelists considered data only for white girls because the older studies assessed only or mainly white girls. The older studies are mainly a high-SES population, which presents another complication when comparing data with those of the more recent studies.

Some panelists did not consider B2 timing changes in isolation, without subsequent breast development progression, as a marker of puberty. This viewpoint is based on the fact that B2 can result without pubertal progression from exogenous estrogen exposure and, thus, does not necessarily indicate HPG axis activation.71,72 However, if pubertal development continues to progress after onset, then this suggests that the breast development onset was attributable to HPG activation. For this reason, some panelists considered B2 to be a less reliable marker of breast development timing than B3 to B5.

Progression (B3-B5)

The majority of the panelists concluded that the data are insufficient to suggest or establish a secular trend toward an earlier achievement of B3 to B5 stages. The reasons for this opinion include a lack of study design comparability between the historical and more recent studies. Some panelists argued that the NHES II/III, HHANES, and NHANES III studies were the only set that could be reliably compared, and they concluded that the change in ages of B3 to B5 was not significant over this time period (1960s–1990s); however, NHES II/III and NHANES III studies do not span the complete interval of time (1940s–1990s) considered by the panel.

The minority concluded that there are sufficient data to suggest a secular trend toward an earlier B3 to B5. They viewed B3 to B5 as more reliable markers of breast development timing because progression indicates activation of the HPG and they are less prone to inaccuracies in staging than B2. The basis for earlier B3 to B5 came from comparing either the more recent NHANES III or PROS data for white girls with the earlier studies13,14,17 (Table 3).

Pubic Hair Development

Onset (PH2)

The majority concluded that PH2 is highly variable (ie, interindividual and interracial/ethnicity variability of ages for PH2), and, therefore, the data are insufficient to suggest or establish a secular trend in the age of PH2. Some panelists viewed pubic hair development, in general, as less reliable and more variable than breast development. A concern about the appropriateness of Tanner staging of pubic hair development10 across racial/ ethnic groups, because it was developed on the basis of white children, was raised. As with breast development, the lack of studies with similar designs was considered a key issue that hindered conclusions about secular changes.

A minority concluded that the data are sufficient to suggest a secular trend toward an earlier onset of pubic hair development because they considered PH2 a more reliable marker than B2 (in the absence of palpation). The more recent studies suggest that the mean age of PH2 is ∼6 months earlier than several decades ago. This conclusion comes from comparing the recent PROS and NHANES III data for white girls that showed an earlier PH2 to any of the earlier studies13,14,17 (Table 3).

Progression (PH3-PH5)

The panel unanimously concluded that the data are insufficient to suggest or establish a secular trend in the timing of pubic hair stages PH3 through PH5 on the basis of (1) the view that PH3 to PH5 markers are relatively subjective (particularly discerning between 4 and 5) and variable (eg, interracial/ethnicity and interindividual); (2) data comparability issues; and (3) exclusion of reliance on the PROS PH3 to PH5 timing data, because the oldest girls were 12 years old and, thus, many were too young to assess these stages.

Menarche

The majority concluded that the data are sufficient to suggest (most) or establish (fewer) a secular trend toward an earlier age of menarche. Most panelists with this opinion included only the NHES II/III and NHANES III studies in their evaluation because these studies used similar methods and population selection criteria. These panelists were convinced by the finding that the age of menarche has decreased by 2.5 to 4 months3,46 during ∼25 years. Others considered the full spectrum of studies spanning 1940-1994 on which to base their conclusions.

A minority of the panelists concluded that the cross-sectional study data are insufficient to conclude that there has been a biologically meaningful change in the age of menarche. Some panelists agreed with the conclusions of Chumlea et al46 that interpreted the 4-month decrease as not significant. In addition, some panelists excluded the menarche data from the PROS study5 from consideration because the age range (3–12 years) was not optimal for calculating the mean age at menarche. Some panelists were concerned about the impact of the different statistical methods used in the studies that compared the NHES II/III and NHANES III data.

Boys' Data

All agreed that statistical power and data quality issues make robust conclusions about the boys' data difficult. The panel discussed but did not come to conclusions on data regarding the tempo of puberty and the pubertal growth spurt.

Genital Development

The unanimous opinion was that the data for genital development (G2 and G3-G5) are insufficient to suggest or establish a trend toward an earlier puberty in boys, on the basis of the lack of data quantity, data quality, and marker reliability. Many boys who were included in NHES III and NHANES III were too old for adequate G2 or PH2 assessment. In addition, the lack of validity and reliability of genital staging, particularly for G2, was considered problematic. Panelists agreed that testicular volume assessment, not performed in the studies, was a more meaningful measure of puberty onset where a testicular volume of >3 mL is conclusive evidence of HPG activation. Data quality was a concern given the degree of disparity, particularly for genital development, between the NHANES III findings and NHES III findings.6

Pubic Hair Development

An overwhelming majority found the pubic hair development (PH2 and PH3-PH5) timing data to be insufficient to suggest or establish a trend. Similar to the argument for insufficiency of genital development data, this opinion was based on the lack of data quantity, data comparability, and marker reliability.

A very small minority of panelists concluded that there is suggestive evidence of an earlier pubic hair onset (PH2) and progression (PH3-PH5) for Mexican American boys but not for white or black boys during this time frame. This opinion was based on the viewpoint that pubic hair development may be easier to stage than genital development and, thus, more reliable. Data supporting a trend toward an increase in linear height in boys from the 1963-1970 to 1988-19947 and from 1973 to 199273 were also considered corroborative of an earlier male pubertal growth spurt.

Reasons for Differences of Opinion

Differences of opinion among the panelists were based on judgment calls that placed more or less value on key aspects of study comparability. Overall, some panelists placed more weight on nationally representative studies (NHES I-III, HHANES, and NHANES III) because they used similar designs yet were performed at different times. For example, the clinicians tended to place more value on these data perhaps because they look to nationally representative data on which to base normative puberty-timing clinical guidelines. In assessing the girls' data, some of the panelists placed more weight on the study by Herman-Giddens et al5 because of its much larger sample size and, hence, statistical power and inclusion of younger ages for assessing Tanner staging. One important design choice was the Tanner stage progression definition (eg, some studies reported the mean age of being “in a stage,” others reported the mean age of “transition to a stage,” and some reported both). Different statistical methods also affect comparisons among studies; however, few panelists were experts in statistics and none in survey statistics. For this reason, detailed scrutiny of similar and appropriate statistical analyses, within and among the studies, was not a focus of the discussions.
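
To make the “in a stage” versus “transition to a stage” distinction concrete, the sketch below computes both quantities from the same small cross-sectional data set. All data are hypothetical, the function names are illustrative, and the linear interpolation across age groups is a simplified stand-in for the model-based methods that individual studies may have used.

```python
from collections import defaultdict


def mean_age_in_stage(children, stage):
    """Mean age of the children who are observed in exactly `stage`."""
    ages = [age for age, observed in children if observed == stage]
    return sum(ages) / len(ages)


def median_age_at_transition(children, stage):
    """Age at which 50% of children are at or beyond `stage`.

    Uses linear interpolation between adjacent age groups (a simplification).
    """
    counts = defaultdict(lambda: [0, 0])  # age -> [number at or beyond stage, total]
    for age, observed in children:
        counts[age][0] += observed >= stage
        counts[age][1] += 1
    points = sorted((age, reached / total) for age, (reached, total) in counts.items())
    for (a0, p0), (a1, p1) in zip(points, points[1:]):
        if p0 < 0.5 <= p1:
            return a0 + (0.5 - p0) / (p1 - p0) * (a1 - a0)
    return None  # the 50% point is not bracketed by the sampled ages


# Hypothetical (age in years, Tanner stage) pairs, 4 children per age group.
children = (
    [(9, 1)] * 4
    + [(10, 1)] * 3 + [(10, 2)]
    + [(11, 1)] + [(11, 2)] * 3
    + [(12, 2)] * 2 + [(12, 3)] * 2
    + [(13, 2)] + [(13, 3)] * 3
)

in_stage = mean_age_in_stage(children, 2)
transition = median_age_at_transition(children, 2)
print(f"mean age in stage 2: {in_stage:.2f}; age at transition to stage 2: {transition:.2f}")
```

With these assumed data, the mean age of being in stage 2 is almost a year later than the estimated age of transition into stage 2, although both describe the same children. Comparing a study that reports one definition with a study that reports the other could therefore mimic or mask a secular trend.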

Panel Research Recommendations

New Analyses of Existing Data

If the original data from historical puberty studies are available, then parallel analyses of data from multiple studies could yield more meaningful results. Thus, when possible, reanalysis of some of the existing studies with similar statistical approaches is recommended. For example, multiple examinations of the same participants should not be treated as separate cross-sectional studies or included in a single analysis as though they were from separate children. For the studies of largest study populations, 1 approach to examining secular trends is to compare children at specific ages in studies that were conducted at different times and in which each child is only examined once.
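
One way to follow this recommendation is to reduce repeated examinations to a single record per child before computing an age-specific proportion. The sketch below assumes a hypothetical record layout of (child identifier, age in years, Tanner stage); the function names, the tolerance parameter, and the data are illustrative and are not drawn from any of the studies.

```python
def one_exam_per_child(exams, target_age):
    """Keep, for each child, only the exam closest to target_age."""
    best = {}
    for child_id, age, stage in exams:
        current = best.get(child_id)
        if current is None or abs(age - target_age) < abs(current[0] - target_age):
            best[child_id] = (age, stage)
    return best


def share_at_or_beyond(exams, target_age, stage, tolerance=0.5):
    """Proportion of children at or beyond `stage` near target_age, counting each child once."""
    kept = one_exam_per_child(exams, target_age)
    eligible = [s for age, s in kept.values() if abs(age - target_age) <= tolerance]
    if not eligible:
        return None
    return sum(s >= stage for s in eligible) / len(eligible)


# Hypothetical longitudinal records: child "a" was examined 3 times, child "c" twice.
exams = [
    ("a", 9.5, 1), ("a", 10.0, 2), ("a", 10.4, 2),
    ("b", 10.1, 1),
    ("c", 9.9, 2), ("c", 12.0, 4),
]

share = share_at_or_beyond(exams, target_age=10.0, stage=2)
print(f"share of 10-year-olds at stage 2 or higher: {share:.3f}")
```

Applying the same function with the same target age to each study yields proportions in which every child contributes exactly one observation, which is the condition the comparison of specific ages across studies relies on.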

When analyzing a probability-based study, such as 1 that uses stratified multistage sampling, analyses must incorporate sample weights to yield accurate results. The size of the population affects the power and, in turn, the study quality. It is important to use statistical methods that are appropriate for the type of data and question at hand, as well as to have comparable statistical methods.
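
The effect of sample weights can be illustrated with a weighted proportion. In the sketch below, each sampled child carries a weight equal to the number of children in the population that he or she represents; the weights, the stratum labels, and the function name are hypothetical. The sketch covers point estimates only. Variance estimation for a stratified multistage design also requires the stratum and sampling-unit information and is not shown.

```python
def weighted_proportion(records):
    """Weighted share of children who have reached a stage.

    records: iterable of (sample_weight, reached_stage) pairs.
    """
    total = sum(weight for weight, _ in records)
    reached = sum(weight for weight, flag in records if flag)
    return reached / total


# Hypothetical sample: an oversampled stratum (small weights) has reached the
# stage more often than an undersampled stratum (large weights).
sample = [
    (100.0, True), (100.0, True), (100.0, True), (100.0, False),  # oversampled stratum
    (900.0, True), (900.0, False),                                # undersampled stratum
]

unweighted = sum(flag for _, flag in sample) / len(sample)
weighted = weighted_proportion(sample)
print(f"unweighted: {unweighted:.3f}, weighted: {weighted:.3f}")
```

In this assumed sample, ignoring the weights overstates the proportion because the oversampled stratum dominates the raw counts.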

The data from the existing studies could be examined for changes in temporal sequence and tempo of puberty for different racial/ethnic groups. The relative timing of pubertal events may be a critical factor in separating out different types of puberty-timing phenotypes. For example, discordance between Tanner stages (eg, simultaneous PH4 and B2) suggests a relatively high androgen-to-estrogen concentration.
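
Discordance of this kind could be screened for in existing data with a simple stage comparison. The sketch below is illustrative only: the function names are hypothetical, and the threshold of 2 stages is an assumption chosen to match the PH4 and B2 example, not a criterion set by the panel.

```python
def stage_discordance(breast_stage, pubic_hair_stage):
    """Signed difference between Tanner pubic hair and breast stages (each 1 to 5)."""
    for name, stage in (("breast", breast_stage), ("pubic hair", pubic_hair_stage)):
        if stage not in range(1, 6):
            raise ValueError(f"{name} stage must be an integer from 1 to 5")
    return pubic_hair_stage - breast_stage


def is_discordant(breast_stage, pubic_hair_stage, threshold=2):
    """Flag a child whose stages differ by at least `threshold` (assumed value)."""
    return abs(stage_discordance(breast_stage, pubic_hair_stage)) >= threshold


print(is_discordant(breast_stage=2, pubic_hair_stage=4))  # the PH4 and B2 example
print(is_discordant(breast_stage=3, pubic_hair_stage=3))
```

Tabulating the signed difference by race/ethnic group and survey period would be one way to look for changes in the temporal sequence of pubertal events.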

Priority New Studies

The panel recommended longitudinal studies as the optimal design for examining puberty timing in future studies. Tanner stages are best assessed for an individual over time, because Tanner staging is a relative measure. Pubertal marker data collection may also be embedded within larger studies. Examples of longitudinal studies in which pubertal timing is or will be tracked are the National Children's Study74,75 (http://nationalchildrensstudy.gov) of 100 000 children to be followed from in utero to 21 years of age, the Danish National Birth Cohort study76 (www.ssi.dk/sw3820.asp) of 100 000 women and their offspring during the first 6 years of life, and the Norwegian Mother and Child Cohort Study (www.fhi.no/artikler/?id=51488). These and other studies will provide baseline information with which to compare past studies and future cohorts of children and test hypotheses regarding specific exposures and puberty timing.

A second useful study design is a short-term longitudinal study with sequential cohorts (eg, the Bogalusa Heart Study), sometimes referred to as a mixed cross-sectional/longitudinal study. The sequential cohort design is excellent for looking at trends within 1 study because design choices are maintained and secular trends could be detected sooner than with the longitudinal study design. For either of these study designs, collaboration among disciplines and between countries could enhance and encourage the use of similar study designs, which would allow for geographic and temporal comparisons.

For the purposes of secular trend analysis, any new study using different methods (eg, testicular volume assessment) is still limited by methods used in the earlier studies. Thus, new studies would benefit from including design aspects that are in common with the historical studies. For example, the most recent NHANES surveys (4 cycles spanning 1999-2006) have not collected Tanner stage data for boys and girls, losing comparability with the past NHES (I-III), HHANES, and NHANES III data. Recognizing that there are practical limitations of conducting large studies of puberty timing, the panel recommended a study design “wish list” for longitudinal studies:

individual observations repeated every 6 months;

Tanner staging;

breast palpation and testicular volume measurement for puberty onset;

hormonal measurements;

adequately wide age ranges (eg, 6–18 years of age);

geographic differences considered, including migration and adoption;

SES noted;

race/ethnicity noted and defined and data stratified when appropriate;

body fat and weight covariates (eg, BMI);

inclusion (or targeted studies) of other racial/ethnic groups (eg, Native Americans, Asians) in large enough numbers to analyze; and

hypothesis testing.

Tanner staging method was considered critical to the collection of puberty-timing data, but inclusion of breast tissue palpation and testicular volume assessment was recommended to increase the reliability of estimating the ages of breast and genital development onset. Of note, testicular volume in early puberty is being assessed in a US boys' puberty-timing PROS study.77 Training standards for assessing Tanner stage were considered necessary (for girls15; for boys78), and some suggested that only trained clinicians should be raters; however, the use of palpation by a trained examiner for breast and testicular development has practical (eg, financial, technical training) and ethical (eg, consent for breast palpation or testicular measurement of study participants) constraints.

Underlying the interest in performing a secular trend analysis is the concern about whether environmental factors alter the age at puberty. If so, then unless the environmental factors are distributed uniformly, they should result in geographic variation that is measurable in cross-sectional studies. This reasoning led to well controlled, multicenter studies of semen quality conducted recently in the United States, Europe, and Japan that demonstrated significant geographic variation in semen quality.79-81 These studies identified areas with unusually high and low semen quality and thus suggest etiologic studies to identify causes of these differences.82 Such an approach may be useful in identifying factors that affect age at puberty. One testable hypothesis with new studies is whether the secular trend toward an earlier puberty (at least concluded for girls) may be attributable to subpopulation differences that may in turn reflect differential exposures to factors that affect puberty. If puberty timing varies by geography, then the next question is which factor(s) distinguishes 1 geographic region from another (eg, an exposure, genetics).

In support of the hypothesis of geographic variation in puberty-timing data, studies from several countries have found somewhat different mean ages of menarche (reviewed by Parent et al58). For example, European data collected in the 1990s showed mean ages of menarche that are older than those from the US data (eg, menarche mean ages ranged from 13.0 to 13.4 years in Denmark,83,84 ∼13.2 years in Norway85 and Sweden,86 and 12.6 to 12.9 years for white girls in the United States3,5,46,47). Urban-rural differences in the mean age of menarche have also been observed within a country87; however, it is not possible to conclude definitively that mean age at menarche is different among countries without an analysis of the study design comparability.

Racial patterns of differences in puberty timing may also differ by geographic areas. For example, reported racial differences from South African data on female puberty timing have a different pattern than those reported from US data. The median ages for breast development initiation (B2) and completion (B5) were similar for black and white girls, but black girls had a later PH2 and PH5 than white girls who lived in Johannesburg,88 and the median age at menarche for black girls was 10 months later than for white girls. These differences in puberty timing by geography may be a reflection of SES, race/ethnicity, and/or environmental factors.

Recommended Methods Development

Puberty markers that are easier to measure, less invasive, less expensive, and more sensitive and reliable need to be explored, developed, and validated. New methods to enhance and/or replace Tanner staging are recommended. These fall into 2 categories: physical markers and physiologic markers.

Physical Markers

Tanner staging was recognized as an excellent system for assessing pubertal onset and progression; however, a less invasive examination could decrease stress to the participant. Improvements that may achieve this goal include using a same-gender examiner and developing methods and markers. Methods and markers to be developed and validated include the following:

behavioral markers of male and female puberty onset;

muscle mass increase in boys;

infrared photographs for Tanner stage analysis;

computer imaging for testicular volume measurements;

race/ethnic-specific Tanner staging (eg, pubic hair staging in Peruvian Indians is not accurate for estimating sexual maturation89; it was noted that PH4 for Asians is typically the last, adult-type stage); and

Physiologic Markers

The development of reliable biochemical markers would represent a major step forward for puberty-timing studies in that biochemical changes may identify precursor events and provide validation of later phenotypic puberty changes. Important biochemical markers include measurement of hormone and hormone receptor levels. Hormone level assays are available, but improved assay sensitivity and reliability for urine, blood, and saliva are needed for use in large-scale human studies. Changes in hormone levels during a 24-hour period complicate measurement. For example, in prepubertal boys and girls, gonadotropins in serum exhibit large diurnal variations as a result of their pulsatile secretion; therefore, any single measurement is unreliable.34 The uncertainty can be overcome by measuring urinary secretion of luteinizing hormone, which increases before physical signs of puberty are evident.90 During early puberty in boys, testosterone measures are most reliable when blood sampling is done in the morning, when testosterone levels are expected to be highest.91 There are a few available assays for steroid receptor levels in humans that are limited in their validity and still under additional development.92 Recommended physiologic marker methods for development and validation include the following:

improved sensitivity and reliability and reduced cost of hormone level measurements to define further the associations between hormone levels and pubertal status;

Conclusions

The optimum data for studying puberty timing would be the collection of a suite of puberty-timing measures that include physical as well as physiologic markers. The combination of physical and physiologic measures of human puberty timing will help in the understanding of the progression from onset to completion and of interindividual variability and in the characterization of biochemical precursors. For example, sensitive hormone assay data in conjunction with Tanner staging data could define a marker that precedes overt physical changes yet still allow for comparison with earlier studies of puberty timing.

ACKNOWLEDGMENTS

Support for “The Role of Environmental Factors on the Onset and Progression of Puberty” workshop was provided by US Environmental Protection Agency cooperative agreement 830774, the National Institute of Environmental Health Sciences (this research was supported in part by the Intramural Research Program of the National Institutes of Health and National Institute of Environmental Health Sciences), and Serono Inc.

Drug and Therapeutics and Executive Committees of the Lawson Wilkins Pediatric Endocrine Society. Reexamination of the age limit for defining when puberty is precocious in girls in the United States: implications for evaluation and treatment. Pediatrics. 1999;104(4 pt 1):936–941

Pubertal course of persistently short children born small for gestational age (SGA) compared with idiopathic short children born appropriate for gestational age (AGA). Eur J Endocrinol. 2003;149(5):425–432

Effects of environmental agents on the attainment of puberty: considerations when assessing exposure to environmental chemicals in the National Children's Study. Environ Health Perspect. 2005;113(8):1100–1107
