Abstract

Voice pitch (the perceptual correlate of fundamental frequency, F0) varies considerably even among individuals of the same sex and age, communicating a host of socially and evolutionarily relevant information. However, due to the almost exclusive utilization of cross-sectional designs in previous studies, it remains unknown whether these individual differences in voice pitch emerge before, during or after sexual maturation, and whether voice pitch remains stable into adulthood. Here, we measured the F0 parameters of men who were recorded once every 7 years from age 7 to 56 as they participated in the British television documentary Up Series. Linear mixed models revealed significant effects of age on all F0 parameters, wherein F0 mean, minimum, maximum and the standard deviation of F0 showed sharp pubertal decreases between age 7 and 21, yet remained remarkably stable after age 28. Critically, men's pre-pubertal F0 at age 7 strongly predicted their F0 at every subsequent adult age, explaining up to 64% of the variance in post-pubertal F0. This finding suggests that between-individual differences in voice pitch that are known to play an important role in men's reproductive success are in fact largely determined by age 7, and may therefore be linked to prenatal and/or pre-pubertal androgen exposure.

1. Introduction

As the human body develops and matures throughout the lifespan, anatomical and physiological modifications are reflected in the voice ([1] for review). Most notably, there is a sizeable shift in mean voice pitch (the perceptual correlate of fundamental frequency, F0) following puberty in males [2–4]. This acute pubertal drop in male F0, perceived as a salient decrease in voice pitch, is caused by a surge in circulating androgen levels that is several times greater in males than females and that disproportionately increases the length of male vocal folds by 60%, dramatically decreasing their rate of vibration [3,5]. Although F0 also decreases during adolescence in females, the drop is proportionate and comparatively small [6], resulting in the highly sexually dimorphic pitch that characterizes adult human voices [7].

A growing body of research suggests that men's F0 has undergone intense sexual selection to advertise masculinity, mate quality and threat potential [7]. Dozens of studies indicate that men with relatively low-pitched voices are judged as more attractive by female listeners ([8] for review) and enjoy higher reproductive success (e.g. report higher numbers of sexual partners [9] and have more children born to them in a natural fertility population [10]). Men with low-pitched voices are also judged as physically larger and stronger, more socially dominant, and more competent than are men with higher pitched voices ([11] for review). Although individual differences in men's F0 are only weakly tied to differences in physical body size [12], men with relatively low F0 have higher circulating levels of testosterone [5,7,13–15] and are indeed more formidable than are men with higher F0 [16].

Research to date is therefore consistent with the hypothesis that men's F0 is an androgen-mediated secondary sexual characteristic. Studies examining the development of analogous, androgen-mediated cues in men's faces suggest that masculinized facial features remain fairly stable throughout the lifetime and emerge before the onset of puberty [17–19]. For instance, masculinized facial features in adult men correlated with their prenatal testosterone levels measured from umbilical cord blood [19]. Men with masculinized faces typically have lower pitched voices, suggesting that similar (e.g. hormonal) mechanisms may operate on the development of both facial masculinity and male F0 [20].

Yet, the extent to which individual differences in men's voice F0 remain stable throughout adulthood, or potentially emerge even before pubertal masculinization of the vocal anatomy, has never been investigated. This is largely due to the inherent difficulty of conducting longitudinal voice research with humans. Hence, researchers examining F0 across the lifespan have almost exclusively used cross-sectional cohort designs, comparing the F0s of different individuals at different ages (e.g. [2,5,6,21,22]). The few studies using a within-individual design focused on a period of only 1–5 years in children or adolescents [4,23–26], with the exception of one study that compared the same 15 adult women recorded once in their 20s and once in their 60s using archival recordings [27]. Within-individual designs provide the unique opportunity to test for longitudinal stability of vocal traits while controlling for confounding cohort effects and other individual differences, thereby offering a comparatively robust test of the effects of age and life stage on the voice.

Cross-sectional studies suggest that, in addition to the salient difference in F0 between adult men and women, there exists considerable variation in F0 within sex-age classes. Individual differences in adult F0 typically range between 80 and 175 Hz among men, and 160 and 270 Hz among women [1]. Similar variation is present among pre-, peri- and post-pubertal adolescents [3–6,28,29], and even the average F0 of three-month-old infants' cries ranges from 350 to 550 Hz between individuals, independently of the babies' sex [30]. These studies thus further hint at the possibility that individual differences in F0 might emerge very early in ontogeny.

Using an innovative and relatively untapped resource in voice research (archival recordings), the goals of the present study were twofold. First, we aimed to confirm pubertal and age-related influences on men's F0 using a within-subject longitudinal design that spanned 50 years. Second, we aimed to test the hypothesis that individual differences in F0, already present before the onset of puberty, predict individual differences in F0 after puberty and throughout adulthood.

2. Material and methods

2.1. Voice sample and Up Series

We analysed the voices of all 10 men who took part in the British television documentary the Up Series [31]. All were born in England in the year 1956 and were audio–video recorded for the series once every 7 years from age 7 to 56. The Up Series was first broadcast in 1964 (Seven Up!), with subsequent documentaries debuting in 1970 (7 Plus Seven), 1977 (21 Up), 1984 (28 Up), 1991 (35 Up), 1998 (42 Up), 2005 (49 Up) and 2012 (56 Up) [31]. In addition to the 10 men, the Up Series also followed the lives of four British women; however, owing to this small and unrepresentative sample, data from these women were not included in this study.

The original aim of the documentary, in 1964, was to get a glimpse of England at the turn of the millennium, at a time when ‘the shop-stewards and executives of the year 2000 are now seven years old’ [32]. The directors chose children from various socio-economic backgrounds. Each episode of the Up Series features interviews in which the participants of the documentary are asked to discuss their personal and professional lives, their values and aspirations, and to share their opinions on a range of topics. These interviews offer a large and longitudinal repertoire of standardized voice recordings.

2.2. Audio extraction

Short audio clips of speech were extracted from each Up Series DVD and saved as WAV files with Audacity v. 1.2.6. We extracted a total of 300 voice clips with an average duration of 11.3 ± 6.6 s, while documenting the speaker, context and recording conditions (e.g. setting, noise level) of each clip. To standardize these variables and reduce potential contextual or social influences on vocal production, we analysed only voice clips that were extracted from interviews and in which background noise was negligible and strong emotional content was absent (257 clips). Hence, clips containing highly emotional (e.g. angry) speech or persistent chaotic vocalizations (e.g. laughing, crying, whispering, shouting) were excluded from acoustic analysis. Multiple voice clips were extracted for each individual at each given age (see table 1 for duration and number of voice clips, and participation record).

Table 1. Total duration (and number) of voice clips analysed for each individual at each age. Total duration is given in seconds followed by the total number of clips (in brackets). Grand totals are a sum of these values across all ages for each individual (final column) and across all individuals for each age (bottom row). Blank cells indicate instances in which an individual did not take part in the documentary at that particular age.

2.3. Voice analysis

Acoustic editing and analysis were performed in Praat v. 5.2.1 [33]. Following manual removal of fragments of acute noise, multi-voicing and non-verbal vocalizations (e.g. laughter, crying), we used a batch-processing script to measure five parameters of fundamental frequency: mean F0 (F0mean), range (F0min and F0max) and F0 contour including standard deviation (F0s.d.) and the coefficient of variation (F0CV). F0CV is given by F0s.d./F0mean and represents the logarithmic perception of voice pitch (i.e. controlling for the effect of magnitude on variability [34]). All F0 parameters were measured using Praat's autocorrelation algorithm with a search range of 150–400 Hz (age 7) or 75–300 Hz (ages 14–56) and a time step of 0.01 s. Spurious octave jumps were manually corrected [30]. Perceptually, voices with relatively lower F0mean, F0min and F0max will sound lower-pitched or ‘deeper’, whereas voices with relatively lower F0s.d. and F0CV will sound more monotone.
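As a rough illustration of how the five F0 parameters relate to one another, the sketch below summarises a list of voiced-frame F0 values in Python. It is not the Praat batch script used in the study: the function name `f0_parameters`, the example pitch track and the use of the sample (n − 1) standard deviation are all assumptions made for illustration.

```python
import statistics


def f0_parameters(f0_hz):
    """Summarise a sequence of voiced-frame F0 values (in Hz).

    Hypothetical helper: it mirrors the definitions given in the text,
    not the authors' actual Praat script.
    """
    if len(f0_hz) < 2:
        raise ValueError("need at least two voiced frames")
    mean = statistics.fmean(f0_hz)
    sd = statistics.stdev(f0_hz)  # sample s.d. (n - 1); an assumption
    return {
        "F0mean": mean,
        "F0min": min(f0_hz),
        "F0max": max(f0_hz),
        "F0sd": sd,
        "F0CV": sd / mean,  # coefficient of variation: s.d. scaled by the mean
    }


# Invented pitch track for demonstration only; not data from the study.
track = [110.0, 120.0, 130.0, 120.0, 120.0]
params = f0_parameters(track)
```

Because F0CV divides the standard deviation by the mean, two voices with the same variability in Hertz but different mean F0 will yield different F0CV values, which is the sense in which it controls for the effect of magnitude on variability.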

3. Results

Linear mixed models (LMMs) fit by maximum-likelihood estimation were used to examine the effect of age on each F0 parameter separately. Each model included individual identity as a random subject variable, allowing the intercept to vary between subjects, and age as a fixed factor. The results of our LMMs confirmed that all F0 parameters changed significantly with age (table 2).
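One way to write a random-intercept model of the kind described above is given below. The notation is ours and is offered only as a sketch: the text does not specify the residual covariance structure or the reference age level, so both are left generic here.

```latex
y_{ia} = \beta_0 + \beta_a + u_i + \varepsilon_{ia},
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2),
\qquad \varepsilon_{ia} \sim \mathcal{N}(0, \sigma^2)
```

Here $y_{ia}$ is a given F0 parameter for individual $i$ at age $a$, $\beta_0$ is the intercept, $\beta_a$ is the fixed effect of age level $a$ (zero for the reference level), $u_i$ is the random intercept for individual $i$, and $\varepsilon_{ia}$ is the residual.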

Table 2. Linear mixed models examining the effect of age on fundamental frequency parameters.

Planned pairwise comparisons with Bonferroni's correction confirmed that F0mean, F0min, F0max and F0s.d. values dropped significantly and dramatically from age 7 to 14, and again from age 14 to 21 (table 3). These F0 parameters then remained relatively stable following puberty, although we observed acute transient increases in F0 parameters at age 28. These age-related changes in F0 are further illustrated in figure 1.
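For readers unfamiliar with the correction, the sketch below shows how a Bonferroni-adjusted significance threshold is obtained by dividing the nominal alpha by the number of comparisons. The number of comparisons used by the authors is not stated in the text; counting all pairs among the eight recording ages is an assumption made here for illustration, so the resulting threshold need not match the one reported with table 3. The function names are hypothetical.

```python
from math import comb


def n_pairwise(k):
    """Number of distinct pairs among k groups."""
    return comb(k, 2)


def bonferroni_alpha(alpha, m):
    """Per-comparison significance threshold for m comparisons."""
    if m < 1:
        raise ValueError("m must be at least 1")
    return alpha / m


ages = [7, 14, 21, 28, 35, 42, 49, 56]
m = n_pairwise(len(ages))               # assumed: every pair of ages compared
threshold = bonferroni_alpha(0.05, m)   # nominal alpha divided by m
```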

Table 3. Pairwise comparisons of within-individual differences in fundamental frequency parameters at every given age. The cells above the grey median indicate the mean difference in the given F0 parameter (in Hertz) measured at two given ages (i.e. subtracting the value at the younger age from the value at the older age), based on estimated marginal means. The corresponding cells below the grey median indicate whether this difference was statistically significant, where *p < 0.05 (significant at the 0.05 level) and ***p < 0.0009 (significant following Bonferroni's correction for multiple comparisons).

Figure 1. Mean fundamental frequency (F0) parameters for males at each age. Error bars represent the standard error of the mean.

Critically, despite significant pubertal effects on F0, linear regression models revealed that men's F0 at age 7 strongly predicted their F0 at every subsequent post-pubertal age in adulthood (ages 21–56, rs = 0.61–0.8, p < 0.001). However, F0 at age 7 did not predict men's F0 at the peri-pubertal age of 14 (r = 0.02, p > 0.05; figure 2). Men's F0s at age 7 explained 37–64% of the variance in their F0s at ages 21–56 (figure 2).
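The link between the reported correlations and the variance explained can be checked directly: in a simple linear regression with a single predictor, the proportion of variance explained equals the square of Pearson's r. The sketch below is illustrative only; the function names are hypothetical and the paired values passed to `pearson_r` are invented, not data from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def variance_explained(r):
    """R squared for a one-predictor linear regression."""
    return r * r


# Reported range of correlations (ages 21-56) converted to variance explained.
low = round(variance_explained(0.61), 2)   # 0.37
high = round(variance_explained(0.8), 2)   # 0.64

# Invented, perfectly linear example data (not from the study).
r_demo = pearson_r([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
```

Squaring the two ends of the reported range, 0.61 and 0.8, recovers the 37–64% of variance quoted above.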

Figure 2. Within-individual relationships between men's F0 before puberty (age 7), around the time of puberty (age 14) and after puberty (ages 21–56). Each individual is represented by a unique symbol. Solid lines represent the slope of the linear regression and dashed lines represent the sample average F0 measured at each age. *p < 0.05, two-tailed.

4. Discussion

In this longitudinal study, we tracked men's fundamental frequency (F0) over a span of 50 years of their lives. Our results corroborate those of earlier cross-sectional and cohort studies, demonstrating sharp pubertal decreases in male F0 parameters between the ages of 7 and 21 [2,3]. Our study contributes two additional key findings. First, despite these profound pubertal drops in F0, we demonstrate that men's F0 stabilizes after puberty and remains remarkably stable throughout adulthood. Second, our results reveal that individual differences in men's F0 are already established by age 7, long before sexual maturation and pubertal influences on the vocal anatomy. These findings have broad implications for our understanding of the developmental mechanisms, adaptive functions and social perception of human voice pitch.

Voice pitch is arguably the most perceptually salient and evolutionarily relevant non-verbal vocal component in humans. Of particular relevance is its close relationship to male dominance and reproductive success, where, for example, men with lower-pitched voices are attributed with higher masculinity or maleness [13] and have greater access to high-quality mates ([1,7,8] for reviews). Our results show that individual differences in men's F0 are not temporary, but rather stable throughout adulthood, providing a reliably static signal of men's mate quality. In addition, these stable individual differences emerge in childhood: up to 64% of the variance in adult men's F0 could be explained by their F0 before the onset of puberty.

Although hormone levels were not accessible in this study, our findings, combined with those of previous studies demonstrating the effects of testosterone on laryngeal and vocal fold morphology and on subsequent voice pitch production, suggest that pre-pubertal and potentially even prenatal androgen levels may explain a significant portion of the variance in adult F0. Pubertal androgens may function to activate the expression of secondary sex traits, such as F0, whereas the nature or magnitude of this masculinization may be predetermined very early in ontogeny [35]. Indeed, researchers have found that the second-to-fourth digit ratio (an index of fetal testosterone exposure [36]) and prenatal serum testosterone levels predict the degree of facial masculinization in pre-pubertal boys [17] and adult men [18,19].

We observed an average drop in mean F0 of 105 Hz between ages 7 and 14, and an additional 63 Hz drop between ages 14 and 21, far exceeding the just-noticeable differences in voice pitch perception (approx. 5 Hz [37,38]). Minimum, maximum and the standard deviation of F0 showed similarly salient decreases at these ages (table 3). These drastic drops in voice pitch parameters are likely to reflect the effects of pubertal androgens on vocal fold lengthening and dynamics [3,5]. Puberty typically begins between age 9 and 14 in British boys and spans approximately 2 years [39], wherein ‘voice breaking’ (an acute drop in F0/voice pitch) occurs in the final stages of puberty [4]. While our findings indicate that pre-pubertal F0 predicts post-pubertal F0, there was no relationship between men's F0 before puberty (age 7) and around the time of puberty (age 14). This is probably because, as illustrated in the top-left panel of figure 2, only four individuals had completed pubertal F0 lowering by age 14 (those below the 180 Hz average, wherein individual differences in F0 ranged between 110 and 230 Hz), whereas by age 21, all individuals' F0s had fallen within the characteristic range for adult men (80–160 Hz).

Although our findings demonstrate longitudinal stability in men's average F0, this voice feature is known to vary at the intra-individual level as a function of mood and prosodic expression ([40] for review). Hence, all acoustic analyses were performed on speech from interviews that lacked strong emotional content, and F0 measures were averaged across multiple interviews for each individual at each age. This allowed us to compute representative, average F0 measures for each individual and to largely control for the influence of emotions, such as intense stress or excitement, on voice production.
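The averaging step described above can be sketched as a simple grouping operation. The code below assumes an unweighted mean of per-clip F0 values for each individual at each age; whether the original analysis weighted clips (for example by duration) is not stated, so this is an illustration rather than the authors' procedure. The function name and the example values are hypothetical.

```python
from collections import defaultdict
from statistics import fmean


def average_by_speaker_age(clips):
    """Average per-clip F0 values within each (speaker, age) cell.

    clips: iterable of (speaker, age, f0mean_hz) tuples.
    Returns a dict mapping (speaker, age) to the unweighted mean F0.
    """
    grouped = defaultdict(list)
    for speaker, age, f0 in clips:
        grouped[(speaker, age)].append(f0)
    return {key: fmean(values) for key, values in grouped.items()}


# Invented clips for demonstration only; not data from the study.
clips = [
    ("A", 7, 250.0),
    ("A", 7, 260.0),
    ("A", 21, 118.0),
    ("B", 7, 240.0),
]
averages = average_by_speaker_age(clips)
```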

Intriguingly, we observed a transient but systematic increase in men's F0 parameters at age 28. We speculate that this brief rise in F0 may be related to social life events, such as first marriage or childbirth, which often take place around this age among European men. Indeed, cues to mating and parenting effort may manifest in the voice, either through the behavioural mechanism of volitional voice modulation [41] or through physiological changes in the body (e.g. changes in relative hormone levels [7,13–15]) that could in turn have an indirect effect on vocal production. While there is some evidence of lower testosterone levels in married versus unmarried men and among those with children versus those without (see e.g. [42–44]), it remains unclear whether this reflects between-individual differences or within-individual fluctuation in androgen levels. The potential influence of social life events on vocal production warrants investigation using a larger longitudinal dataset.

The Up Series documentary included four women; however, this sample size was too small to be sufficiently representative. Replication studies should, therefore, include women to test whether individual differences in F0 also emerge before puberty among females and should include infants as well as post-menopausal/andropausal adults in order to determine whether individual differences in F0 are already present at birth and remain stable after the age of 60. While clearly challenging, future studies may also examine the degree to which sex hormone levels, which we were not able to access in this study, correlate with voice F0 across the lifespan. In particular, researchers may test whether higher prenatal testosterone levels in males predict lower F0 in adulthood.

Our findings have broad social implications. A large body of research demonstrates that voice pitch affects judgements of attractiveness, masculinity, dominance, competence, likeability and trustworthiness, and even impacts political and economic decision-making [45]. Moreover, listeners attribute such traits not only to adults [1] but also to adolescents [5] and even to babies with high- or low-pitched voices [30]. Given that a child's voice pitch hints at the voice pitch he will have as an adult, this attribution of voice-based social labels may begin very early in life, and remain consistent throughout a person's lifetime.

Ethics

The study was reviewed and approved by the Sciences and Technology Cross-Schools Research Ethics Committee (C-REC) of the University of Sussex (ER-REBY-2). The short audio samples used for acoustic analysis were extracted from commercially available DVDs.

Data accessibility

Authors' contributions

D.R. initiated the study. M.F., K.P. and D.R. contributed to the design of the investigation. M.F. collected the data and performed acoustic analyses. K.P. performed statistical analyses, wrote the manuscript and created the figures. The manuscript was reviewed, edited and approved by all authors, who agree to be accountable for the work.

Competing interests

The authors report no competing interests.

Funding

This work was supported by the European Commission through a Marie Sklodowska-Curie individual fellowship to K.P. (H2020-MSCA-IF-2014-655859).

Published by the Royal Society under the terms of the Creative Commons Attribution License http://creativecommons.org/licenses/by/4.0/, which permits unrestricted use, provided the original author and source are credited.