Vocal learning

Vocal learning is the ability to modify acoustic and syntactic sounds, acquire new sounds via imitation, and produce vocalizations. “Vocalizations” in this case refers only to sounds generated by the vocal organ (mammalian larynx or avian syrinx) as opposed to by the lips, teeth, and tongue, which require substantially less motor control.[1] A rare trait, vocal learning is a critical substrate for spoken language and has only been detected in eight animal groups despite the wide array of vocalizing species: humans, bats, cetaceans, pinnipeds (seals and sea lions), elephants, and three distantly related bird groups (songbirds, parrots, and hummingbirds). Vocal learning is distinct from auditory learning, the ability to form memories of sounds heard, which is a relatively common trait present in all vertebrates tested. For example, dogs can be trained to understand the word "sit" even though the human word is not in their innate auditory repertoire (auditory learning). However, a dog cannot imitate and produce the word "sit" itself, as vocal learners can.

Hypothetical distributions of two behavioral phenotypes: vocal learning and sensory (auditory) sequence learning. We hypothesize that the behavioral phenotypes of vocal learning and auditory learning are distributed along several categories. (A) Vocal learning complexity phenotype and (B) auditory sequence learning phenotype. The left axis (blue) illustrates the hypothetical distribution of species along the behavioral phenotype dimensions. The right axis (black step functions) illustrates different types of transitions along the hypothesized vocal-learning (A) or auditory-learning (B) complexity dimensions. Whether the actual distributions are continuous functions (blue curves) will need to be tested, in relation to the alternatives that there are several categories with gradual transitions or step functions (black curves). Although auditory learning is a prerequisite for vocal learning and there can be a correlation between the two phenotypes (A–B), the two need not be interdependent. A theoretical Turing machine (Turing, 1968) is illustrated [G∗], which can outperform humans on memory for digitized auditory input but is not a vocal learner. From Petkov, CI; Jarvis ED (2012). "Birds, primates, and spoken language origins: behavioral phenotypes and neurobiological substrates". Front. Evol. Neurosci. 4:12.

Historically, species have been classified into the binary categories of vocal learner or vocal non-learner based on their ability to produce novel vocalizations or imitate other species, with evidence from social isolation, deafening studies, and cross-fostering experiments.[1] However, vocal learners exhibit a great deal of plasticity or variation between species, resulting in a spectrum of ability. The vocalizations of songbirds and whales have a syntactic-like organization similar to that of humans but are limited to Finite-State Grammars (FSGs), where they can generate strings of sequences with limited structural complexity.[2] Humans, on the other hand, show deeper hierarchical relationships, such as the nesting of phrases within others, and demonstrate compositional syntax, where changes in syntactic organization generate new meanings, both of which are beyond the capabilities of other vocal learning groups.[3] Vocal learning phenotypes also differ within groups, and closely related species do not necessarily display the same abilities. Within avian vocal learners, for example, zebra finch songs only contain strictly linear transitions that go through different syllables in a motif from beginning to end, yet mockingbird and nightingale songs show element repetition within a range of legal repetitions, non-adjacent relationships between distant song elements, and forward and backward branching in song element transitions.[4] Parrots are even more complex, as they can imitate the speech of heterospecifics like humans and synchronize their movements to a rhythmic beat.[5]
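The contrast between a strictly linear motif and a branching song can be sketched as two small finite-state grammars. This is only an illustration: the syllable labels `a`, `b`, `c` and both transition tables are invented for the example and are not taken from real recordings. The sketch also uses the simplest possible state (the previous syllable), so it does not capture non-adjacent relationships or an upper bound on repetitions.

```python
from typing import Dict, Sequence, Set

# A grammar maps each state (the previous syllable) to its legal successors.
# All syllable labels and transitions below are hypothetical.
Grammar = Dict[str, Set[str]]

# Strictly linear motif: every syllable has exactly one legal successor.
LINEAR_MOTIF: Grammar = {
    "START": {"a"},
    "a": {"b"},
    "b": {"c"},
    "c": {"END"},
}

# Branching song: repetition (b -> b), forward branching (a -> b or c),
# and backward branching (c -> a).
BRANCHING_SONG: Grammar = {
    "START": {"a"},
    "a": {"b", "c"},
    "b": {"b", "c"},
    "c": {"a", "END"},
}


def accepts(grammar: Grammar, song: Sequence[str]) -> bool:
    """Return True if every transition in the song is legal in the grammar."""
    state = "START"
    for syllable in list(song) + ["END"]:
        if syllable not in grammar.get(state, set()):
            return False
        state = syllable
    return True
```

Under these assumptions, `accepts(LINEAR_MOTIF, ["a", "b", "c"])` holds, while a song with a repeated syllable such as `["a", "b", "b", "c"]` is rejected by the linear motif and accepted by the branching grammar.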

Further complicating the original binary classification, recent studies suggest that non-learners vary more in their ability to modify vocalizations based on experience than previously thought. Findings in suboscine passerine birds, non-human primates, mice, and goats have led to the proposal of the vocal learning continuum hypothesis by Erich Jarvis and Gustavo Arriaga. Based on the apparent variations seen in various studies, the continuum hypothesis reclassifies species into non-learner, limited vocal learner, moderate vocal learner, complex vocal learner, and high vocal learner categories, where higher tiers have fewer species. Under this system, previously identified non-human vocal learners like songbirds are considered complex learners while humans fall under the “high” category; non-human primates, mice, and goats, which are traditionally classified as non-learners, are considered limited vocal learners.[1]

The most extensively studied model organisms of vocal learning are found in birds, namely songbirds, parrots, and hummingbirds. The degree of vocal learning varies from species to species. While many parrots and certain songbirds like canaries can imitate and spontaneously combine learned sounds during all periods of their life, other songbirds and hummingbirds are limited to certain songs learned during their critical period.

The first evidence for audio-vocal learning in a non-human mammal was produced by Karl-Heinz Esser in 1994. Hand-reared infant lesser spear-nosed bats (Phyllostomus discolor) were able to adapt their isolation calls to an external reference signal. Isolation calls in a control group that had no reference signal did not show the same adaptation.[6]

Further evidence for vocal learning in bats appeared in 1998 when Janette Wenrick Boughman studied female greater spear-nosed bats (Phyllostomus hastatus). These bats live in unrelated groups and use group contact calls that differ among social groups. Each social group has a single call, which differs in frequency and temporal characteristics. When individual bats were introduced to a new social group, the group call began to morph, taking on new frequency and temporal characteristics, and over time, calls of transfer and resident bats in the same group more closely resembled their new modified call than their old calls.[7]

Male humpback whales (Megaptera novaeangliae) sing as a form of sexual display while migrating to and from their breeding grounds. All males in a population produce the same song which can change over time, indicating vocal learning and cultural transmission, a characteristic shared by some bird populations. Songs become increasingly dissimilar over distance and populations in different oceans have dissimilar songs.

Whale songs recorded along the east coast of Australia in 1996 showed the introduction of a novel song by two foreign whales that had migrated from the west Australian coast to the east Australian coast. In just two years, all members of the population had switched songs. The new song was nearly identical to those sung by migrating humpback whales on the west Australian coast, and the two new singers are hypothesized to have brought this "foreign" song to the population on the east Australian coast.[8]

Vocal learning has also been seen in killer whales (Orcinus orca). Two juvenile killer whales, separated from their natal pods, were seen mimicking the cries of California sea lions (Zalophus californianus) that were near the region they lived in. The composition of the calls of these two juveniles was also different from that of their natal groups, reflecting the sea lion calls more than those of the whales.[9]

Captive bottlenose dolphins (Tursiops truncatus) can be trained to emit sounds through their blowhole in open air. Through training, these vocal emissions can be altered from natural patterns to resemble sounds like the human voice, measurable through the number of bursts of sound emitted by the dolphin. In 92% of exchanges between humans and dolphins, the number of bursts was within ±1 of the number of syllables spoken by the human.[10] Another study used an underwater keyboard to demonstrate that dolphins are able to learn various whistles in order to perform an activity or obtain an object. Complete mimicry occurred within ten attempts for these trained dolphins.[11] Other studies of dolphins have given even more evidence of spontaneous mimicry of species-specific whistles and other biological and computer-generated signals.[12]
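The burst-count measure described above amounts to counting the exchanges in which the number of dolphin bursts falls within one of the number of human syllables. A minimal sketch follows, assuming each exchange is recorded as a `(syllables, bursts)` pair; the function name `match_rate` and the sample data are hypothetical and do not come from the cited study.

```python
from typing import Iterable, Tuple


def match_rate(exchanges: Iterable[Tuple[int, int]], tolerance: int = 1) -> float:
    """Fraction of exchanges whose burst count is within `tolerance` of the syllable count."""
    pairs = list(exchanges)
    if not pairs:
        raise ValueError("at least one exchange is required")
    hits = sum(
        1 for syllables, bursts in pairs if abs(syllables - bursts) <= tolerance
    )
    return hits / len(pairs)


# Invented example data: (human syllables, dolphin bursts) per exchange.
example_exchanges = [(3, 3), (2, 3), (5, 4), (4, 7)]
example_rate = match_rate(example_exchanges)  # 3 of the 4 pairs are within +/-1
```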

Such vocal learning has also been identified in wild bottlenose dolphins. Bottlenose dolphins develop a distinct signature whistle in the first few months of life, which they use to identify themselves and distinguish themselves from other individuals. This individual distinctiveness could have been a driving force for evolution by providing higher species fitness, since complex communication is largely correlated with increased intelligence. However, vocal identification is present in vocal non-learners as well. Therefore, it is unlikely that individual identification was a primary driving force for the evolution of vocal learning. Each signature whistle can be learned by other individuals for identification purposes and is used primarily when the dolphin in question is out of sight. Bottlenose dolphins use their learned whistles in matching interactions, which are likely to be used while addressing each other, signalling alliance membership to a third party, or preventing deception by an imitating dolphin.[13] Mate attraction and territory defense have also been seen as possible contributors to vocal learning evolution. Studies on this topic point out that while both vocal learners and non-learners use vocalizations to attract mates or defend territories, there is one key difference: variability. Vocal learners can produce a more varied arrangement of vocalizations and frequencies, which studies show may be preferred by females. For example, one study[14] observed that a male Atlantic bottlenose dolphin may initiate a challenge by facing another dolphin, opening its mouth, thereby exposing its teeth, or arching its back slightly and holding its head downward. This behavior is closer to visual communication, but it may or may not be accompanied by vocalizations such as burst-pulsed sounds. The burst-pulsed sounds, which are more complex and varied than the whistles, are often used to convey excitement, dominance, or aggression, such as when dolphins are competing for the same piece of food.[15] The dolphins also produce these forceful sounds when in the presence of other individuals moving towards the same prey. In a sexual context, another study[16] observed that a dolphin may solicit a sexual response from another by swimming in front of it, looking back, and rolling on its side to display the genital region. These observations provide yet another example of visual communication, where dolphins exhibit different postures and non-vocal behaviors that may or may not be accompanied by vocalizations. Sexual selection for greater variability, and thus in turn vocal learning, may then be a major driving force for the evolution of vocal learning.

Captive harbor seals (Phoca vitulina) were recorded mimicking human words such as "hello", "Hoover" (the seal's own name) and producing other speech-like sounds. Most of the vocalizations occurred during the reproductive season.[17]

More evidence of vocal learning in seals occurs in southern elephant seals (Mirounga leonina). Young males imitate the vocal cries of successful older males during their breeding season. Northern and southern elephant seals have a highly polygynous mating system with a vast disparity in mating success. In other words, a few males guard huge harems of females, eliciting intense male-male competition. Antagonistic vocal cries play an important role in inter-male competitions and are hypothesized to demonstrate the resource-holding potential of the emitter. In both species, antagonistic vocal cries vary geographically and are structurally complex and individually distinct. Males display unique calls, which can be identified by the specific arrangement of syllables and syllable parts.

Harem holders frequently vocalize to keep peripheral males away from females, and these vocalizations are the dominant component in a young juvenile's acoustic habitat. Successful vocalizations are heard by juveniles, who then imitate these calls as they get older in an attempt to obtain a harem for themselves. Novel vocal types expressed by dominant males spread quickly through populations of breeding elephant seals and are even imitated by juveniles in the same season.

Genetic analysis indicated that successful vocal patterns were not passed down hereditarily, indicating that this behavior is learned. Progeny of successful harem holders do not display their father's vocal calls and the call that makes one male successful often disappears entirely from the population.[18]

Mlaika, a ten-year-old adolescent female African elephant, has been recorded imitating truck sounds coming from the Nairobi-Mombasa highway three miles away. Analysis of Mlaika's truck-like calls shows that they are different from the normal calls of African elephants, and that her calls are a general model of truck sounds, not copies of the sounds of trucks recorded at the same time as the calls. In other words, Mlaika's truck calls are not imitations of the trucks that she hears, but rather a generalized model she developed over time.

Other evidence of vocal learning in elephants occurred in a cross-fostering situation with a captive African elephant. At the Basel Zoo in Switzerland, Calimero, a male African elephant, was kept with two female Asian elephants. Recordings of his cries show evidence of chirping noises, typically only produced by Asian elephants. The duration and frequency of these calls differ from recorded instances of chirping calls from other African elephants and more closely resemble the chirping calls of Asian elephants.[19]

The following species are not formally considered vocal learners, but some evidence has suggested they may have limited abilities to modify their vocalizations. Further research is needed in these species to fully understand their learning abilities.

Early research asserted that primate calls are fully formed at an early age in development, yet recently some studies have suggested these calls are modified later in life.[20] In 1989, Masataka and Fujita cross-fostered Japanese and rhesus monkeys in the same room and demonstrated that foraging calls were learned directly from their foster mothers, providing evidence of vocal learning.[21] However, when another independent group was unable to reproduce these results, Masataka and Fujita's findings were questioned.[20] Adding to the evidence against vocal learning in non-human primates is the suggestion that regional differences in calls may be attributed to genetic differences between populations and not vocal learning.[22]

Other studies argue that non-human primates do have some limited vocal learning ability, demonstrating that they can modify their vocalizations in a limited fashion through laryngeal control[22] and lip movements.[23][24] For example, chimpanzees both in captivity and in the wild have been recorded producing novel sounds to attract attention. By puckering their lips and making a vibrating sound, they can make a "raspberry" call, which has been imitated by both naïve captive and wild individuals.[23] There is also evidence of an orangutan learning to whistle by copying a human, an ability previously unseen in the species.[24] A cross-fostering experiment with marmosets and macaques showed convergence in pitch and other acoustic features in their supposedly innate calls,[22] demonstrating the ability, albeit limited, for vocal learning.

Mice produce long sequences of vocalizations or "songs" that are used both as isolation calls by pups when cold or removed from the nest and in courtship when males sense a female or detect pheromones in female urine. These ultrasonic vocalizations consist of discrete syllables and patterns, with species-specific differences. Males tend to use particular syllable types that can be used to differentiate individuals.[25]

There has been intense debate on whether these songs are innate or learned. In 2011, Kikusui et al. cross-fostered two strains of mice with distinct song phenotypes and discovered that strain-specific characteristics of each song persisted in the offspring, indicating that these vocalizations are innate.[26] However, a year later work by Arriaga et al. contradicted these results as their study found a motor cortex region active during singing, which projects directly to brainstem motor neurons and is also important for keeping songs stereotyped and on pitch. Vocal control by forebrain motor areas and direct cortical projections to vocal motor neurons are both features of vocal learning. Furthermore, male mice were shown to depend on auditory feedback to maintain some ultrasonic song features, and sub-strains with differences in their songs were able to match each other’s pitch when cross-housed under competitive social conditions.[27][28]

With this conflicting evidence, it remains unclear whether mice are vocal non-learners or limited vocal learners.

When goats are placed in different social groups, they modify their calls to show more similarity to that of the group, which provides evidence they may be limited vocal learners according to Erich Jarvis' continuum hypothesis.[29]

There are several proposed hypotheses that explain the selection for vocal learning based on environment and behavior. These include:[30]

Individual Identification: In most vocal-learning species, individuals have their own songs which serve as a unique signature to differentiate themselves from others in the population, which some suggest has driven selection of vocal learning. However, identification by voice, rather than by song or name, is present in vocal non-learners as well. Among vocal learners, only humans and maybe bottlenose dolphins actually use unique names. Therefore, it is unlikely that individual identification was a primary driving force for the evolution of vocal learning.

Semantic Communication: Semantic vocal communication associates specific vocalizations with animate or inanimate objects to convey a factual message. This hypothesis asserts that vocal learning evolved to facilitate enhanced communication of these specific messages as opposed to affective communication, which conveys emotional content. For example, humans are able to shout "watch out for that car!" when another is in danger while crossing the street instead of just making a noise to indicate urgency, which is less effective at conveying the exact danger at hand. However, many vocal non-learners, including chickens and vervet monkeys, have been shown to use their innate calls to communicate semantic information such as "a food source" or "predator". Further discrediting this hypothesis is the fact that vocal learning birds also use innate calls for this purpose and only rarely use their learned vocalizations for semantic communication (for example, the African grey parrot can mimic human speech and the black-capped chickadee uses calls to indicate predator size). As learned vocalizations rarely convey semantic information, this hypothesis also does not fully explain the evolution of vocal learning.

Mate Attraction and Territory Defense: While both vocal learners and non-learners use vocalizations to attract mates or defend territories, there is one key difference: variability. Vocal learners can produce more varied syntax and frequency modulation, which have been shown to be preferred by females in songbirds. For example, canaries use two voices to produce large frequency modulation variations called "sexy syllables" or "sexy songs", which are thought to stimulate estrogen production in females. When vocal non-learner females were presented with artificially increased frequency modulations in their innate vocalizations, more mating was stimulated. Sexual selection for greater variability, and thus in turn vocal learning, may then be a major driving force for the evolution of vocal learning.

Rapid Adaptation to Sound Propagation in Different Environments: Vocal non-learners produce their sounds best in specific habitats, making them more susceptible to changes in the environment. For example, pigeons' low-frequency calls travel best near the ground, and so communication higher in the air is much less effective. In contrast, vocal learners can change voice characteristics to suit their current environment, which presumably allows for better group communication.

With the many possible advantages outlined above, it still remains unclear as to why vocal learning is so rare. One proposed explanation is that predatory pressure applies a strong selective force against vocal learning.[30] If mates prefer more variable vocalizations, predators may also be more strongly attracted to more variable vocalizations. As innate calls are typically constant, predators quickly habituate to these vocalizations and ignore them as background noise. In contrast, the variable vocalizations of vocal learners are less likely to be ignored, possibly increasing the predation rate among vocal learners. In this case, relaxed predation pressure or some mechanism to overcome increased predation must first develop to facilitate the evolution of vocal learning. Supporting this hypothesis is the fact that many mammalian vocal learners including humans, whales, and elephants have very few major predators. Similarly, several avian vocal learners have behaviors that are effective in avoiding predators, from the rapid flight and escape behavior of hummingbirds to predator mobbing in parrots and songbirds.

While little research has been done in this area, some studies have supported the predation hypothesis. One study showed that Bengalese finches bred in captivity for 250 years without predation or human selection for singing behavior show greater variability in syntax than their conspecifics in the wild. A similar experiment with captive zebra finches demonstrated the same result: captive birds had increased song variability, which was then preferred by females.[31] Although these studies are promising, more research is needed in this area to compare predation rates across vocal learners and non-learners.

Modern birds supposedly evolved from a common ancestor around the Cretaceous-Paleogene boundary at the time of the extinction of dinosaurs, about 66 million years ago. Out of the thirty avian orders, only three evolved vocal learning and all have incredibly similar forebrain structures despite the fact that they are distantly related (for example, parrots and songbirds are as distantly related as humans and dolphins). Phylogenetic comparisons have suggested that vocal learning evolved among birds at least two or three independent times, in songbirds, parrots, and hummingbirds. Depending on the interpretation of the trees, there were either three gains in all three lineages or two gains, in hummingbirds and the common ancestor of parrots and songbirds, with a loss in the suboscine songbirds. There are several hypotheses to explain this phenomenon:[1]

Independent Convergent Evolution: All three avian groups evolved vocal learning and similar neural pathways independently (not through a common ancestor). This suggests that there are strong epigenetic constraints imposed by the environment or morphological needs, and so this hypothesis predicts that groups that newly evolve vocal learning will also develop similar neural circuits.

Common Ancestor: This alternative hypothesis suggests that vocal learning birds evolved the trait from a distant common ancestor, which was then lost four independent times in interrelated vocal non-learners. Possible causes include high survival costs of vocal learning (predation) or weak adaptive benefits that did not induce strong selection for the trait for organisms in other environments.

Rudimentary Structures in Non-Learners: This alternative hypothesis states that avian non-learners actually do possess rudimentary or undeveloped brain structures necessary for song learning, which were enlarged in vocal learning species. Significantly, this concept challenges the current assumption that vocal nuclei are unique to vocal learners, suggesting that these structures are universal even in other groups such as mammals.

Motor Theory: This hypothesis suggests that cerebral systems that control vocal learning in distantly related animals evolved as specializations of a pre-existing motor system inherited from a common ancestor. Thus in avian vocal learners, each of the three groups of vocal learning birds evolved cerebral vocal systems independently, but the systems were constrained by a previous genetically determined motor system inherited from the common ancestor that controls learned movement sequencing. Evidence for this hypothesis was provided by Feenders and colleagues in 2008, who found that EGR1, an immediate early gene associated with increases in neuronal activity, was expressed in forebrain regions surrounding or directly adjacent to song nuclei when vocal learning birds performed non-vocal movement behaviors such as hopping and flying. In non-learners, comparable areas were activated, but without the adjacent presence of song nuclei.[32] EGR1 expression patterns were correlated with the amount of movement, just as its expression typically correlates with the amount of singing performed in vocal birds. These findings suggest that vocal learning brain regions developed from the same cell lineages that gave rise to the motor pathway, which then formed a direct projection onto the brainstem vocal motor neurons to provide greater control.[1]

Currently, it remains unclear as to which of these hypotheses is the most accurate.

Primate phylogenetic tree and complex-vocal learning vs. auditory sequence learning. Shown is a primate phylogenetic tree based on a combination of DNA sequence and fossil age data (Goodman et al., 1998; Page et al., 1999). Humans (Homo) are the only primates classified as “vocal learners.” However, non-human primates might be better at auditory sequence learning than their limited vocal-production learning capabilities would suggest. In blue text and (#) we highlight species for which there is some evidence of Artificial Grammar Learning capabilities for at least adjacent relationships between the elements in a sequence (tamarins: Fitch and Hauser, 2004), (macaques: Wilson et al., 2011). Presuming that the auditory capabilities of guenons and gibbons (or the symbolic learning of signs by apes) would mean that these animals are able to learn at least adjacent relationships in Artificial Grammars we can tentatively mark these species also in blue #. Note however, that for the species labeled in black text, future studies might show them to be capable of some limited-vocal learning or various levels of complexity in learning the structure of auditory sequences. Three not mutually exclusive hypotheses are illustrated for both complex-vocal learning and auditory sequence learning. From Petkov, CI; Jarvis ED (2012). "Birds, primates, and spoken language origins: behavioral phenotypes and neurobiological substrates". Front. Evol. Neurosci. 4:12.

In primates, only humans are known to be capable of complex vocal learning. Similar to the first hypothesis relating to birds, one explanation is that vocal learning evolved independently in humans. An alternative hypothesis suggests evolution from a primate common ancestor capable of vocal learning, with the trait subsequently being lost at least eight other times. Considering the most parsimonious analysis, it seems unlikely that the number of independent gains (one in humans) would be exceeded so greatly by the number of independent losses (at least eight), which supports the independent evolution hypothesis.[1]
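The parsimony argument can be made concrete by counting the evolutionary events each scenario requires. The sketch below is a simplified illustration, not a phylogenetic analysis: it assumes gains and losses are weighted equally, and it assumes the common-ancestor scenario needs one gain in the ancestor in addition to the eight losses mentioned above. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Scenario:
    name: str
    gains: int   # independent origins of the trait
    losses: int  # independent losses of the trait

    @property
    def events(self) -> int:
        # Assumption: a gain and a loss each cost one event.
        return self.gains + self.losses


def most_parsimonious(scenarios: Iterable[Scenario]) -> Scenario:
    """Return the scenario that needs the fewest evolutionary events."""
    return min(scenarios, key=lambda s: s.events)


independent = Scenario("independent gain in humans", gains=1, losses=0)
# Assumption: one ancestral gain, then at least eight losses (the lower bound is used).
ancestral = Scenario("gain in common ancestor, later losses", gains=1, losses=8)

best = most_parsimonious([independent, ancestral])
```

With these counts the independent-gain scenario needs one event and the common-ancestor scenario needs nine, which is why the former is favored under equal weighting.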

As avian vocal learners are the most amenable to experimental manipulations, the vast majority of work to elucidate the neurobiological mechanisms of vocal learning has been conducted with zebra finches, with a few studies focusing on budgerigars and other species. Despite variation in vocal learning phenotype, the neural circuitry necessary for producing learned song is conserved in songbirds, parrots, and hummingbirds. As opposed to their non-learner avian counterparts such as quail, doves, and pigeons, these avian vocal learners contain seven distinct cerebral song nuclei, or distinct brain areas associated with auditory learning and song production defined by their gene expression patterns. As current evidence suggests independent evolution of these structures, the names of each equivalent vocal nucleus are different per bird group, as shown in the table below.

Parallel Song Nuclei in Avian Vocal Learners

| Songbirds | Parrots | Hummingbirds |
|---|---|---|
| HVC: a letter-based name | NLC: central nucleus of the lateral nidopallium | VLN: vocal nucleus of the lateral nidopallium |
| RA: robust nucleus of the arcopallium | AAC: central nucleus of the anterior arcopallium | VA: vocal nucleus of the arcopallium |
| MAN: magnocellular nucleus of anterior nidopallium | NAOc: oval nucleus of the anterior nidopallium complex | |
| Area X: area X of the striatum | MMSt: magnocellular nucleus of the anterior striatum | |
| DLM: medial nucleus of dorsolateral thalamus | DMM: magnocellular nucleus of the dorsomedial thalamus | |
| MO: oval nucleus of the mesopallium | MOc: oval nucleus of the mesopallium complex | |

Vocal nuclei are found in two separate brain pathways, which will be described in songbirds as most research has been conducted in this group, yet it should be noted that connections are similar in parrots[33] and hummingbirds.[34] Projections of the anterior vocal pathway in the hummingbird remain unclear and so are not listed in the table above.

The posterior vocal pathway (also known as the vocal motor pathway), involved in the production of learned vocalizations, begins with projections from a nidopallial nucleus, the HVC in songbirds. The HVC then projects to the robust nucleus of the arcopallium (RA). The RA connects to the midbrain vocal center DM (dorsal medial nucleus of the midbrain) and the brainstem (nXIIts) vocal motor neurons that control the muscles of the syrinx, a direct projection similar to the projection from LMC to the nucleus ambiguus in humans.[1][35] The HVC is considered the syntax generator while the RA modulates the acoustic structure of syllables. Vocal non-learners do possess the DM and twelfth motor neurons (nXIIts), but lack the connections to the arcopallium. As a result, they can produce vocalizations, but not learned vocalizations.

The anterior vocal pathway (also known as the vocal learning pathway) is associated with learning, syntax, and social contexts, starting with projections from the magnocellular nucleus of the anterior nidopallium (MAN) to the striatal nucleus Area X. Area X then projects to the medial nucleus of dorsolateral thalamus (DLM), which ultimately projects back to MAN in a loop.[36] The lateral part of MAN (LMAN) generates variability in song, while Area X is responsible for stereotypy, or the generation of low variability in syllable production and order after song crystallization.[30]

Despite the similarities in vocal learning neural circuits, there are some major connectivity differences between the posterior and anterior pathways among avian vocal learners. In songbirds, the posterior pathway communicates with the anterior pathway via projections from the HVC to Area X; the anterior pathway sends output to the posterior pathway via connections from LMAN to RA and medial MAN (MMAN) to HVC. Parrots, on the other hand, have projections from the ventral part of the AAC (AACv), the parallel of the songbird RA, to the NAOc, parallel of the songbird MAN, and the oval nucleus of the mesopallium (MO). The anterior pathway in parrots connects to the posterior pathway via NAOc projections to the NLC, parallel of the songbird HVC, and AAC. Thus, parrots do not send projections to the striatal nucleus of the anterior pathway from their posterior pathway as do songbirds. Another crucial difference is the location of the posterior vocal nuclei among species. Posterior nuclei are located in auditory regions for songbirds, laterally adjacent to auditory regions in hummingbirds, and are physically separate from auditory regions in parrots. Axons must therefore take different routes to connect nuclei in different vocal learning species. Exactly how these connectivity differences affect song production and/or vocal learning ability remains unclear.[30][37]
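The songbird connections described in this section can be written down as a small directed graph, which makes it easy to trace how a signal could travel between nuclei. The sketch below is a simplification under stated assumptions: it includes only the projections named in the text, it merges LMAN and MMAN into a single `MAN` node, and the helper `shortest_path` is a generic breadth-first search written for this illustration.

```python
from collections import deque
from typing import Dict, List, Optional

# Songbird projections as described in the text (LMAN and MMAN merged into MAN).
SONGBIRD_PROJECTIONS: Dict[str, List[str]] = {
    "HVC": ["RA", "Area X"],         # posterior pathway, plus link to the anterior pathway
    "RA": ["DM", "nXIIts"],          # midbrain vocal center and brainstem motor neurons
    "MAN": ["Area X", "RA", "HVC"],  # anterior loop, plus LMAN -> RA and MMAN -> HVC
    "Area X": ["DLM"],
    "DLM": ["MAN"],
    "DM": [],
    "nXIIts": [],
}


def shortest_path(
    graph: Dict[str, List[str]], start: str, goal: str
) -> Optional[List[str]]:
    """Breadth-first search for the shortest chain of projections from start to goal."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for target in graph.get(node, []):
            if target not in seen:
                seen.add(target)
                queue.append(path + [target])
    return None
```

In this simplified graph, `shortest_path(SONGBIRD_PROJECTIONS, "HVC", "nXIIts")` follows the posterior pathway directly, while a path starting at `"Area X"` must first complete the anterior loop through DLM and MAN before reaching RA.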

An auditory pathway that is used for auditory learning brings auditory information into the vocal pathway, but the auditory pathway is not unique to vocal learners. Ear hair cells project to cochlear ganglia neurons, which project in turn to auditory pontine nuclei, to midbrain and thalamic nuclei, and then to primary and secondary pallial areas. A descending auditory feedback pathway projects from the dorsal nidopallium to the intermediate arcopallium and on to shell regions around the thalamic and midbrain auditory nuclei. The source of auditory input into the vocal pathways described above remains unclear. It is hypothesized that songs are processed in these areas in a hierarchical manner, with the primary pallial area (field L2) responsible for acoustic features, the secondary pallial area (fields L1 and L3 as well as the caudal medial nidopallium, or NCM) determining sequencing and discrimination, and the highest station, the caudal mesopallium (CM), modulating fine discrimination of sounds. Secondary pallial areas including the NCM and CM are also thought to be involved in auditory memory formation of songs used for vocal learning, but more evidence is needed to substantiate this hypothesis.[30]

The development of the sensory modalities necessary for song learning occurs within a “critical period” of development that varies among avian vocal learners. Closed-ended learners such as the zebra finch and the Aphantochroa hummingbird can learn only during a limited time period and subsequently produce highly stereotyped or non-variable vocalizations consisting of a single, fixed song which they repeat their entire lives. In contrast, open-ended learners, including canaries and various parrot species, display significant plasticity and continue to learn new songs throughout the course of their lives.[38]

In the male zebra finch, vocal learning begins with a period of sensory acquisition or auditory learning, in which juveniles are exposed to the song of an adult male “tutor” from about posthatch day 30 to 60.[39] During this stage, juveniles listen to and memorize the song pattern of their tutor and produce subsong, characterized by the production of highly variable syllables and syllable sequences. Subsong is thought to be analogous to babbling in human infants. Subsequently, during the sensorimotor learning phase at posthatch day 35 to 90, juveniles practice the motor commands required for song production and use auditory feedback to alter vocalizations to match the song template. Songs during this period are plastic: specific syllables begin to emerge but are frequently in the wrong sequence, errors that are similar to the phonological mistakes made by young children when learning a language. As the bird ages, its song becomes more stereotyped until, at posthatch day 120, the song syllables and sequence are crystallized or fixed. At this point, the zebra finch can no longer learn new songs and thus sings this single song for the duration of its life.[40]
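The overlapping phases described above can be summarized as a simple lookup by posthatch day. This is a minimal sketch, assuming the approximate day ranges quoted in the text are treated as hard boundaries; the names PHASES and phases_at are hypothetical, and the text assigns no named phase to days 91 to 119, so the lookup returns an empty list there.

```python
# Approximate posthatch-day windows taken from the text; an end of None
# means the phase is open-ended. Treating them as sharp cut-offs is an
# assumption made for illustration.
PHASES = [
    ("sensory acquisition", 30, 60),
    ("sensorimotor learning", 35, 90),
    ("crystallized song", 120, None),
]


def phases_at(day):
    """Return the names of all phases whose window contains the given day."""
    return [
        name
        for name, start, end in PHASES
        if day >= start and (end is None or day <= end)
    ]


for day in (40, 75, 100, 150):
    print(day, phases_at(day))
```

Day 40 falls in both the sensory and sensorimotor windows, which reflects the overlap between memorizing the tutor song and practicing it.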

Previous research has suggested that the length of the critical period may be linked to differential gene expression within song nuclei, thought to be caused by neurotransmitter binding to receptors during neural activation.[45] One key area is the LMAN song nucleus, part of the specialized cortical-basal-ganglia-thalamo-cortical loop in the anterior forebrain pathway, which is essential for vocal plasticity.[36] While inducing deafness in songbirds usually disrupts the sensory phase of learning and leads to production of highly abnormal song structures, lesioning of LMAN in zebra finches prevents this song deterioration,[46] leading to the earlier development of stable song. One of the neurotransmitter receptors shown to affect LMAN is the N-methyl-D-aspartate glutamate receptor (NMDAR), which is required for learning and activity-dependent gene regulation in the post-synaptic neuron. Infusions of the NMDAR antagonist APV (R-2-amino-5-phosphonopentanoate) into the LMAN song nucleus disrupt the critical period in the zebra finch.[47] NMDAR density and mRNA levels of the NR1 subunit also decrease in LMAN during early song development.[48] When the song becomes crystallized, expression of the NR2B subunit decreases in LMAN and NMDAR-mediated synaptic currents shorten.[49] It has been hypothesized that LMAN actively maintains RA microcircuitry in a state permissive for song plasticity and that, in the course of normal development, it regulates HVC-RA synapses.

Vocalization subsystems in complex-vocal learners and in limited-vocal learners or vocal non-learners: Direct and indirect pathways. The different subsystems for vocalization and their interconnectivity are illustrated using different colors. (A) Schematic of a songbird brain showing some connectivity of the four major song nuclei (HVC, RA, Area X, and LMAN). (B) Human brain schematic showing the different proposed vocal subsystems. The learned vocalization subsystem consists of a primary motor cortex pathway (blue arrow) and a cortico-striatal-thalamic loop for learning vocalizations (white). Also shown is the limbic vocal subsystem that is broadly conserved in primates for producing innate vocalizations (black), and the motoneurons that control laryngeal muscles (red). (C) Known connectivity of a brainstem vocal system (not all connections shown) showing absence of forebrain song nuclei in vocal non-learning birds. (D) Known connectivity of limited-vocal learning monkeys (based on data in squirrel monkeys and macaques) showing presence of forebrain regions for innate vocalization (ACC, OFC, and amygdala) and also of a ventral premotor area (Area 6vr) of currently poorly understood function that is indirectly connected to the nucleus ambiguus. The LMC in humans is directly connected with motoneurons in the nucleus ambiguus, which orchestrate the production of learned vocalizations. Only the direct pathway through the mammalian basal ganglia (ASt, anterior striatum; GPi, globus pallidus, internal) is shown, as this is the one most similar to Area X connectivity in songbirds. Modified figure based on (Jarvis, 2004; Jarvis et al., 2005).
Abbreviations: ACC, anterior cingulate cortex; Am, nucleus ambiguus; Amyg, amygdala; AT, anterior thalamus; Av, nucleus avalanche; DLM, dorsolateral nucleus of the medial thalamus; DM, dorsal medial nucleus of the midbrain; HVC, high vocal center; LMAN, lateral magnocellular nucleus of the anterior nidopallium; LMC, laryngeal motor cortex; OFC, orbito-frontal cortex; PAG, periaqueductal gray; RA, robust nucleus of the arcopallium; RF, reticular formation; vPFC, ventral prefrontal cortex; VLT, ventro-lateral division of thalamus; XIIts, bird twelfth nerve nucleus. From Petkov, CI; Jarvis ED (2012). "Birds, primates, and spoken language origins: behavioral phenotypes and neurobiological substrates". Front. Evol. Neurosci. 4:12.

Humans seem to have analogous anterior and posterior vocal pathways which are implicated in speech production and learning. Parallel to the avian posterior vocal pathway mentioned above is the motor cortico-brainstem pathway. Within this pathway, the face motor cortex projects to the nucleus ambiguus of the medulla, which then projects to the muscles of the larynx. Humans also have a vocal pathway that is analogous to the avian anterior pathway. This pathway is a cortico-basal ganglia-thalamo-cortical loop which begins at a strip of the premotor cortex, called the cortical strip, which is responsible for speech learning and syntax production. The cortical strip spans five brain regions: the anterior insula, Broca’s area, the anterior dorsal lateral prefrontal cortex, the anterior pre-supplementary motor area, and the anterior cingulate cortex. This cortical strip projects to the anterior striatum, which projects to the globus pallidus, which projects to the anterior dorsal thalamus, which in turn projects back to the cortical strip. All of these regions are also involved in syntax and speech learning.[50]

In addition to the similarities in the neurobiological circuits necessary for vocalizations between animal vocal learners and humans, there are also a few genetic similarities. The most prominent of these genetic links are the FOXP1 and FOXP2 genes, which code for forkhead box (FOX) proteins P1 and P2, respectively. FOXP1 and FOXP2 are transcription factors which play a role in the development and maturation of the lungs, heart, and brain,[51][52] and are also highly expressed in brain regions of the vocal learning pathway, including the basal ganglia and the frontal cortex. In these regions, FOXP1 and FOXP2 are thought to be essential for brain maturation and the development of speech and language.[52]

These similarities are especially interesting in the context of the aforementioned avian song circuit. FOXP2 is expressed in the avian Area X, and is especially highly expressed in the striatum during the critical period of song plasticity in songbirds. In humans, FOXP2 is highly expressed in the basal ganglia, frontal cortex, and insular cortex, all thought to be important nodes in the human vocal pathway. Thus, mutations in the FOXP2 gene are proposed to have detrimental effects on human speech and language, such as impaired grammar and language processing and impaired movement of the mouth, lips, and tongue,[53] as well as potential detrimental effects on song learning in songbirds. Indeed, FOXP2 was the first gene to be implicated in speech and language cognition, on the basis of a family of individuals with a severe speech and language disorder.

Additionally, it has been suggested that, because FOXP1 and FOXP2 expression overlaps in songbirds and humans, mutations in FOXP1 may also result in speech and language abnormalities like those seen in individuals with mutations in FOXP2.[51]

These genetic links have important implications for studying the origin of language, because FOXP2 is so similar between humans and other vocal learners, and for understanding the etiology of certain speech and language disorders in humans.

Currently, no other genes have been linked as compellingly to vocal learning in animals or humans.

Heinrich, JE; Nordeen KW; Nordeen EJ (2005). "Dissociation between extension of the sensitive period for avian vocal learning and dendritic spine loss in the song nucleus lMAN". Neurobiology of Learning and Memory 83: 143–150. doi:10.1016/j.nlm.2004.11.002.