Abstract

miR-9 is an evolutionarily conserved miRNA that is abundantly expressed in Area X, a basal ganglia nucleus required for vocal learning in songbirds. Here, we report that overexpression of miR-9 in Area X of juvenile zebra finches impairs developmental vocal learning, resulting in a song with syllable omission, reduced similarity to the tutor song, and altered acoustic features. miR-9 overexpression in juveniles also leads to more variable song performance in adulthood, and abolishes social context-dependent modulation of song variability. We further show that these behavioral deficits are accompanied by downregulation of FoxP1 and FoxP2, genes that are known to be associated with language impairments, as well as by disruption of dopamine signaling and widespread changes in the expression of genes that are important in circuit development and functions. These findings demonstrate a vital role for miR-9 in basal ganglia function and vocal communication, suggesting that dysregulation of miR-9 in humans may contribute to language impairments and related neurodevelopmental disorders.

eLife digest

When a cell needs to make a protein, it makes a temporary copy of the corresponding gene so that the genetic code can be carried to its protein-making machinery. When the temporary copy of the code is no longer needed, the cell destroys it. This system is fine-tuned by other small stretches of genetic code called microRNAs, which speed up the destruction and so help to switch genes off faster.

Two genes called FOXP1 and FOXP2 are known to have roles in speech and language development in humans. When these genes do not work properly, people have severe difficulties when speaking and understanding speech. But scientists know little about how the brain controls them. The brains of animals with backbones – like birds and mammals – make a microRNA called miR-9. Scientists thought miR-9 may control how active the FOXP1 and FOXP2 genes are in the brain.

Like humans, zebra finches communicate vocally. Young male birds learn to sing by imitating the song of an adult tutor, usually their father. The process is controlled by a brain region called “Area X”. Now, Shi et al. report on the role of miR-9 in vocal learning and singing in zebra finches.

First, the gene for miR-9 was inserted into a virus-based genetic tool. Shi et al. then injected this virus into Area X of juvenile zebra finches, which delivered the gene to the brain cells and forced them to make excess miR-9. A control group received empty virus with no miR-9 gene for comparison. The juvenile finches then grew up with an adult bird that taught them to sing.

Shi et al. found that the birds that overproduced miR-9 did not learn as well as their normal counterparts. Their songs were shorter, they stuttered, and they missed out syllables, which meant that they simply sounded different to their tutors. These young birds also failed to change their tune in different situations, for example, when they met a female zebra finch. Examination of the birds’ brains four weeks after the viral injection showed that the bird versions of the FOXP1 and FOXP2 genes were less active. There were also changes in other genes involved in brain circuit development.

Humans have a brain area like Area X, called the basal ganglia. The link between miR-9 and vocal learning provides a starting point to understand more about language in general. This could lead to improved understanding of conditions like stuttering, Tourette’s syndrome, dyslexia and autism spectrum disorders.

Figure 1. A lentiviral approach to manipulate miR-9 expression in Area X of the zebra finch brain.

(A) Schematic drawing of the song control circuit in the zebra finch brain. The motor pathway (green), which connects HVC (used as a proper name) to RA (robust nucleus of the arcopallium) and eventually the vocal organ, controls song production. The anterior forebrain pathway (blue), which connects HVC to the basal ganglia nucleus Area X, DLM (medial nucleus of the dorsolateral thalamus), LMAN (lateral magnocellular nucleus), and then back to RA, is required for song learning. Area X also receives dopaminergic inputs from the VTA (ventral tegmental area). (B) The lentiviral vector used in this study expresses an mCherry fluorescent marker and miR-9 driven by the human ubiquitin promoter. (C) (Left) A diagram showing viral injection into Area X. (Middle and right) Sagittal sections of the zebra finch brain showing mCherry fluorescent signal in juvenile Area X four weeks after lentivirus injection. (D) The expression of miR-9 and miR-124 in Area X 4 weeks after injection with the lenti-miR-9 virus. p < 0.0001, t(12) = 11.21 for miR-9; p = 0.2879, t(12) = 1.112 for miR-124, unpaired t-test. n = 7 for Area X; n = 4 for adjacent tissue. Data are presented as mean ± SEM.

Introduction

MicroRNAs (miRNAs) are small, non-protein-coding RNA molecules that regulate gene expression post-transcriptionally. miR-9 is an evolutionarily conserved miRNA that is highly expressed in vertebrate brains (Landgraf et al., 2007; Luo et al., 2012). In the zebra finch, miR-9 is expressed in Area X. Its expression is regulated during developmental vocal learning and in adult males singing undirected songs (Shi et al., 2013), suggesting that miR-9 plays an active role in these processes. miRNAs regulate gene expression by targeting the 3’ untranslated regions (3’UTRs) of mRNAs, leading to mRNA degradation or suppression of protein synthesis. Many of the genes expressed in the nervous system have long 3’UTRs (Mayr, 2016), highlighting the importance of miRNAs in fine-tuning gene expression in nervous system development and function. FOXP1 and FOXP2 are a pair of paralogous transcription factors that have important roles in nervous system development. Deletions, mutations, and copy number variations of the FOXP1 gene have been implicated in a range of neural developmental disorders, including language delay, intellectual disability, and autism spectrum disorders (ASDs) (Hamdan et al., 2010; Horn et al., 2010; O'Roak et al., 2011). Heterozygous mutations in the FOXP2 gene cause severe speech and language impairments (Lai et al., 2001; Vargha-Khadem et al., 1995), accompanied by structural and functional abnormalities in multiple brain regions. These regions include the basal ganglia (Watkins et al., 2002), which is thought to be a component of the distributed neural circuitry that underlies speech and language (Graham and Fisher, 2013). FOXP1 and FOXP2 regulate the transcription of a large number of downstream genes, many of which are critically involved in neural differentiation, neurite outgrowth, synapse formation, and synaptic transmission (Konopka et al., 2009; Spiteri et al., 2007; Tang et al., 2012; Vernes et al., 2007, 2011). Thus, the functional dosage of FOXP1 and FOXP2 needs to be tightly regulated.

Recent in vitro studies have shown that miR-9 regulates the expression of FOXP2 by targeting specific sequences in its 3’UTR (Fu et al., 2014; Shi et al., 2013). miR-9 also regulates avian FoxP1 in embryonic chicken spinal cord (Otaegi et al., 2011). These findings raise the possibility that miR-9 has a role in language development through regulating FOXP1 and/or FOXP2. Taking advantage of the unique vocal behavior and the underlying neural circuitry of songbirds, we sought to assess the consequences of miR-9 overexpression in Area X of juvenile zebra finches on vocal learning and performance. For these studies, we overexpressed miR-9 using a lentiviral approach. We report here that overexpression of miR-9 in juvenile Area X profoundly impairs basal ganglia-dependent developmental vocal learning in juveniles and impairs song performance in adulthood. We further show that these behavioral deficits are accompanied by downregulation of FoxP1 and FoxP2 expression, disruption of dopamine signaling, and widespread changes in the expression of numerous genes that are important for neural circuit development and function.

Results

A lentiviral approach to manipulate miR-9 expression in Area X

The lentiviral vector that we used carried the mCherry fluorescent protein marker driven by the human ubiquitin promoter (hUBC) (Edbauer et al., 2010). We made a miR-9-expressing virus (lenti-miR-9) by inserting a miR-9 precursor sequence downstream of mCherry (Figure 1B). The control virus (lenti-control) carried mCherry alone. When tested in vitro, these lentiviruses effectively infected 293T cells, and overexpression of miR-9 downregulated the FOXP1 and FOXP2 proteins (Figure 1—figure supplement 1A and B). To test these viruses in vivo, we injected lenti-miR-9 or lenti-control virus into Area X of 25-day-old (25 d) male juvenile zebra finches, and examined miR-9 expression levels four weeks later using quantitative real-time PCR (qRT-PCR). In Area X injected with lenti-miR-9, miR-9 expression was increased more than three-fold compared to expression in Area X injected with lenti-control; the expression of an unrelated miRNA, miR-124, did not change (Figure 1C and D, p < 0.0001 for miR-9 and p = 0.288 for miR-124, n = 7). These results indicate that our lentiviral approach allowed effective overexpression of miR-9 in Area X in an miRNA-specific manner.
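As an illustration of how a fold change such as the one reported above can be derived from qRT-PCR data, the sketch below uses the widely used 2^(-ΔΔCt) calculation. The text does not state which quantification method or reference gene was used, so the choice of method, the function name `fold_change`, and all Ct values are assumptions made for illustration only.

```python
def fold_change(ct_target_treated, ct_ref_treated,
                ct_target_control, ct_ref_control):
    """Relative expression by the 2^(-ddCt) method.

    Assumes roughly 100% amplification efficiency for both the target
    and the reference transcript (an assumption, not stated in the text).
    """
    # Normalize the target Ct to the reference Ct within each condition.
    d_ct_treated = ct_target_treated - ct_ref_treated
    d_ct_control = ct_target_control - ct_ref_control
    # Difference between conditions; a negative value means higher expression.
    dd_ct = d_ct_treated - d_ct_control
    return 2.0 ** (-dd_ct)


# Hypothetical Ct values: the target crosses threshold two cycles earlier
# in the treated sample, with an unchanged reference.
example = fold_change(22.0, 18.0, 24.0, 18.0)
print(example)  # 4.0
```

A difference of two cycles corresponds to a four-fold change, so a "more than three-fold" increase would correspond to a ΔΔCt beyond roughly -1.6 under these assumptions.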

miR-9 overexpression in juvenile Area X impairs vocal learning

We examined whether and how overexpression of miR-9 in juvenile Area X impairs vocal learning. In these experiments, 23–25 d male juvenile zebra finches (whose vocal learning was about to begin) were injected bilaterally into Area X with either the control virus (control pupils) or the lenti-miR-9 virus (miR-9 pupils). Each pupil was raised individually with an adult tutor from 30 d to 70 d. Pupils’ songs were recorded at 65 d, 80 d, and 100 d (Figure 2A). A zebra finch song is made up of multiple renditions of a motif. A motif typically consists of 5–7 syllables rendered in a fixed sequence, with each syllable bearing distinct acoustic features (Figure 2B). We first analyzed pupils’ songs recorded at 100 d (when they became adults). On the global motif structure level, control pupils imitated their tutors’ song motif without syllable omission. By contrast, miR-9 pupils omitted some of their tutor's syllables and their song motifs were shorter than those of control pupils (Figure 2B). We quantified this phenomenon by manually counting the number of omitted syllables and the number of syllables per motif. We found that 5 of 6 miR-9 pupils omitted tutor syllables (Figure 2C, p = 0.0075; n = 6), and that the average number of syllables per motif was reduced by 24% in miR-9 pupils compared to control pupils (Figure 2D, p = 0.0325; n = 6).

We next examined how well the miR-9 pupils imitated the spectral structure and acoustic features of their tutors’ songs using the Sound Analysis Pro software (SAP; Tchernichovski et al., 2000). SAP computes a similarity score, which reflects how similar two song motifs are, thus indicating how well a pupil learns its tutor’s song. In quantifying motif similarity, we compared 20 pupil motif renditions to 10 tutor motif renditions, and averaged the 200 pairwise measurements for each animal. We found that miR-9 pupils exhibited a lower motif similarity score than control pupils, whereas the control pupils’ motif similarity score was comparable to that of untreated pupils, indicating that virus injection alone did not affect song learning (Figure 3A, p = 0.002, two-tailed Mann-Whitney U Test; n = 6 for control and miR-9 pupils; n = 4 for untreated control pupils). We next ranked the 200 pairwise measurements, and averaged the 10 highest values to obtain a maximum similarity score for each pupil. The maximum similarity score of miR-9 pupils was significantly lower than that of control pupils (Figure 3—figure supplement 1, p < 0.001; n = 6), suggesting that even at their best performance, miR-9 pupils were not able to produce a good copy of their tutors’ song.
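The aggregation of pairwise scores described above can be sketched as follows. The score matrix is a synthetic placeholder rather than SAP output, and the function name `summarize_similarity` is hypothetical. Only the 20 × 10 layout, the average over all 200 pairs, and the average of the 10 highest values are taken from the text.

```python
from statistics import mean


def summarize_similarity(pairwise_scores, top_k=10):
    """Return (mean of all scores, mean of the top_k highest scores)."""
    # Flatten the pupil-by-tutor matrix into a single list of scores.
    flat = [score for row in pairwise_scores for score in row]
    # Rank scores from highest to lowest and keep the best top_k.
    best = sorted(flat, reverse=True)[:top_k]
    return mean(flat), mean(best)


# Synthetic 20 x 10 matrix (20 pupil renditions x 10 tutor renditions)
# holding the values 0.0, 0.5, ..., 99.5; purely illustrative.
scores = [[(i * 10 + j) / 2 for j in range(10)] for i in range(20)]
motif_similarity, max_similarity = summarize_similarity(scores)
print(motif_similarity, max_similarity)  # 49.75 97.25
```

Because the maximum score uses only the best-matching pairs, it can never fall below the overall mean, which is why it serves as an estimate of a pupil's best performance.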

To examine how well miR-9 pupils were able to imitate their tutors at the level of individual syllables, we quantified the syllable accuracy scores of control and miR-9 pupils. We found that miR-9 pupils imitated tutors’ syllables less accurately than did control pupils (Figure 3B and C, p = 0.002, two-tailed Mann-Whitney U Test; n = 6). We also examined how individual acoustic features, including duration, mean frequency, goodness of pitch, frequency modulation, and Wiener entropy, differed between pupils and tutors. We found that the mean frequency and Wiener entropy differed between miR-9 pupils and tutors significantly more than between control pupils and tutors (Figure 3D, p = 0.01 for mean frequency; p = 0.006 for Wiener entropy, two-tailed Mann-Whitney U Test; n = 6). In addition, Wiener entropy differed significantly between miR-9 pupils and their tutors, but did not differ significantly between control pupils and their tutors (p < 0.001, t(24) = 8.245 for miR-9 pupils; p = 0.432, t(41) = 0.794 for control pupils, paired t-test; control pupils: n = 42 syllables; miR-9 pupils: n = 25 syllables; n = 6 animals per group) (Figure 3E).
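For readers unfamiliar with Wiener entropy, it is a measure of spectral flatness: the logarithm of the ratio of the geometric mean to the arithmetic mean of a sound's power spectrum. The sketch below shows the definition only. The windowing, frequency range, and scaling applied by SAP are not described in the text, and the two spectra are made-up examples.

```python
import math


def wiener_entropy(power_spectrum):
    """Log ratio of geometric to arithmetic mean of a power spectrum.

    All spectrum values must be strictly positive. The result is 0 for a
    perfectly flat (noise-like) spectrum and increasingly negative as
    energy concentrates in a few frequency bins (tone-like).
    """
    n = len(power_spectrum)
    # Log of the geometric mean equals the mean of the logs.
    log_geometric_mean = sum(math.log(p) for p in power_spectrum) / n
    arithmetic_mean = sum(power_spectrum) / n
    return log_geometric_mean - math.log(arithmetic_mean)


flat_spectrum = [1.0] * 8              # illustrative noise-like spectrum
peaked_spectrum = [1e-6] * 7 + [1.0]   # illustrative tone-like spectrum

print(wiener_entropy(flat_spectrum))
print(wiener_entropy(peaked_spectrum))
```

On this scale, a higher (less negative) Wiener entropy indicates a noisier, less tonal syllable.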

miR-9 overexpression in juvenile Area X impairs song performance and abolishes social-context-dependent modulation of song variability in adulthood

To assess the effect of miR-9 overexpression in juvenile Area X on song performance in adulthood, we examined syllable sequence order in 100 d pupils’ songs. A careful examination of sonograms showed that the songs of miR-9 pupils often exhibited switching of syllable order, truncation of motifs, and/or syllable stuttering (Figure 4A and B). We calculated syllable transition entropy to score these phenomena, where a higher transition entropy score reflects lower stereotypy of syllable sequences. We found that syllable transition entropy of miR-9 pupils was significantly higher than that of control pupils (Figure 4C, p = 0.002, two-tailed Mann-Whitney U Test; n = 6). We also measured trial-by-trial performance variation in syllable acoustic features across multiple renditions of songs of adult (100 d) miR-9 and control pupils. Among the acoustic features analyzed, duration, goodness of pitch, and Wiener entropy were significantly more variable in adult miR-9 pupils than in adult control pupils (Figure 4D, p = 0.009 for duration; p = 0.015 for goodness of pitch; and p = 0.009 for Wiener entropy, two-tailed Mann-Whitney U test; n = 6). These results indicate that overexpression of miR-9 in juvenile Area X leads to more variable song performance in adulthood.
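The text does not give the formula used for syllable transition entropy. One common formulation, sketched below as an assumption, is the Shannon entropy of the next-syllable distribution for each syllable, weighted by how often that syllable occurs. The function name and the syllable sequences are hypothetical.

```python
import math
from collections import Counter, defaultdict


def transition_entropy(sequence):
    """Weighted mean entropy (bits) of next-syllable distributions."""
    # Count how often each syllable is followed by each other syllable.
    transitions = defaultdict(Counter)
    for current, following in zip(sequence, sequence[1:]):
        transitions[current][following] += 1

    total = len(sequence) - 1  # number of observed transitions
    entropy = 0.0
    for counts in transitions.values():
        n = sum(counts.values())
        # Shannon entropy of the distribution of following syllables.
        h = -sum((c / n) * math.log2(c / n) for c in counts.values())
        # Weight by the share of transitions that start at this syllable.
        entropy += (n / total) * h
    return entropy


stereotyped_song = list("abcdabcdabcd")  # fixed syllable order
variable_song = list("abcdabdcacbd")     # switched syllable order

print(transition_entropy(stereotyped_song))
print(transition_entropy(variable_song))
```

A perfectly stereotyped sequence scores 0 because each syllable has only one possible successor. Order switching and stuttering raise the score.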

It is known that the expression of miR-9 in Area X is upregulated by singing an undirected song (UDS) but not by singing a female-directed song (DS); furthermore, the acoustic features of UDS are more variable than those of DS (Jarvis et al., 1998; Kao and Brainard, 2006; Leblois et al., 2010; Murugan et al., 2013; Shi et al., 2013; Teramitsu and White, 2006). These findings prompted us to examine the possibility that miR-9 plays a role in modulating song variability according to social context. We recorded both UDS and DS of the same adult pupils (100 d), and examined the trial-by-trial variability in the constant fundamental frequency (cFF) of the same set of syllables in the two song types using a previously established method (Kao and Brainard, 2006; Leblois et al., 2010; Murugan et al., 2013). In control birds, the variability in syllable cFF was greater in UDS than in DS. In miR-9 pupils, however, the syllable cFF remained variable in DS, abolishing the social context-dependent modulation of syllable variability (Figure 5A and B, p = 0.006 for control pupils; p = 0.510 for miR-9 pupils, paired t-test; control pupils: n = 21 syllables, 6 animals; miR-9 pupils: n = 11 syllables, 6 animals). As juveniles are capable of producing an adult-like DS (Kojima and Doupe, 2011), we extended this analysis to 65 d and 80 d juveniles. Similar to the adults, the 65 d and 80 d control pupils produced a more stereotyped DS with reduced variability in cFF, whereas both 65 d and 80 d miR-9 pupils retained variability in DS (Figure 5C, p < 0.05 for 65 d and p < 0.01 for 80 d and 100 d groups, respectively; two-tailed Mann-Whitney U test; at 65 d, control pupils: n = 8 syllables, 3 animals; miR-9 pupils: n = 6 syllables, 4 animals; at 80 d, control pupils: n = 11 syllables, 5 animals; miR-9 pupils: n = 9 syllables, 5 animals; at 100 d, control pupils: n = 14 syllables, 6 animals; miR-9 pupils: n = 10 syllables, 6 animals). Together, these results suggest that miR-9 plays a role in modulating social-context-dependent song variability.
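The text refers to a previously established method for quantifying cFF variability without describing it. The sketch below assumes that variability is expressed as the coefficient of variation (standard deviation divided by the mean) across renditions of one syllable. The cFF values are invented for illustration.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (unitless)."""
    return stdev(values) / mean(values)


# Hypothetical cFF measurements (Hz) for one syllable in each context.
undirected_cff = [598.0, 612.0, 587.0, 621.0, 603.0, 594.0]
directed_cff = [601.0, 604.0, 599.0, 603.0, 600.0, 602.0]

cv_undirected = coefficient_of_variation(undirected_cff)
cv_directed = coefficient_of_variation(directed_cff)

# In this made-up example the directed renditions are more stereotyped,
# the pattern described for control birds.
print(cv_undirected > cv_directed)  # True
```

Under this measure, social-context-dependent modulation appears as a lower value for DS than for UDS on the same syllable. Its absence would appear as similar values in both contexts.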

The developmental process of vocal learning and performance

Vocal learning is a developmental process during which a highly variable juvenile song gradually matures into a stereotyped adult song that resembles the tutor’s song. To gain insight into the role that miR-9 may play in this process, we tracked the developmental trajectory of song learning by analyzing songs of pupils produced at 65 d and 80 d. At both 65 d and 80 d, miR-9 pupils imitated poorly, and their songs were less similar to the tutors’ song than those of control pupils (Figure 6A, p = 0.0022 for 65 d, 80 d and 100 d songs, two-tailed Mann-Whitney U test; n = 6 per group). We wondered whether miR-9 overexpression caused a developmental delay, causing miR-9 pupils to require a longer time to learn their song. To assess this, we extended motif similarity analysis to songs of 150 d pupils. We found that at 150 d, the similarity score of miR-9 pupils was significantly lower than that of control pupils (Figure 6A, p = 0.008, two-tailed Mann-Whitney U Test; n = 6). While the control pupils improved the similarity score of their song as they matured, miR-9 pupils did not improve their score from 65 d to 150 d (Figure 6A, p = 0 for control pupils and p = 0.137 for miR-9 pupils, one-way ANOVA; n = 6). We also measured syllable Wiener entropy changes during development. At 65 d, Wiener entropy of the control pupils and the miR-9 pupils was similar, but at both 80 d and 100 d, the Wiener entropy scores of miR-9 pupils were higher than those of the control pupils (Figure 6B, p = 0.482 for 65 d, p = 0.028 for 80 d, and p = 0.020 for 100 d, Mann-Whitney U test). The Wiener entropy of control pupils’ syllables was reduced as they matured, whereas the Wiener entropy of miR-9 pupils’ syllables was not (Figure 6B, p = 0.002 for control pupils and p = 0.709 for miR-9 pupils, one-way ANOVA; n = 6 per group).
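Several comparisons with n = 6 per group report p = 0.002 or p = 0.0022. Assuming an exact two-tailed Mann-Whitney U test without ties (the text does not say whether an exact test or a normal approximation was used), this is the smallest p-value attainable with these group sizes. It is reached only when every value in one group exceeds every value in the other. The short calculation below illustrates this; the function name is hypothetical.

```python
from math import comb


def min_two_sided_p(n1, n2):
    """Smallest exact two-sided Mann-Whitney p-value for group sizes n1, n2.

    Under the null hypothesis every assignment of ranks to groups is
    equally likely, so complete separation in one direction has
    probability 1 / C(n1 + n2, n1); doubling it gives the two-sided value.
    """
    return 2 / comb(n1 + n2, n1)


print(comb(12, 6))                      # 924
print(round(min_two_sided_p(6, 6), 4))  # 0.0022
```

Read this way, a reported p = 0.0022 with six animals per group is consistent with no overlap between the similarity scores of the two groups, although the raw scores would be needed to confirm that.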

We also examined the trial-by-trial variability of syllable acoustic features including the duration, goodness of pitch, and Wiener entropy of songs produced during development. Syllable acoustic variability was reduced in songs of both control pupils and miR-9 pupils from 65 d to 100 d (Figure 6C–E, p < 0.001 for duration, pitch, and Wiener entropy for both control and miR-9 pupils [except p < 0.01 for duration for miR-9 pupils]; one-way ANOVA; control pupils: n = 42 syllables, 6 animals; miR-9 pupils: n = 25 syllables, 6 animals). However, variability in each of these acoustic features was significantly higher in miR-9 pupils than in control pupils at all ages (Figure 6C–E, p < 0.0001 for duration, pitch, and Wiener entropy for each age, two-tailed Mann-Whitney U test; control pupils: n = 42 syllables, 6 animals; miR-9 pupils: n = 25 syllables, 6 animals). These data indicate that miR-9 overexpression in juvenile Area X leads to higher variability in songs throughout maturation. This high level of trial-by-trial variation in acoustic features of miR-9 pupils may contribute to their inability to imitate the tutor’s song accurately.

We wondered whether miR-9 overexpression affected the amount of song production, which subsequently contributed to impaired song learning. To assess this possibility, we examined the amount of singing by pupils at 65 d and 100 d. We found that at both 65 d and 100 d, miR-9 pupils sang slightly more syllables than control pupils, but the differences were not significant (Figure 6—figure supplement 1, p = 0.419 for 65 d; p = 0.109 for 100 d, two-tailed Mann-Whitney U Test; n = 6). Thus, it is unlikely that the amount of singing contributed to the effect of miR-9 on song learning.

Impairments in vocal learning and performance are accompanied by FoxP1 and FoxP2 downregulation, interrupted dopamine signaling, and widespread changes in the expression of genes important for neural circuit development and function

To understand the molecular substrates underlying the impairments in vocal learning and performance described above, we examined changes in gene expression in Area X upon miR-9 overexpression. We first examined FoxP1 and FoxP2 mRNA and protein expression in Area X of juveniles four weeks after viral injection. We found that the expression levels of both FoxP1 and FoxP2 mRNAs were significantly downregulated in Area X injected with lenti-miR-9 compared to Area X injected with lenti-control (Figure 7A, p = 0.006 for FoxP1 and p < 0.0001 for FoxP2, n = 7), whereas no change in expression of either FoxP1 or FoxP2 mRNA was found in tissue adjacent to Area X. We found that both FoxP1 and FoxP2 protein levels were also downregulated in Area X injected with the lenti-miR-9 virus compared to controls (Figure 7B, p = 0.0007 for FoxP1 and p = 0.0037 for FoxP2, n = 4).

The neurotransmitter dopamine plays an important role in modulating basal ganglia circuit plasticity and song stereotypy (Ding and Perkel, 2002; Leblois et al., 2010; Murugan et al., 2013; Sasaki et al., 2006). Both the dopamine D1 (D1R) and D2 (D2R) receptors are expressed in Area X (Kubikova et al., 2010), and D1R is regulated by FoxP2 (Murugan et al., 2013). Therefore, we examined the expression levels of D1R and D2R in Area X in which miR-9 was overexpressed. We found that four weeks after viral injection, D1R was downregulated in Area X injected with the lenti-miR-9 virus compared to Area X injected with the lenti-control virus, whereas the expression of D2R was unchanged (Figure 7C, p < 0.0001 for D1R and p = 0.3384 for D2R, unpaired t-test; n = 7). DARPP-32, a 32 kDa dopamine- and cAMP-regulated phosphoprotein, is a major signal transduction component acting downstream of dopamine receptors, and is highly expressed in striatal medium spiny neurons (Greengard et al., 1999; Murugan et al., 2013). We found that DARPP-32 protein level was significantly downregulated in Area X injected with the lenti-miR-9 virus compared to Area X injected with the lenti-control virus (Figure 7D, p = 0.0022, unpaired t-test; n = 4).

FOXP1 and FOXP2 regulate a large number of downstream transcriptional target genes that have important roles in neural circuit development and function (Konopka et al., 2009; Spiteri et al., 2007; Vernes et al., 2011). We curated zebra finch homologs of 37 FOXP1 and/or FOXP2 downstream genes based on published literature (Bowers and Konopka, 2012; Konopka et al., 2009; Spiteri et al., 2007; Tang et al., 2012), and examined whether their expression levels changed upon miR-9 overexpression. We found widespread changes in gene expression: 26 of 37 tested genes changed their expression in juvenile Area X 4 weeks after miR-9 viral injection. Consistent with the fact that FOXP1 and FOXP2 are capable of bi-directional regulation of gene expression, either enhancing or repressing gene expression (Li et al., 2004), we observed bi-directional changes in the expression of these downstream genes: 15 upregulated and 11 downregulated. The magnitudes of the changes, however, were moderate; a majority of these genes increased or decreased their expression by less than 50%. We grouped these 26 genes into functional modules: components of synaptic transmission (ABAT, CADPS2, GRIN2B, GRIK2, GRM8, KCNA4), cell adhesion molecules important for dendrite growth and synapse formation (CNTNAP2, CTNNA3, DISC1, FRMPD4, NRXN3, SRPX2, STX1A), transcriptional regulators (ATRX, MEF2C, MTF1, NeuroD6, SOX5), signaling molecules (BDNF, IGF2, NTRK3, PTPRD, VLDLR), and peptidase and components of the ubiquitin protein degradation pathway (MMP2, UBE2L3, UBE3A) (Figure 8A–E, p < 0.05 for all genes, t-test; n = 7 for all genes except for BDNF, CTNNA3, NTRK3, and STX1A, n = 4). The symbols, names, functions, and possible associations with various neural developmental disorders of these genes are summarized in Supplementary file 1. A fraction (11 of 37) of the FOXP1 and FOXP2 downstream genes that we tested did not, however, change in expression following miR-9 overexpression (Figure 8—figure supplement 1). Among the possible explanations are that the regulatory relationships between FOXP1 or FOXP2 and these genes may be species-specific, or that additional cellular regulatory mechanisms may have contributed to the gene expression levels that we observed.

Discussion

We show here that overexpression of miR-9 in the basal ganglia nucleus Area X of juvenile zebra finches impaired developmental vocal learning and adult vocal performance. On the global motif structural level, the most pronounced impairment was that birds with miR-9 overexpression sang shorter song motifs, omitting some of the tutor syllables. This phenomenon in miR-9 pupils (in which FoxP1 and FoxP2 were downregulated) may mirror the limited vocabulary observed in human individuals carrying deletions in the FOXP1 gene, who typically exhibit a working vocabulary of fewer than 100 words at the age of seven (Horn et al., 2010). Songs of birds with miR-9 overexpression in Area X exhibited higher trial-by-trial variability, which was reflected in a more variable sequence of syllable order, truncated motifs, and syllable stuttering. Syntax change has not been reported in prior studies in which FoxP2 was either knocked down or overexpressed in Area X (Haesler et al., 2007; Heston and White, 2015; Murugan et al., 2013). That miR-9 regulates the expression of both FoxP1 and FoxP2 and potentially of other genes in Area X may explain the robust deficits in vocal learning we observed here. Effects of miR-9 on song performance also occurred in a social context-dependent manner. Similar to birds with reduced FoxP2 expression in adult Area X (Murugan et al., 2013), birds with miR-9 overexpression failed to modulate song variability when singing a directed song. Perhaps not coincidentally, miR-9 expression is higher in Area X when adult males naturally sing an undirected song (Shi et al., 2013). It appears that, whether naturally or artificially induced, higher miR-9 levels in Area X render the circuit permissive to a more variable song or interfere with the production of a more stereotyped directed song.

Vocal learning is a developmental process during which a less-structured and highly variable juvenile song gradually transitions to a stereotyped adult song that matches the tutor’s song (Immelmann and Hinde, 1969; Tchernichovski et al., 2001). The effects of miR-9 overexpression on song imitation were apparent at 65 days after hatching, and these birds failed to improve their imitation thereafter (Figure 6A). Acoustic features of syllables of miR-9 pupils also exhibited higher trial-by-trial variability throughout development (Figure 6C–E). It is not clear whether or how higher song variability affects a pupil’s ability to match its song to a tutor’s song. In normal birds, the developmental vocal learning process is accompanied by a gradual increase in the miR-9 level in Area X, reaching its high point as juveniles become adults, and songs become stabilized; meanwhile, throughout the process, the FoxP2 level gradually decreases (Haesler et al., 2004; Shi et al., 2013; Teramitsu et al., 2004). Our current data indicate that an artificially increased miR-9 level (and reduced FoxP2 level) does not result in premature stabilization of a juvenile’s song nor in an increase in similarity to the tutor’s song. Rather, miR-9 overexpression appears to lock Area X in a plasticity-dominant state, preventing further progression toward song stereotypy. Previous lesion studies have shown that lesions in Area X and lMAN, two nuclei in the anterior forebrain pathway, have different consequences on song development. While lesions in lMAN lead to a prematurely stabilized song, birds with Area X lesions never achieve song stability as control birds do (Scharff and Nottebohm, 1991). Although the experimental approaches used in these studies are unrelated (molecular manipulation vs. electrolytic lesion), our observations are more aligned with the effects of lesions in Area X, suggesting that the effect of miR-9 overexpression (gene manipulation) might be restrained by circuit functions, in this case functions of Area X.

According to the reinforcement learning theory, Area X and the AFP guide song motor learning through processes that evaluate the motor output patterns according to auditory feedback, and reinforce favorable motor actions. These processes involve dopamine signaling via D1R and D2R receptors expressed in Area X (Kubikova et al., 2010) and dopaminergic projections from the VTA to Area X (Ding and Perkel, 2002; Doupe et al., 2005; Doya and Tesauro, 1995; Gadagkar et al., 2016; Lewis et al., 1981). Dopamine levels in Area X are higher when birds sing DS (Sasaki et al., 2006). Blocking D1R pharmacologically or reducing D1R expression by knocking down FoxP2 in Area X results in a more variable DS (Leblois et al., 2010; Murugan et al., 2013). Our findings that miR-9 overexpression in Area X selectively downregulated D1R but not D2R provide further evidence of the importance of dopamine signaling in vocal learning and performance. Studies in mammals suggest that the D1R-expressing direct pathway and the D2R-expressing indirect pathway act in an opposing but coordinated manner in the basal ganglia to finely control the timing and synchronization of motor actions; imbalances between the two pathways may lead to movement and cognitive disorders (Cazorla et al., 2015; Gerfen and Surmeier, 2011). The downregulation of D1R but not D2R in Area X, as a consequence of miR-9 overexpression, may have produced a functional imbalance between the two signaling pathways, which may have contributed to impaired song learning and performance. In normal birds, miR-9 expression in Area X is regulated during vocal development, and is upregulated when adults sing UDS (Shi et al., 2013). Thus, the miR-9–FoxP2–dopamine signaling network provides a mechanism that dynamically adjusts the functional balance between the D1R and D2R signaling pathways, allowing birds to adapt to the changing maturation state of song in juveniles and to modulate song stereotypy according to the social context in which a song is produced.

Many of the FOXP1 and FOXP2 downstream transcriptional target genes were identified by genome-wide chromatin immunoprecipitation assays, which depend on the binding of transcription factors FOXP1 or FOXP2 to promoter sequences (Konopka et al., 2009; Spiteri et al., 2007; Vernes et al., 2011). The changes in expression of these downstream genes in Area X when FoxP1 and FoxP2 were downregulated by miR-9 provide further evidence for a regulatory relationship between these genes and FOXP1 or FOXP2 in the basal ganglia circuit critical for vocal communication. Among the genes we tested, many have direct roles in neural circuit development and function, and their dysregulation has been implicated in various neural developmental disorders. For example, GRIK2, GRIN2B, and GRM8 encode subunits of glutamate receptors. Their altered expression can affect synaptic transmission of Area X neurons. KCNA4 encodes a voltage-gated potassium channel (Ovsepian et al., 2016), and mutations in KCNA4 have been identified in patients exhibiting linguistic disabilities, attention deficit hyperactivity disorder (ADHD), and cognitive impairments (Kaya et al., 2016). MMP2, a member of the matrix metalloproteinase family, plays critical roles in synaptogenesis, dendrite remodeling, and neurogenesis (Fujioka et al., 2012). In songbird brain, MMP2 has been implicated in angiogenesis and neurogenesis in HVC, another song nucleus essential for song learning and production (Kim et al., 2008), suggesting that MMP2 may have a role in Area X circuit development as well. CNTNAP2 encodes a cell surface protein that belongs to the neurexin family, and is important for synaptic formation and clustering of K+ channels at synaptic terminals (Poliak et al., 1999). FOXP2 is known to regulate CNTNAP2 expression by binding to a regulatory sequence in its first intron, and CNTNAP2 dysfunctions have been implicated in specific language impairments, intellectual disabilities, and ASDs (Rodenas-Cuadrado et al., 2014; Vernes et al., 2008). DISC1 (Disrupted in schizophrenia 1) plays important roles in cell migration and dendrite development, and it has been linked to schizophrenia, bipolar disorder, depression, and ASDs (Millar et al., 2000; Thomson et al., 2013). The widespread changes in gene expression caused by miR-9 overexpression touch upon an array of cellular functions, including synaptic transmission, dendrite growth, synapse formation, gene regulation, neurotrophin (BDNF) signaling, and protein degradation. We only tested expression changes in a small fraction of FOXP1 and FOXP2 downstream genes; however, given the fact that two-thirds (26 of 37) of these genes changed expression, it is likely that many of the genes we did not test also changed their expression. Thus, a large number of these affected genes may have collectively contributed to the deficits in vocal behavior.

We noted that the magnitudes of changes in gene expression were moderate. Intriguingly, a recent study of a large cohort of schizophrenia patients found moderate changes (ranging from 10% to 40%) in the expression of several hundred genes in the prefrontal cortex (Fromer et al., 2016). Our observations support the emerging view that subtle but broad changes in gene expression might be a molecular signature underlying complex neural developmental and/or neural psychiatric disorders (Fromer et al., 2016; Purcell et al., 2014). Evidence implicating particular genes in language impairments and autism is often established through genome-wide analysis such as screening for mutations and/or copy number variations in human subjects (O'Roak et al., 2012; Sanders et al., 2012; Yuen et al., 2015). Our findings showing that these genes are expressed in the basal ganglia, and that alterations in their expression are accompanied by impairments in vocal communication, provide additional evidence that these genes function in language processes and development. Among the genes we tested, CNTNAP2 and DISC1 have each been implicated in multiple disorders, including language impairments, ASDs, ADHD, intellectual disabilities, and schizophrenia (Fromer et al., 2016; Purcell et al., 2014), suggesting that these neural developmental disorders, cognitive impairments, and psychiatric disorders share common molecular substrates. It is possible that distinct but overlapping phenotypes are manifested depending on where and when these genes are expressed and how they are regulated, emphasizing the need to study these genes in the context of specific functional neural circuits.

miR-9 is expressed in the embryonic human brain and has been shown to regulate human FOXP2 gene expression (Fu et al., 2014). It is likely that miR-9 plays a role in human language development by fine-tuning FOXP1 and FOXP2 expression, thereby coordinating the expression of a large number of genes that are active in neural development and function. Both FOXP1 and FOXP2 mRNAs have long 3’UTRs containing numerous miRNA binding sites, suggesting that these genes can be regulated by many miRNAs in addition to miR-9. Dysregulation of these miRNAs or of miRNA–FOXP1/FOXP2 interactions by genetic, environmental, or physiological factors may thus contribute to language impairments and related neurodevelopmental disorders.

Materials and methods

Animals

Animal usage was approved by the Louisiana State University Health Sciences Center (LSUHSC) Institutional Animal Care and Use Committee. All experiments were conducted in male zebra finches (Taeniopygia guttata). Animals were housed in a 7 a.m. – 7 p.m. light-dark cycle. Juveniles at specific ages were obtained from our breeding colony at LSU School of Medicine, with each bird given an ID at hatching.

Lentivirus production

The lentiviral vector used (a gift from Dr. M. Sheng) carries an mCherry fluorescent marker driven by the human ubiquitin C promoter (hUBC) (Edbauer et al., 2010). The zebra finch genome contains three genes encoding miR-9: miR-9–1, miR-9–2, and miR-9–3 (Luo et al., 2012). We amplified a 300-nt genomic DNA fragment containing the miR-9–3 precursor sequence from the zebra finch genome and inserted it downstream of mCherry in the lentiviral vector to generate the miR-9-expressing virus. The lenti-control virus is an empty vector. For lentivirus packaging and production, viral vectors and packaging plasmids were transfected into 293LTV cells (Cat. No. LTV-100, Cell Biolabs) using the calcium-phosphate method following the manufacturer’s instructions (Clontech). Cell identity and the absence of mycoplasma contamination were confirmed by the vendor. Viral particles were harvested 48 and 72 hr after transfection. The crude supernatant was filtered through a 0.45 µm filter, spun at 2000 rpm, and collected and spun again by ultracentrifugation (25,000 rpm for 2 hr). The precipitated viral particles were resuspended in 50 µl PBS. Typically, we obtained virus suspensions with titers in the range of 1–5 × 10^9/ml.

Stereotaxic injection

Juvenile birds were separated from their fathers at day 10 (10 d), and raised by their mother in a sound-proof chamber until 30 d. PCR was performed to determine the sex of juveniles (see Supplementary Table 2 for primer sequences). In assigning animals for viral injection, each animal had an equal probability of being injected with the control or miR-9 virus. Viral injection was performed on males at about 25 d. Stereotaxic injection was performed using a stereotaxic device including a head holder (Myneurolab) and a hydraulic microinjector (Narishige). The glass needles used for injection had an inner tip diameter of 25 µm. The stereotaxic coordinates for injection into juvenile Area X were: anterior/posterior 2.8 and 3.2 mm, dorsal/ventral 4.2 and 4.4 mm, and medial/lateral 1.3 and 1.5 mm using the bregma point as a reference. Animals were anesthetized with ketamine and xylazine. Each Area X received a viral injection at six or eight sites, 120–150 nl viral suspension per site. To facilitate diffusion of viral particles, the injection needle was allowed to remain at the site for 3–5 min before removal. For behavioral experiments, virus was injected bilaterally. For gene expression experiments, the lenti-miR-9 virus and the lenti-control virus were injected into Area X of opposite hemispheres.

To ensure that the impairments in vocal learning and performance that we observed were due to virally transduced miR-9 expression in Area X, and not due to Area X tissue damage caused by the injection process, we sacrificed the birds after the last song recording, and examined their Area X. All birds showed bilateral expression of virally transduced mCherry in Area X; the average area exhibiting strong mCherry signal accounted for 15–20% of total Area X volume. We also observed scattered cell bodies or dendrites showing an mCherry signal outside the core infected region but within Area X, suggesting that virally transduced gene expression spread beyond the core infected region. There was no difference in total Area X volume between the non-injected and injected groups, and there was no difference in total Area X volume between the lenti-control- and lenti-miR-9-injected groups (Figure 1—figure supplement 2A). In a separate experiment, we also quantified the number of neurons in juvenile Area X one month after viral injection using immunostaining of the neuronal marker Hu. We found similar numbers of Hu+ neurons in Area X of non-injected animals and in Area X injected with the lenti-control virus or with the lenti-miR-9 virus (Figure 1—figure supplement 2B). These results indicate that physical damage to Area X caused by viral injection, if any, was minimal; thus, the behavioral phenotypes that we observed are likely due to virally transduced miR-9 overexpression.

Song behavior and analysis

By 30 days of age, the mother was removed and an adult male tutor was introduced to one miR-9-virus- or control-virus-injected juvenile (pupil). Both the pupil and the tutor were kept together in a sound-proof recording chamber until after 70 d. Songs of pupils were recorded at 65 d, 80 d, 100 d, and 150 d using a microphone (Technica AT803B), an eight-channel computer interface (M-Audio 2626), and SAP software version 1.02. Undirected songs were recorded automatically over two days for each bird at each age. Directed songs were recorded (from the same groups of birds) manually in the morning. Males were induced to sing directed songs by presenting female birds in a nearby cage; the females were changed every 10 min. Songs were classified as directed songs when the male sang facing the female as observed by an experimenter.

Zebra finches sing in bouts. A song bout typically contains multiple renditions of a motif, and a motif contains 5–8 distinct syllables that are rendered in a fixed sequence. A syllable is defined as a continuous segment of sound separated from another syllable by a silence gap, and each syllable can be quantitatively described by a set of distinct acoustic features using the software package Sound Analysis Pro (SAP). To select songs for analysis (for all song analyses described here unless otherwise stated), we manually sorted all song files recorded in one day from 8 a.m. to 12 p.m. and eliminated files representing cage noise. For each pupil, we typically selected 20 song files, approximately evenly spread across the entire set of song files (e.g., if there were 200 song files, the 1st, 11th, 21st, 31st, etc. were selected).
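The evenly spaced file selection described above can be sketched in a few lines of Python. This is only an illustration of the sampling rule, not the procedure actually used (files were sorted and selected manually); the function name `select_evenly_spaced` and the file names are hypothetical.

```python
def select_evenly_spaced(files, n_select=20):
    """Pick about n_select items spread evenly across an ordered list."""
    if n_select <= 0:
        raise ValueError("n_select must be positive")
    if len(files) <= n_select:
        return list(files)
    # Integer step: with 200 files and n_select=20, every 10th file is kept
    # (the 1st, 11th, 21st, ... as in the example given in the text).
    step = len(files) // n_select
    return list(files[::step][:n_select])


# Hypothetical file names; real recordings would first be sorted by time
# and cleaned of cage-noise files.
song_files = [f"song_{i:03d}.wav" for i in range(1, 201)]
selected = select_evenly_spaced(song_files)
print(selected[:3], len(selected))
```

When the number of files is not an exact multiple of 20, the integer step leaves a few files at the end of the session unsampled, which is consistent with the "approximately evenly spread" wording.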

Syllable omission analysis

We manually counted all syllables and syllable types in 20 song files (50–80 motif renditions) of a pupil and 10 song files (25–50 motif renditions) of its tutor. In cases in which a tutor or a pupil sang multiple versions of motifs, all versions were included in counting. Partial motifs typically appearing at the beginning or the end of a recorded song file were excluded from analysis.

Motif similarity and syllable accuracy analysis

SAP first segments pupil and tutor song motifs into short (9 ms) segments, then performs pairwise comparisons between the short song segments along a motif pair and calculates a similarity score. For each pupil, we performed pairwise comparisons between 20 motif renditions by a pupil and 10 motif renditions by its tutor using the asymmetric mode of SAP (Tchernichovski et al., 2000). The resulting measurements of 200 pairwise comparisons were averaged to generate a motif similarity score. For data presented in Figure 3A, two investigators, one blinded to the treatment groups, performed similar analyses using two different sets of song files and motifs. The inter-observer reliability between the two analyses was 0.89. For maximum motif similarity, we ranked the 200 measurements and averaged the ten highest values (top 5%) to obtain the maximum motif similarity score. For syllable accuracy analysis, we measured the accuracy score for each syllable of a pupil’s song motif in 20 renditions using SAP. The accuracy scores of all syllables in a pupil’s motif were averaged to generate a syllable accuracy score for that pupil.
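As an illustration of how the two summary scores relate, the following Python sketch aggregates a set of pairwise similarity values into a mean motif similarity score and a maximum motif similarity score (mean of the ten highest values). The pairwise values themselves come from SAP; the function name and the example numbers are hypothetical and serve only to show the arithmetic.

```python
from statistics import mean


def motif_similarity_scores(pairwise_scores, n_top=10):
    """Return (mean similarity, maximum similarity) for one pupil.

    pairwise_scores: flat list with one similarity value per pupil-tutor
    motif pair (20 pupil x 10 tutor renditions = 200 values in the text).
    n_top: number of highest values averaged for the maximum score.
    """
    if len(pairwise_scores) < n_top:
        raise ValueError("need at least n_top scores")
    mean_similarity = mean(pairwise_scores)
    top_scores = sorted(pairwise_scores, reverse=True)[:n_top]
    return mean_similarity, mean(top_scores)


# Made-up scores for illustration only: 200 values between 50 and 89.
example_scores = [float(50 + i % 40) for i in range(200)]
mean_sim, max_sim = motif_similarity_scores(example_scores)
print(mean_sim, max_sim)
```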

Acoustic feature analysis

The acoustic features (duration, mean frequency, goodness of pitch, frequency modulation, and Wiener entropy) of each syllable in a motif were measured using SAP. For pupils, 20 motif renditions were analyzed, and for tutors, 10 motif renditions were analyzed. For each syllable, the percentage difference from the tutor was obtained by subtracting a tutor’s measurement from its pupil’s measurement and then dividing by the tutor measurement. Then the percentage difference values of all syllable types of each bird were averaged. A coefficient of variation was calculated for each acoustic feature for each syllable in 20 renditions, and the coefficients of variation for all syllable types for each bird were averaged.
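The two per-syllable quantities described above can be written out as a short Python sketch. It assumes the signed percentage difference exactly as worded in the text and uses the sample standard deviation for the coefficient of variation; the choice of sample versus population standard deviation is an assumption, and all names and numbers are hypothetical.

```python
from statistics import mean, stdev


def percent_difference(pupil_value, tutor_value):
    """Signed difference of the pupil from the tutor, as a percentage of the tutor value."""
    return 100.0 * (pupil_value - tutor_value) / tutor_value


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (assumed definition)."""
    return stdev(values) / mean(values)


# Made-up durations (ms) for two syllable types of one pupil and its tutor.
pupil = {"A": [90.0, 100.0, 110.0], "B": [190.0, 200.0, 210.0]}
tutor_mean = {"A": 80.0, "B": 200.0}

# Per-syllable values, then the average across syllable types for the bird.
diffs = [percent_difference(mean(pupil[s]), tutor_mean[s]) for s in pupil]
cvs = [coefficient_of_variation(pupil[s]) for s in pupil]
bird_percent_difference = mean(diffs)
bird_cv = mean(cvs)
print(bird_percent_difference, bird_cv)
```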

Syllable transition entropy

We segmented songs recorded in two days from 8 a.m. to 12 p.m. using the auto-segmentation function of SAP. This provided us approximately 10,000–19,000 song syllables for each bird. Next, we used the clustering module of SAP to automatically classify these syllables into types (clusters). We visually confirmed that these syllable types matched with the sonograms. Obvious cases of false classification (e.g., due to segmentation inconsistency) were manually corrected. We next calculated the transition frequency between all pairs of syllable types, resulting in a matrix. Thus, for a song motif containing five syllable types (t = A, B, C, D, and E), we calculated syllable transition frequencies for A to A, A to B, A to C, A to D, A to E, and B to A, B to B, B to C, and so on. Then, for each syllable type t (each row in the matrix), we calculated the relative transition probability p_t (p = transition frequency between a syllable pair divided by the sum of transition frequencies of all syllable pairs in a row). We then computed transition entropy for each syllable type t: Entropy_t = sum(p_t × log[p_t]). Next, we computed a weighted transition entropy for each syllable type (Entropy_tw = Entropy_t × syllable weight), so as to give higher weight to the more frequent syllable types. A syllable weight was defined as the transition frequencies of a given syllable type (sum of a row in the matrix) divided by the sum of transition frequencies of all syllable types (sum of the entire matrix). Finally, the overall transition entropy for a song was given by the average of transition entropies of all its syllable types.
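The calculation can be illustrated with a minimal Python sketch that starts from a matrix of transition counts. Several choices here are assumptions not specified in the text: the logarithm is taken to base 2, the sum is negated following the usual Shannon convention so that entropies are non-negative, and the overall value is taken as the sum of the weighted entropies (that is, the frequency-weighted average across syllable types). The function name and the example matrix are hypothetical.

```python
import math


def transition_entropy(counts):
    """Per-type and overall transition entropy from a count matrix.

    counts[i][j] = number of transitions from syllable type i to type j.
    Returns (list of per-type entropies, overall weighted entropy).
    """
    total = sum(sum(row) for row in counts)
    if total == 0:
        raise ValueError("transition matrix is empty")
    per_type = []
    overall = 0.0
    for row in counts:
        row_sum = sum(row)
        if row_sum == 0:
            per_type.append(0.0)  # type never followed by another syllable
            continue
        # Relative transition probabilities within this row (zeros skipped).
        probs = [c / row_sum for c in row if c > 0]
        # Assumption: base-2 log and Shannon sign convention.
        entropy = -sum(p * math.log2(p) for p in probs)
        weight = row_sum / total  # syllable weight: row sum / matrix sum
        per_type.append(entropy)
        overall += weight * entropy
    return per_type, overall


# Made-up example with three syllable types A, B, C:
# A is always followed by B, B by A or C equally often, C always by A.
example_counts = [
    [0, 10, 0],
    [5, 0, 5],
    [10, 0, 0],
]
per_type, overall = transition_entropy(example_counts)
print(per_type, overall)
```

A fully stereotyped sequence gives an entropy of zero for every syllable type; in the example only syllable B, which has two equally likely successors, contributes to the overall value.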

Constant fundamental frequency analysis

Following previously established methods (Kao and Brainard, 2006; Leblois et al., 2010; Murugan et al., 2013), syllables containing a segment with a constant fundamental frequency (harmonic stacks) were included in the analysis. The constant fundamental frequency of the same set of syllables in both the DS and UDS contexts was measured using SAP. Typically, 20–40 syllable renditions from 20 song files in each context were analyzed. For data presented in Figure 5B, two investigators, one blinded to the treatment groups, performed similar analyses. The inter-observer reliability between the two analyses was 0.71.

Amount of singing

To quantify the amount of singing for each bird, all song files recorded between 8 a.m. and 12 p.m. on two days were segmented using the batch mode of SAP. This process generated the total number of syllables a bird sang during the indicated time. The numbers of syllables produced by each bird during the 8 a.m. to 12 p.m. period in one day were then compared (Figure 6—figure supplement 1).

To evaluate the accuracy of injections into Area X and to measure the size of the virally infected region within Area X, after the last song recording, birds were euthanized and brains were sliced into 80 µm sagittal sections. Bright light and fluorescent images were scanned for each section using an Olympus BX61VS microscope equipped with VS-ASW FL software and a 2X lens. The sizes of Area X and of the virally infected region (mCherry positive) within Area X were measured for all Area-X-containing sections using ImageJ software. For Hu staining, lenti-control and lenti-miR-9 viruses were injected into Area X of opposite hemispheres at about 30 d, and animals were euthanized one month later. Animals were anesthetized with ketamine and xylazine, followed by perfusion with PBS and fixation with 4% paraformaldehyde in PBS. Fixed brains were sliced into 30 µm sagittal sections. For each hemisphere, 3–4 sections containing mCherry signal within Area X were stained with an antibody against Hu (Cat#21271, Life Technologies; 1:500 dilution), followed by a fluorescein-conjugated goat anti-mouse secondary antibody (Cat#F2761, Invitrogen; 1:500 dilution). Images were taken with a Zeiss Axioplan2 fluorescent microscope (40X lens). For each section, the number of Hu+ neurons (green fluorescence) in one or two microscope fields was counted using ImageJ. The experimenter was blind to treatment groups.

Gene expression analysis

Tissue collection

Four weeks after viral injection, animals were euthanized for gene expression analysis. All animals were observed in the morning for one hour (8–9 a.m.) before being euthanized, and no animal sang during this time. Animal brains were embedded in OCT medium and quickly frozen in dry ice. To obtain virally injected Area X tissue for gene expression (RNA or protein) analysis, we cryo-sectioned fresh frozen brains into 80 µm sagittal sections. Sections were first examined under a dissection microscope. Area X, a round structure less than 1 mm in diameter, visibly stands out from the surrounding areas. Sections containing Area X were then examined with a fluorescent microscope. Sections having an mCherry signal in Area X (confirmed by overlapping fluorescent and bright field images on a computer screen) were used for dissection. Sections were semi-fixed in 70% ETOH/PBS for 30 s, and a syringe needle (25G) was used to pick Area X tissue under a dissection scope, and to transfer it into a protein or RNA lysis buffer.

Western blot analysis

Tissue was lysed with RIPA buffer containing protease inhibitor cocktail (Thermo, cat # 87786). Protein content was quantified with the BCA protein assay kit. Protein samples (20–25 µg) were separated on a 12% SDS-PAGE gel, and then transferred to PVDF membranes. The membranes were incubated overnight at 4°C with primary antibodies diluted in PBS-T containing 5% non-fat milk, followed by incubation with corresponding secondary antibodies for 2 hr at room temperature. Immunoreactive bands were detected using the ECL Chemiluminescence reagent. The following primary and secondary antibodies were used: anti-FOXP1 antibody (1:1000; HPA003876, Sigma-Aldrich); anti-FOXP2 antibody (1:1000; HPA000382, Sigma-Aldrich); anti-DARPP-32 antibody (1:3000; ab40801, Abcam); β-actin antibody (1:1000; sc-47778, Santa Cruz Biotech); goat anti-mouse IgG-HRP (1:2000; sc-2031, Santa Cruz Biotech); and goat anti-rabbit IgG-HRP (1:2000; sc-2030, Santa Cruz Biotech). Two bands with close molecular weights were detected for FoxP2 and for DARPP-32. As both FoxP2 and DARPP-32 can have posttranslational modifications, these bands likely represent posttranslational modification products of these proteins. We quantified the major bands with higher molecular weights. Quantification was normalized to β-actin. For peptide competition experiments, primary antibody was pre-incubated with the respective competing peptide for FoxP2 (Cat. No. APrEST77852, Atlas Antibodies) or competing peptide for DARPP-32 (Cat. No. AB189245, Abcam) following the manufacturer’s guide before application to the blots. Protein extracted from Area X was used in these experiments. Concentrations of primary and secondary antibodies were the same as those described above.

qRT-PCR

qRT-PCR was performed as described previously (Shi et al., 2013). Briefly, total RNA was isolated using TRIZOL reagent (Invitrogen). RNA was quantified using a Nanodrop spectrophotometer. To quantify miRNA expression levels, reverse transcription and qPCR were performed using the TaqMan microRNA assay kit following the manufacturer’s protocol (Applied Biosystems). Briefly, reverse transcription was performed in a 15 µl reaction mix containing 10 ng total RNA, 3 µl miRNA primer mix, 1 mM dNTP, 50 U reverse transcriptase, and 3.8 U RNAse inhibitor. The PCR reaction was performed using the TaqMan probe mix in 10 µl TaqMan Universal PCR Master Mix. U6 small RNA was used as an internal control following the manufacturer’s recommendations. The specificities of the U6 small RNA and miRNA primers have been extensively tested and established by their manufacturer. To measure the expression of FoxP1 and FoxP2 and their downstream genes, reverse transcription was performed using 50 ng total RNA with the iScript Reverse Transcription Supermix kit (Bio-Rad). qPCR was performed using the iQ SYBR Green Supermix (Bio-Rad). GAPDH was used as a reference gene. Relative gene expression levels between experimental groups were determined using the comparative Ct (2^-ΔΔCt) method after normalizing to reference genes. For all samples, reverse transcription and qPCR were performed twice, and qPCR was carried out in triplicate. Primers for quantification of miR-9, miR-124, and U6 were obtained from Invitrogen (miR-9: 4427975 and 000583; miR-124: 4440886; and U6: 4427975 and 001973). All other primers were obtained from IDT (Integrated DNA Technologies); their sequences are listed in Supplementary file 2.
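For readers unfamiliar with the comparative Ct calculation, the arithmetic can be sketched in Python as follows. The sketch assumes the standard form of the method, in which amplification efficiency is taken to be close to 100% so that one cycle corresponds to a two-fold difference; the function name and all Ct values are hypothetical and are not data from this study.

```python
from statistics import mean


def relative_expression(ct_target_exp, ct_ref_exp, ct_target_ctrl, ct_ref_ctrl):
    """Fold change of a target gene (experimental vs. control) by 2^-ddCt.

    Each argument is a mean Ct value; the reference gene (e.g. GAPDH or U6)
    normalizes for differences in input RNA between samples.
    """
    delta_ct_exp = ct_target_exp - ct_ref_exp      # dCt, experimental sample
    delta_ct_ctrl = ct_target_ctrl - ct_ref_ctrl   # dCt, control sample
    delta_delta_ct = delta_ct_exp - delta_ct_ctrl  # ddCt
    return 2.0 ** (-delta_delta_ct)


# Made-up triplicate Ct values for illustration only.
target_mir9 = mean([26.0, 26.5, 25.5])   # target gene, lenti-miR-9 side
ref_mir9 = mean([18.0, 18.0, 18.0])      # reference gene, lenti-miR-9 side
target_ctrl = mean([25.0, 25.5, 24.5])   # target gene, lenti-control side
ref_ctrl = mean([18.0, 18.0, 18.0])      # reference gene, lenti-control side

fold_change = relative_expression(target_mir9, ref_mir9, target_ctrl, ref_ctrl)
print(fold_change)
```

In this made-up example the target gene crosses threshold one cycle later in the experimental sample at equal reference levels, which corresponds to roughly half the control expression.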

Statistical analysis

All information related to statistical analysis is documented in the corresponding figure legends. Sample sizes were not statistically determined, but were similar to those generally employed in the field. In all figures (unless otherwise stated), each circle represents data from one animal. For data presented in Figure 2, Figure 3A, and Figure 5B, two investigators, one blind to the experimental groups, analyzed two different sets of song files. Two injected animals (one control pupil and one miR-9 pupil) were excluded from song behavior analysis because injection was outside of Area X. Data were assumed to have normal distributions, but this was not formally tested. Variance was assumed to be similar between groups, but this was not formally tested.

Decision letter

Stephanie A White

Reviewing Editor; University of California, Los Angeles, United States

In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.

Thank you for submitting your article "miR-9 regulates basal ganglia-dependent developmental vocal learning and adult vocal performance in songbirds" for consideration by eLife. Your article has been favorably evaluated by a Senior Editor and three reviewers, one of whom, Stephanie A White (Reviewer #1), is a member of our Board of Reviewing Editors.

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

Summary:

Shi et al. use the birdsong model to build upon their 2013 discovery that the language-related gene, FoxP2, is regulated by micro-RNAs within vocal control circuitry. The prior study established that miR-9 targets FoxP2. When birds sang, miR-9 was shown to be up-regulated within Area X, the vocal dedicated subregion of the songbird basal ganglia, which resulted in FoxP2 down-regulation. In addition to being vocally regulated, this up-regulation depended upon the social context in which birds sang. To extend these observations, in the present study, the authors over-express miR-9 within Area X of young birds during the sensorimotor critical period for song development and examine the consequences on song learning, social-context dependent song variability in adults and on Area X protein and gene expression. A key finding is of broad song learning deficits. A second observation is that miR-9 over-expression diminishes social context-dependent changes in song variability in adulthood. Finally, the authors show that these behavioral changes are accompanied by gene expression changes in Area X, including predicted FoxP2 targets and molecules involved in dopaminergic signaling, known to regulate social context variability. In sum, these studies provide a logical follow up to prior work and now provide a novel and compelling link between microRNA and vocal learning, with implications for human language learning.

Essential revisions:

1) Behavioral analyses: The crux of this study lies in altered song behavior. As such, the quality of the behavioral assays and the details of their description are critical.

1.1) Unusually high and consistent scores for learning in the control birds – This is epitomized by the statement: "Each control pupil imitated the song motif of its tutor completely – syllable by syllable in the same sequence" (given that no special tutoring paradigms appear to have been used). Moreover, Figures 2C and D recapitulate this '100%' learning and Table 1 shows 0 motif syntax changes across 6 pupils. In contrast, most studies reveal a range of learning in control animals. For example, the cited Haesler et al. study shows levels of learning in control birds ranging between 85 and 90%. Similarly in Heston et al., (2015) motif similarity ranged from 75-85%. If no variation can be shown in control birds it is a clear sign that the measurements are not sensitive enough to detect a normal range.

Given that the authors have collected all the song files, using an additional method to assess song is both feasible and warranted. That developed by Mandelblat-Cerf and Fee (2014) does not require users to segment motifs, and can thereby supply an alternative metric to substantiate and strengthen those used here. A brand new option for assessing syllable types was just published: Pearre B, Perkins LN, Markowitz JE, Gardner TJ. A fast and accurate zebra finch syllable detector. PLoS One. 2017 Jul 28;12(7):e0181992. doi: 10.1371/journal.pone.0181992. These or another method of analysis should be used to corroborate the reported findings; alternatively, discrepancies should be reported.

1.2) The above issue relates to two methodological concerns: First, it is stated in the second paragraph of the subsection “Song recording and analysis” in the Materials and methods that motifs were 'randomly' selected (and later, birds were 'randomly' assigned to experimental groups). 'Random' has a precise scientific meaning and, unless e.g. some type of random number generator was linked to the selection of motifs, this explanation is insufficient. Second, it is stated that two experimenters evaluated song learning, only one of whom was double-blinded to the group assignment of the animal. Inter-rater reliability was then said to be 'similar'. The authors should provide metrics to substantiate this statement. Moreover, assessment of song variability as a function of social context should be performed blind to that context.

1.3) Experimental design: Were control and experimental pupils given the same tutor in order to control for tutor song complexity? This is important since, e.g. giving a simple tutor song to one group and a complex song to the other would skew the results regardless of molecular interventions. It appears that they were, as illustrated in Figure 2B where each type of pupil's song is compared to the same tutor song. This should be articulated.

1.4) Behavioral regulation of gene expression – A key consideration is the behavioral state of the animal at sacrifice. Prior work from this group, and from those of White and Scharff indicate that miR-9 and FoxP2 are behaviorally regulated, yet no mention is made of what the behavioral state of the animal was at sacrifice, nor of the time of day of sacrifice. As this could strongly influence gene expression data, this is a key issue to be explicit about.

1.5) Blocked learning versus developmental delay – Are the miR-9 pupils just developmentally delayed and could they eventually learn to produce a good copy of the tutor song? Figure 6 shows that many of these features improve with age in both miR-9 pupils and control pupils, so the poor imitation may merely reflect a developmental delay. Alternatively, poor imitation at 100d might reflect less practice (i.e., do the miR-9 pupils sing just as much as control pupils?). The authors could address this issue by quantifying the amount of singing in the two groups at the three different ages (65d, 85d, and 100d). In addition, they could analyze any songs of birds beyond 100d.

In the same vein, are the songs of miR-9 pupils crystallized? By 100d, are these birds singing a stereotyped sequence of song elements or are the songs of these birds highly variable from one song bout to the next? Previous studies have shown that lesions of Area X in juvenile birds prevent song crystallization, resulting in poor imitation (Introduction, second paragraph). In contrast, lesions of the downstream nucleus LMAN in juvenile birds result in premature crystallization and poor imitation (i.e., they only copy a few syllables of the tutor song). In both groups, the lesioned birds failed to produce an accurate copy of the tutor song but in completely different ways.

1.6) Interpretation – miR-9 overexpression from ~50d-100d could disrupt the ability of juvenile birds to 1) generate variable motor sequences; or 2) to select those motor programs that result in a good match to the tutor song model. Additional behavioral analyses would distinguish between these two possibilities:

a) Quantify sequence consistency/linearity or transition entropy at different developmental stages. Does the variability in the sequence change over time?

b) Instead of measuring the average similarity between the pupil's song and the tutor's song (Figure 2), quantify the maximum similarity score to determine whether miR-9 pupils are capable of producing good copies of the tutor song.

c) Instead of measuring the difference in a particular acoustic feature between the pupil and tutor (Figure 3), quantify the "accuracy" of each syllable. Are miR-9 pupils able to produce accurate copies of the tutor song syllables? Or are the syllables more variable and also less accurate?

1.7) Data presentation – this relates primarily to Figure 3B and 3C. The authors should clarify the y-axis in Figure 3B. Is the absolute value of the difference between pupil and tutor plotted? Are all of the values really positive? Were any of the values lower for the pupil's syllable versus the tutor's syllable? Figure 3C plots measures of particular features of a pupil's syllable and the corresponding tutor's syllables, but the number of syllables is not the same across groups (miR-9 v. control pupils) because control birds learned more syllables than miR-9 pupils. However, this makes direct comparison across groups difficult. It would help to indicate the shared syllables across the two groups to allow direct comparison. When comparing Figure 3B and Figure 3C, it is not clear how there are significant differences in the goodness of pitch between control and miR-9 pupils when the values for all the syllables are completely overlapping. Again, here, it would help if one could evaluate the values for just the syllables that are matched across the groups.

1.8) Fundamental frequency analyses – Figure 5A – the fundamental frequency seems to be changing in the segment of the representative syllable (indicated by red and blue lines) that is plotted and analyzed. In this example, the frequency changes from a flat, constant frequency to a downsweep, raising some concern about the analysis. It would be useful to know that the inclusion of the last part of the syllable did not affect the measurement of the variance in the cFF across trials. While the segments used for control versus miR-9 pupils are probably the same across birds, variability in syllable duration was different across the two groups of pupils, and could potentially affect the measurement of cFF. The authors should confirm their measurements of cFF were restricted to constant frequency regions of harmonic stacks.

2) Molecular analyses:

2.1) Developmental versus experimental effects of miR-9 expression – the premise of overexpressing miR-9 makes sense in terms of trying to reduce FoxP2/FoxP1 expression. However, the authors have previously shown that miR-9 expression is developmentally regulated (upregulated). The authors should discuss the discrepancy between their previous finding that miR-9 levels increase from 45d-100d as song matures and becomes more similar to the tutor song (Shi et al., 2013) and the current finding that artificial overexpression of miR-9 in Area X results in greater variability in song (less mature) and lower similarity to the tutor song.

2.2) Cellular phenotypes affected by viral manipulation versus normal phenotypes that express miR-9 – is miR-9 expressed only in medium spiny neurons (MSNs) or is it expressed in other interneurons or the two types of pallidal neurons in Area X? The expression pattern of miR-9 in Area X is significant as it would suggest/constrain mechanisms by which miR-9 alters Area X function (e.g., striatal v. pallidal cell activity), and the authors should indicate which cell types express miR-9 (or cite a previous study).

2.3) Morphological effects – in addition, it would be useful to know whether there were gross changes in the structure of striatal MSNs or other neurons in miR-9 pupils as previous studies have shown that disrupted FOXP2 expression can affect spine density in MSNs in Area X.

2.4) Molecular targets – the focus on FoxP2/FoxP1 makes sense with previous work, but miR-9 has many other targets. How do the authors know that the "downstream" gene expression identified is really due to the change in FoxP2/FoxP1 expression? More detail would be helpful regarding the selected genes for analysis. Of course, it would have been possible to perform RNAseq to identify more genes that were differentially regulated between experimental groups. Moreover, beyond absolute gene expression levels, gene co-expression patterns reveal biologically relevant information. Those approaches may provide greater insight but do not negate the importance of the findings presented here. Rather, more detail regarding putative miR-9 binding sites in the selected genes, or whether the effects are thought to be mediated indirectly via FoxP2 would aid the interpretation of the results.

2.5) Protein analyses – Western blots should not be cropped and MW markers should be included. Given the many isoforms of FoxP1/2, known post-translational modifications, and antibody issues, this is important. Especially since multiple bands are sometimes observed as in Figure 7. Accordingly, 'For Western blots, bands with expected sizes were obtained' is overly vague and does not address the double bands present in Figure 7B and 7D for FoxP2 and Darpp32 respectively. Please clarify.

3) Statistics – subsection “miR-9 overexpression in Area X impairs adult song performance and abolishes social context-dependent modulation of song variability”, last paragraph: Different statistical tests were used for the same type of data; for 100d data, a paired t-test was used, but for 65d and 80d data of similar numbers, the non-parametric Mann-Whitney U test was used. The same tests should be used for the data at different ages or the authors should explain why different statistical tests were used. Given that normal distribution of the data is unlikely to be confirmed with these sample sizes, nonparametric tests and, ideally, bootstrapping or resampling methods are in order. The latter make no assumptions about the distribution of the data.

Author response

Essential revisions:

1) Behavioral analyses: The crux of this study lies in altered song behavior. As such, the quality of the behavioral assays and the details of their description are critical.

1.1) Unusually high and consistent scores for learning in the control birds – This is epitomized by the statement: "Each control pupil imitated the song motif of its tutor completely – syllable by syllable in the same sequence" (given that no special tutoring paradigms appear to have been used). Moreover, Figures 2C and D recapitulate this '100%' learning and Table 1 shows 0 motif syntax changes across 6 pupils. In contrast, most studies reveal a range of learning in control animals. For example, the cited Haesler et al. study shows levels of learning in control birds ranging between 85 and 90%. Similarly in Heston et al., (2015) motif similarity ranged from 75-85%. If no variation can be shown in control birds it is a clear sign that the measurements are not sensitive enough to detect a normal range.

Given that the authors have collected all the song files, using an additional method to assess song is both feasible and warranted. That developed by Mandelblat-Cerf and Fee (2014) does not require users to segment motifs, and can thereby supply an alternative metric to substantiate and strengthen those used here. A brand new option for assessing syllable types was just published: Pearre B, Perkins LN, Markowitz JE, Gardner TJ. A fast and accurate zebra finch syllable detector. PLoS One. 2017 Jul 28;12(7):e0181992. doi: 10.1371/journal.pone.0181992. These or another method of analysis should be used to corroborate the reported findings; alternatively, discrepancies should be reported.

Original Figure 2 may have caused some confusion. The data shown in original Figure 2C/2D did not address motif similarity. Rather, they showed the numbers of copied syllable types and the average number of syllables per motif in control and miR-9 pupils. For example, if a tutor song had 7 syllables ABCDEFG, and the pupil song had the same 7 syllables, then this pupil was scored as copying 100% of the tutor syllable types. If the pupil song had syllables ABDEFG (missing syllable C), this pupil was scored as copying 85% (6/7) of tutor syllables. Indeed, no control birds omitted any tutor syllable types, but miR-9 pupils omitted some of the tutor’s syllables. To avoid this confusion, we have reorganized data presentation in Figure 2. The new Figure 2C shows the number of missing syllable types, and the new Figure 2D shows the average number of syllables per motif compared to the tutor in the control and miR-9 pupils.

Our motif similarity data are shown in Figure 3A. Our control pupils have a motif similarity score of about 85-87%, consistent with the 85-90% and 75-85% ranges reported in the two cited papers (Haesler et al., 2007; Heston et al., 2015). SAP is widely used in quantifying zebra finch song learning, and it was used in several FoxP2 related papers including the two referenced above. Using the same software and similar methods in our analysis facilitates comparisons with those groups. In quantifying motif similarity, we compared 20 pupil motif renditions to 10 tutor motif renditions and averaged the 200 pairwise measurements for each animal. Two lab investigators (one blind to the treatment) analyzed two different sets of songs and obtained similar results (inter-observer reliability was 0.89). These, together with the new maximum similarity data (new Figure 3—figure supplement 1), syllable accuracy data (new Figure 3B), and the similarity data for developmental groups (new Figure 5A), helped substantiate the data presented in Figure 3A. Thus, we did not perform additional similarity analysis using other analysis methods.

We analyzed syllable transition entropy in adult birds and showed the results in New Figure 4C. We found that songs of miR-9 pupils have higher syllable transition entropy than songs of control pupils (also see our reply to point 1.6 below). Old Figure 2E and old Table 1 have been removed.

Regarding the statement “Each control pupil imitated the song motif of its tutor completely …”: to avoid confusion, we revised the sentence to “Control pupils imitated their tutors’ song motifs without syllable omission”.

1.2) The above issue relates to two methodological concerns: First, it is stated in the second paragraph of the subsection “Song recording and analysis” in the Materials and methods that motifs were 'randomly' selected (and later, birds were 'randomly' assigned to experimental groups). 'Random' has a precise scientific meaning and, unless e.g. some type of random number generator was linked to the selection of motifs, this explanation is insufficient. Second, it is stated that two experimenters evaluated song learning, only one of whom was double-blinded to the group assignment of the animal. Inter-rater reliability was then said to be 'similar'. The authors should provide metrics to substantiate this statement. Moreover, assessment of song variability as a function of social context should be performed blind to that context.

To select songs for analysis, we manually sorted all song files recorded from 8 a.m. to 12 p.m. and eliminated files representing cage noise. For each pupil, we selected 20 song files (10 for each tutor), approximately evenly spread across the entire set of remaining song files (e.g., if there were 200 song files, the first, 11th, 21st, 31st, … were selected). If a song file contained more than one motif rendition, we analyzed the one in the middle of the file, which is in general more complete. We have added a brief description of song selection to the Materials and methods (subsection “Song behavior and analysis”). For assigning juveniles for viral injection, we removed the word “random” and revised the expression to stipulate that each animal had an equal probability of being injected with the control or miR-9 virus (subsection “Stereotaxic injection”, first paragraph).
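As an illustration only (not the procedure actually scripted for the study, where files were sorted manually), the evenly spaced selection described above can be sketched in Python. The function name `select_evenly_spaced` and the file names are hypothetical, and the sketch assumes the file list is already in chronological order with noise files removed.

```python
def select_evenly_spaced(files, n_select=20):
    """Pick n_select items at a fixed stride from an ordered list.

    Assumes `files` is sorted chronologically and already cleaned of
    cage-noise recordings. With 200 files and n_select=20 the stride is 10,
    so the 1st, 11th, 21st, ... files are returned.
    """
    if n_select <= 0:
        raise ValueError("n_select must be positive")
    if len(files) <= n_select:
        return list(files)
    step = len(files) // n_select
    return [files[i * step] for i in range(n_select)]


# Hypothetical file names, numbered from 1 for readability.
song_files = [f"song_{i:03d}.wav" for i in range(1, 201)]
selected = select_evenly_spaced(song_files)
```

Because the stride is fixed by the number of available files, the selection is deterministic and does not depend on knowledge of the treatment group.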

The motif similarity data were analyzed by two investigators. One was Zhimin Shi, who did most of the experiments described in this manuscript and was therefore aware of the treatment groups. The other, a student blind to the treatment groups, performed a similar analysis using a different set of song files. The inter-rater reliability between their data is 0.89 (subsection “Motif similarity and syllable accuracy analysis”).

Another lab member (who was blind to the treatment groups) analyzed cFF of UDS and DS during this revision. The new results are now shown in Figure 5B. The inter-rater reliability between the new and previous results is 0.71 (subsection “Constant fundamental frequency analysis”).

1.3) Experimental design: Were control and experimental pupils given the same tutor in order to control for tutor song complexity? This is important since, e.g. giving a simple tutor song to one group and a complex song to the other would skew the results regardless of molecular interventions. It appears that they were, as illustrated in Figure 2B where each type of pupil's song is compared to the same tutor song. This should be articulated.

Because tutoring is a lengthy one-on-one process (~50 days), we used multiple tutors in the experiment, and only a subset of pupils shared tutors (2 sets of triplets). For example, Grey180 (control pupil), Yellow23 (miR-9 pupil), and Grey139 (miR-9 pupil) shared a tutor. Their respective similarity scores were: Grey180 = 83.8, Yellow23 = 68.4, and Grey139 = 51.3. Yellow16 (control pupil), Yellow29 (control pupil), and Grey146 (miR-9 pupil) shared another tutor. Their individual scores were: Yellow16 = 87.6, Yellow29 = 82.5, and Grey146 = 49.5. Although sample sizes are small, these data suggest that given the same tutor, control pupils learn better than miR-9 pupils. We were blind to the complexity of tutor songs when assigning tutors to pupils. If we use the number of syllables as an approximate indication of song complexity, the average numbers of tutor song syllables for the control and miR-9 groups are 6.2 and 5.7 respectively. Thus, it is unlikely that the apparent learning deficits of miR-9 pupils were a consequence of control pupils learning simpler tutor songs.

1.4) Behavioral regulation of gene expression – A key consideration is the behavioral state of the animal at sacrifice. Prior work from this group, and from those of White and Scharff indicate that miR-9 and FoxP2 are behaviorally regulated, yet no mention is made of what the behavioral state of the animal was at sacrifice, nor of the time of day of sacrifice. As this could strongly influence gene expression data, this is a key issue to be explicit about.

The behavioral state of the animals at sacrifice for gene expression experiments (both RNA and protein) was carefully controlled. All animals were observed in the morning for one hour (8 – 9 a.m.) before being sacrificed, and no animal sang during this time. This information is now provided in the Materials and methods subsection “Gene expression analysis”.

1.5) Blocked learning versus developmental delay – Are the miR-9 pupils just developmentally delayed and could they eventually learn to produce a good copy of the tutor song? Figure 6 shows that many of these features improve with age in both miR-9 pupils and control pupils, so the poor imitation may merely reflect a developmental delay. Alternatively, poor imitation at 100d might reflect less practice (i.e., do the miR-9 pupils sing just as much as control pupils?). The authors could address this issue by quantifying the amount of singing in the two groups at the three different ages (65d, 85d, and 100d). In addition, they could analyze any songs of birds beyond 100d.

In the same vein, are the songs of miR-9 pupils crystallized? By 100d, are these birds singing a stereotyped sequence of song elements or are the songs of these birds highly variable from one song bout to the next? Previous studies have shown that lesions of Area X in juvenile birds prevent song crystallization, resulting in poor imitation (Introduction, second paragraph). In contrast, lesions of the downstream nucleus LMAN in juvenile birds results in premature crystallization and poor imitation (i.e., they only copy a few syllables of the tutor song). In both groups, the lesioned birds failed to produce an accurate copy of the tutor song but in completely different ways.

We have now included analysis of motif similarity of 150d songs. These new data (subsection “The developmental process of vocal learning and performance”, first paragraph and new Figure 6A) show that at 150d, the similarity score of miR-9 pupils was significantly lower than that of control pupils (p = 0.008), and miR-9 pupils did not significantly improve their similarity score from 65d to 150d (p = 0.137). Figure 6C-E show that variations in acoustic features improved with age, but features of miR-9 pupils were more variable than those of control pupils at all ages.

We also quantified the amount of singing at 65d and 100d (subsection “The developmental process of vocal learning and performance”, last paragraph and Figure 6—figure supplement 1). We found that miR-9 pupils sang slightly more than control pupils at the two ages, but the differences were not significant (p = 0.419 for 65d and p = 0.109 for 100d groups). Thus, it is unlikely that the amount of singing is a determinative factor in impairments in song imitation in miR-9 pupils.

To address whether miR-9 pupils sing a stereotyped sequence of song syllables, we analyzed syllable transition entropy (also see our reply to point 1.6 below). We found that songs of miR-9 pupils have a higher syllable transition entropy (more variable syllable sequence) than controls (subsection “miR-9 overexpression in juvenile Area X impairs song performance and abolishes social context-dependent modulation of song variability in adulthood”, first paragraph and new Figure 4B). As pointed out by reviewers, previous lesion studies show that lesions in lMAN or Area X, two nuclei in the anterior forebrain pathway, have different consequences on song learning. While lesions in lMAN lead to prematurely stabilized song, birds with Area X lesions never achieve a fully stable song as control birds do (Scharff and Nottebohm, 1991). Although the experimental approaches used are very different (gene manipulation vs. electrolytic lesion), our observation is more aligned with the effects of lesions in Area X, suggesting that the effect of miR-9 overexpression (molecular manipulation) might be restrained by circuit functions, in this case functions of Area X (see Discussion, second paragraph for a brief discussion).

1.6) Interpretation – miR-9 overexpression from ~50d-100d could disrupt the ability of juvenile birds to 1) generate variable motor sequences; or 2) to select those motor programs that result in a good match to the tutor song model. Additional behavioral analyses would distinguish between these two possibilities:

a) Quantify sequence consistency/linearity or transition entropy at different developmental stages. Does the variability in the sequence change over time?

We quantified syllable transition entropy for 100d songs. Indeed, syllable transition entropy of miR-9 pupils was higher than that of control pupils, which can be due to switching of syllable order, truncations of motifs and/or stuttering in miR-9 pupils (subsection “miR-9 overexpression in juvenile Area X impairs song performance and abolishes social context-dependent modulation of song variability in adulthood”, first paragraph, new Figure 4A-C). In this new analysis, we included 10,000-19,000 auto-segmented syllables (recorded in the 8:00 a.m. to 12:00 p.m. period on two consecutive days) and used the clustering function of SAP to classify syllable types (see the Materials and methods subsection “Acoustic feature analysis”). Previously, we manually counted syllable sequence changes in 50-70 motif renditions and reported syllable sequence change in miR-9 pupils (old Table 1, now removed from the current manuscript). Although different methods were used, our new analysis and previous analysis point in the same direction: syllable sequence of miR-9 pupils is more variable.
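For readers unfamiliar with the measure, the following Python sketch shows one common way to compute a first-order syllable transition entropy from a labeled syllable sequence. It is an assumption-laden illustration: the exact formula used with the SAP clustering output is not spelled out in this response, and the function name `transition_entropy` and the example sequences are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2


def transition_entropy(sequence):
    """Weighted first-order transition entropy (in bits) of a label sequence.

    For each syllable type a, the entropy of the distribution over the
    syllable that follows it is computed; these are then averaged, weighted
    by how often a occurs as the first element of a transition. A fully
    stereotyped sequence gives 0.
    """
    counts = defaultdict(Counter)
    for current, following in zip(sequence, sequence[1:]):
        counts[current][following] += 1

    total = sum(sum(c.values()) for c in counts.values())
    if total == 0:
        return 0.0

    entropy = 0.0
    for followers in counts.values():
        n = sum(followers.values())
        h = -sum((k / n) * log2(k / n) for k in followers.values())
        entropy += (n / total) * h
    return entropy


# Hypothetical label sequences: a fixed motif vs. one with a variable transition.
stereotyped = list("ABCABCABC")
variable = list("ABCABDABCABD")
h_stereotyped = transition_entropy(stereotyped)
h_variable = transition_entropy(variable)
```

In this toy example the stereotyped sequence has zero entropy, whereas the sequence in which syllable B is followed by either C or D has a positive value, which is the direction of the difference reported for miR-9 pupils.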

However, we did not quantify syllable transition entropy at a younger age. Our preliminary analysis in younger birds showed that syllable classification errors during stages where motif units are not yet fully established would be high, but also difficult to detect. In other words, we suspect that in younger birds it would be too difficult to distinguish between misclassification due to spectral instability (which is higher in the miR-9 group) and cases of real variability in transitions across syllable types. Since syllables were less well defined or stable at a younger age, SAP often mistakenly assigned syllable/cluster IDs that could mislead the results by producing false high entropy scores.

b) Instead of measuring the average similarity between the pupil's song and the tutor's song (Figure 2), quantify the maximum similarity score to determine whether miR-9 pupils are capable of producing good copies of the tutor song.

We quantified the maximum motif similarity between the pupil’s song and the tutor’s song (subsection “miR-9 overexpression in juvenile Area X impairs vocal learning”, second paragraph and Figure 3—figure supplement 1). In similarity analysis, we compared 20 pupil renditions with 10 tutor renditions to generate 200 pairwise measurements. We ranked the 200 measurements, and averaged the 10 highest scores (top 5%) to represent the best score for each pupil. The maximum similarity of miR-9 pupils was significantly lower than that of control pupils (71% vs. 92%, p < 0.001), suggesting that at their best performance, miR-9 pupils were not able to produce good copies of the tutor song.
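The two summary statistics described above (the average of all 200 pairwise scores, and the average of the 10 highest) can be sketched as follows. This is a minimal illustration, not the analysis code used in the study: the function name `summarize_similarity` is hypothetical, and the score matrix is synthetic rather than SAP output.

```python
from statistics import mean


def summarize_similarity(scores, n_top=10):
    """Return (mean similarity, mean of the n_top highest scores).

    `scores` is a 20 x 10 nested list: one row per pupil motif rendition,
    one column per tutor motif rendition, i.e. 200 pairwise measurements.
    With n_top=10 the second value averages the top 5% of those scores.
    """
    flat = [s for row in scores for s in row]
    if len(flat) < n_top:
        raise ValueError("fewer scores than n_top")
    top = sorted(flat, reverse=True)[:n_top]
    return mean(flat), mean(top)


# Synthetic placeholder values 0..199, for illustration only.
scores = [[float(i * 10 + j) for j in range(10)] for i in range(20)]
mean_similarity, max_similarity = summarize_similarity(scores)
```

Averaging the top 10 scores rather than taking the single highest value makes the "maximum" less sensitive to one unusually well-matched pair of renditions.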

c) Instead of measuring the difference in a particular acoustic feature between the pupil and student (Figure 3), quantify the "accuracy" of each syllable. Are miR-9 pupils able to produce accurate copies of the tutor song syllables? Or are the syllables more variable and also less accurate?

We quantified syllable accuracy scores of control and miR-9 pupils, and found that miR-9 pupils imitated their tutors’ syllables less accurately than did control pupils (subsection “miR-9 overexpression in juvenile Area X impairs vocal learning”, last paragraph, new Figure 3B, and Materials and methods, subsection “Motif similarity and syllable accuracy analysis”).

1.7) Data presentation – this relates primarily to Figure 3B and 3C. The authors should clarify the y-axis in Figure 3B. Is the absolute value of the difference between pupil and tutor plotted? Are all of the values really positive? Were any of the values lower for the pupil's syllable versus the tutor's syllable? Figure 3C plots measures of particular features of a pupil's syllable and the corresponding tutor's syllables, but the number of syllables is not the same across groups (miR-9 v. control pupils) because control birds learned more syllables than miR-9 pupils. However, this makes direct comparison across groups difficult. It would help to indicate the shared syllables across the two groups to allow direct comparison. When comparing Figure 3B and Figure 3C, it is not clear how there are significant differences in the goodness of pitch between control and miR-9 pupils when the values for all the syllables are completely overlapping. Again, here, it would help if one could evaluate the values for just the syllables that are matched across the groups.

Old Figure 3 was confusing. The absolute values of the differences between pupil and tutor were plotted in old Figure 3B. For clarity, we have changed the calculation method such that the y-axis now reflects the direction of change (New Figure 3D). The new results are slightly different from previous results because some positive and negative values canceled each other. In this new analysis, goodness of pitch is no longer significantly different between control and miR-9 pupils (p = 0.16). Thus, the goodness of pitch (PG) panel, together with the duration panel (not significant), has been removed. Because only a subset of pupils shared tutors, if we compared only matched syllables, many syllables would be excluded from the analysis. Thus, we measured features of all syllable types produced by a bird and averaged them to represent that bird (see Materials and methods subsection “Song behavior and analysis”).

1.8) Fundamental frequency analyses – Figure 5A – the fundamental frequency seems to be changing in the segment of the representative syllable (indicated by red and blue lines) that is plotted and analyzed. In this example, the frequency changes from a flat, constant frequency to a downsweep, raising some concern about the analysis. It would be useful to know that the inclusion of the last part of the syllable did not affect the measurement of the variance in the cFF across trials. While the segments used for control versus miR-9 pupils are probably the same across birds, variability in syllable duration was different across the two groups of pupils, and could potentially affect the measurement of cFF. The authors should confirm their measurements of cFF were restricted to constant frequency regions of harmonic stacks.

We performed fundamental frequency analysis again by a person blind to the treatment groups. Our measurement was strictly restricted to the constant fundamental frequency part of harmonic stacks (not including the downsweep portion). The new results (New Figure 5B) are consistent with the previous results (inter-observer reliability = 0.71). In our previous analysis, most measurements were restricted to the flat regions, but sometimes, the downsweep part may have been included. To reflect our new analysis, we replotted the blue line position in new Figure 5A.

2) Molecular analyses:

2.1) Developmental versus experimental effects of miR-9 expression – the premise of overexpressing miR-9 makes sense in terms of trying to reduce FoxP2/FoxP1 expression. However, the authors have previously shown that miR-9 expression is developmentally regulated (upregulated). The authors should discuss the discrepancy between their previous finding that miR-9 levels increase from 45d-100d as song matures and becomes more similar to the tutor song (Shi et al., 2013) and the current finding that artificial overexpression of miR-9 in Area X results in greater variability in song (less mature) and lower similarity to the tutor song.

We have previously shown that miR-9 expression in Area X increases from 45d to 100d during normal song development (Shi et al., 2013). In the present study, overexpression of miR-9 in juvenile Area X results in a song that is more variable and shows less similarity to the tutor song. A similar seemingly paradoxical observation was made for FoxP2: while in normal birds FoxP2 expression in Area X is higher in juveniles than in adults, knocking down FoxP2 in juveniles does not accelerate maturation, but results in a more variable song with lower similarity (Haesler et al., 2007; Murugan et al., 2013). It seems that miR-9 overexpression (or FoxP2 knockdown) in juveniles impairs the progression of song maturation. A recent study shows that overexpression of FoxP2 in juvenile Area X caused inaccurate song imitation, raising the possibility that the dynamic regulation of behavior-driven gene expression (i.e., FoxP2 or miR-9) plays a critical role in vocal development (Heston and White, 2015). Taken together, none of these gene manipulations (manipulating FoxP2 or miR-9) enhances song learning and maturation, an effect consistent with the earlier observation that lesions in juvenile Area X result in variable songs (Scharff and Nottebohm, 1991). Normal vocal development requires proper functioning of the entire circuit, which involves coordination among multiple song nuclei (e.g., HVC, lMAN, etc.). The effect of miR-9 overexpression (or FoxP2 knockdown) might be restrained by circuit functions, in this case, functions of Area X. Thus, changing gene expression in Area X alone, even in the direction of its normal developmental trend, would not be sufficient to facilitate vocal learning (see Discussion, second paragraph).

2.2) Cellular phenotypes affected by viral manipulation versus normal phenotypes that express miR-9 – is miR-9 expressed only in medium spiny neurons (MSNs) or is it expressed in other interneurons or the two types of pallidal neurons in Area X? The expression pattern of miR-9 in Area X is significant as it would suggest/constrain mechanisms by which miR-9 alters Area X function (e.g., striatal v. pallidal cell activity), and the authors should indicate which cell types express miR-9 (or cite a previous study).

miR-9 is one of the most highly expressed miRNAs in vertebrate brains. By in situ hybridization and counter-staining with DAPI, we previously observed that miR-9 is expressed in almost all cells in normal Area X (Shi et al., 2013). The viral vector we used in the present study contains a ubiquitin promoter. Presumably, it could express miR-9 in most cell types in Area X. Area X contains multiple neuron types, including the predominant spiny neurons, pallidal-like neurons, and interneurons. Unlike mammals, where GPi and GPe neurons are segregated, in zebra finches, the pallidal neurons are intermingled with other neuron types in Area X. While GPi-like neurons project to the thalamic nucleus DLM, GPe-like neurons are less characterized.

FoxP2 is expressed in the spiny neurons (Haesler et al., 2004); and given that FoxP1 is abundantly expressed in Area X, presumably it is expressed in spiny neurons as well (Teramitsu et al., 2004). It is not clear whether FoxP1 or FoxP2 is expressed in pallidal neurons in Area X, although in human brains, FOXP2 is detected in the GPi region (Teramitsu et al., 2004). DARPP-32, which functions downstream of dopamine receptors, is a well-established spiny neuron marker (Greengard et al., 1999). In zebra finches, dopamine signaling is found in both spiny and pallidal neurons (Leblois et al., 2010). Taken together, it is possible that the behavioral phenotype we observed resulted from a combination of miR-9 effects on spiny and pallidal neurons, but we cannot rule out possible contributions of other neuron types.

We agree with the reviewer that to fully understand the functions of miR-9 in Area X, one ultimately needs to address its function in each specific neuron type. It will be important to use cell type-specific promoters to express miR-9 separately in spiny neurons, pallidal neurons, or interneurons to investigate its effects on each neuron type.

2.3) Morphological effects – in addition, it would be useful to know whether there were gross changes in the structure of striatal MSNs or other neurons in miR-9 pupils as previous studies have shown that disrupted FOXP2 expression can affect spine density in MSNs in Area X.

Reducing FoxP2 in Area X affects spine density in spiny neurons in Area X (Schulz et al., 2010), and some of the genes we examined function in neurite growth and dendrite structural plasticity (Figure 8). Thus, it is likely that miR-9 overexpression affects dendrite and spine structure/density in spiny neurons in Area X. We agree with the reviewer, and also are eager to elucidate the morphological effects of miR-9. However, a systematic analysis of miR-9 effects on dendrite/spine structure of spiny and/or other neuron types would take considerable effort, and is beyond the scope of the current study. We anticipate investigating this issue in future studies.

2.4) Molecular targets – the focus on FoxP2/FoxP1 makes sense with previous work, but miR-9 has many other targets. How do the authors know that the "downstream" gene expression identified is really due to the change in FoxP2/FoxP1 expression? More detail would be helpful regarding the selected genes for analysis. Of course, it would have been possible to perform RNAseq to identify more genes that were differentially regulated between experimental groups. Moreover, beyond absolute gene expression levels, gene co-expression patterns reveal biologically relevant information. Those approaches may provide greater insight but do not negate the importance of the findings presented here. Rather, more detail regarding putative miR-9 binding sites in the selected genes, or whether the effects are thought to be mediated indirectly via FoxP2 would aid the interpretation of the results.

We selected FoxP1/FoxP2 downstream target genes based on published human and mouse studies, and we also gave preference to genes that have known neural functions and/or have been implicated in neural developmental/mental disorders. FOXP1/FOXP2 are known to bidirectionally regulate gene expression. Downregulation of FoxP1/FoxP2 by miR-9 could relieve repression, thus increasing gene expression. That 15 of the 26 genes we tested increased expression (Figure 8) is consistent with their being downstream from and being repressed by FOXP1/P2, and suggests that they are unlikely to be direct targets of miR-9. Other regulatory possibilities exist. For example, some downregulated genes potentially can be direct targets of miR-9, or are both downstream of FoxP1/P2 and targets of miR-9; some may be targets of other transcription factors that are regulated by FoxP1/FoxP2 or miR-9. It is also possible that one gene can be regulated by a combination of multiple mechanisms. Much work is needed to fully understand how each individual gene is regulated in this defined neural circuit.

miR-9 targeting sites in the 3’UTRs of FOXP1/FOXP2 are highly conserved between mammals and birds. But this may not be the case for other genes, since 3’UTR sequences are less conserved in general. The current databases of miRNA targeting sites (e.g., miRNA.org) do not include zebra finch 3’UTR sequences. The current zebra finch genome has numerous gaps, and many of the 3’UTR regions are not well annotated (e.g., we had to clone and sequence the FoxP2 3’UTR ourselves), making analysis of miR-9 target sites (and all other target sites) in the 3’UTRs of many zebra finch genes difficult.

RNA-seq is a powerful tool for unbiased analysis of a large number of genes. This approach may be explored in the future. However, RNA-seq is good for detecting robust differential gene expression. Gene expression changes we tested are moderate, ranging from 20-50%. We suspect we could have easily missed changes of this magnitude by using RNA-seq.

2.5) Protein analyses – Western blots should not be cropped and MW markers should be included. Given the many isoforms of FoxP1/2, known post-translational modifications, and antibody issues, this is important. Especially since multiple bands are sometimes observed as in Figure 7. Accordingly, 'For Western blots, bands with expected sizes were obtained' is overly vague and does not address the double bands present in Figure 7B and 7D for FoxP2 and Darpp32 respectively. Please clarify.

We replaced the images in Figure 7B/D with the original western blot images, which contain samples from all four birds. We detected two FoxP2-related bands near 75 kD in Area X. Two FoxP2-related bands of similar molecular weight were also detected in Area X by other groups using different anti-FOXP2 antibodies (Miller et al., 2008; Murugan et al., 2013; Heston and White, 2015). The two bands likely represent isoforms or post-translationally modified forms of FoxP2. We quantified the major band (the higher molecular weight band). We also detected two DARPP-32-related bands near 32-35 kD in Area X. DARPP-32 has multiple phosphorylation sites, so the two bands likely represent post-translationally modified DARPP-32 proteins. Here too, we quantified the major band (the higher molecular weight band).

We also performed peptide competition experiments for FoxP2 and DARPP-32 using proteins extracted from Area X. We show that (Figure 7—figure supplement 1) preincubating primary antibodies with respective competing peptides (10-fold molar excess) inhibited the detection of the FoxP2- and DARPP-32-related bands.

3) Statistics – subsection “miR-9 overexpression in Area X impairs adult song performance and abolishes social context-dependent modulation of song variability”, last paragraph: Different statistical tests were used for the same type of data; for 100d data, a paired t-test was used, but for 65d and 80d data of similar numbers, the non-parametric Mann-Whitney U test was used. The same tests should be used for the data at different ages or the authors should explain why different statistical tests were used. Given that normal distribution of the data is unlikely to be confirmed with these sample sizes, nonparametric tests and, ideally, bootstrapping or resampling methods are in order. The latter make no assumptions about the distribution of the data.

Although Figure 5B and 5C both address cFF, the data are presented differently. In Figure 5B, cFF values of the same syllable in the UDS and DS contexts were compared (the two dots connected by a line) for the 100d pupils. We therefore used a paired t-test. In Figure 5C, the ratio of the coefficient of variation in cFF in the UDS context to that in the DS context was compared between control and miR-9 pupils at 65d, 80d, and 100d (Figure 5 legend). We used the Mann-Whitney U test to compare the data at all three ages. Consequently, the 100d data were evaluated twice: by a paired t-test (Figure 5B) at the single time point, and by the Mann-Whitney U test (Figure 5C) when included with the other time points.
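To make the quantity compared in Figure 5C concrete, the following is a minimal Python sketch of how a UDS/DS ratio of coefficients of variation could be computed, together with a simple permutation test of the kind of resampling method the reviewers mention. It is an illustration only, not the analysis used in the paper (which used the Mann-Whitney U test). The function names, the choice of the difference in group means as the test statistic, and all numbers in the demo are hypothetical.

```python
import random
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


def cv_ratio(uds_values, ds_values):
    """Ratio of CV in the undirected (UDS) context to CV in the directed (DS) context."""
    return coefficient_of_variation(uds_values) / coefficient_of_variation(ds_values)


def permutation_test(group_a, group_b, n_resamples=10000, seed=0):
    """Two-sided permutation test on the difference of group means.

    Group labels are shuffled repeatedly; the p-value is the fraction of
    shuffles whose absolute mean difference is at least as large as the
    observed one. No distributional assumption is made.
    """
    rng = random.Random(seed)
    observed = abs(statistics.mean(group_a) - statistics.mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:]))
        # Small tolerance guards against floating-point rounding differences.
        if diff >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_resamples + 1)


if __name__ == "__main__":
    # Hypothetical CV ratios (UDS/DS), one value per syllable; not real data.
    control_ratios = [1.8, 2.1, 1.6, 1.9]
    mir9_ratios = [1.0, 1.1, 0.9, 1.2]
    print("p =", permutation_test(control_ratios, mir9_ratios))
```

A ratio above 1 means that cFF is more variable in the UDS context than in the DS context; a ratio near 1 means that variability does not differ between the two contexts.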

Funding

National Institute of Mental Health (R01MH105519)

National Science Foundation (1258015)

XiaoChing Li

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We thank Dina Lipkind for advice on setting up recording chambers, Ashli Weber for help sorting recorded song files, and Ellie Guillot for the artwork in the figures. We also thank the many members of the birdsong community for their constructive input throughout the course of this work.

Ethics

Animal experimentation: This study was performed in strict accordance with the recommendations in the Guide for the Care and Use of Laboratory Animals of the National Institutes of Health. All of the animals were handled according to an approved institutional animal care and use committee (IACUC) protocol (#3187) of the LSU School of Medicine.

Reviewing Editor

Stephanie A White, Reviewing Editor, University of California, Los Angeles, United States
