
Auditory Systems of Drosophila and Other Invertebrates

Abstract and Keywords

Hearing in invertebrates has evolved independently as an adaptation to avoid predators or to mediate intraspecific communication. Although many invertebrate groups are able to respond to sound stimuli, insects are the only group in which hearing is widely used. Therefore, we will focus here on the auditory systems of some well-known insect models. The ability to perceive sound presumably appeared quite recently in insect evolution, and as a result of repeated independent evolution, diverse types of hearing organs have arisen in insects. Here we introduce the basic features of insect ears and the mechanisms through which sound stimuli are converted into neuronal electric signals. We also summarize our current understanding of the neural processing of auditory information, including tonotopy, sound localization, and pattern recognition.

Sound is a mechanical vibration that propagates as a wave of pressure and particle displacement through an elastic medium such as air or water. Hearing organs detect such vibrations and convert them into neuronal signals. Animals possess various types of mechanoreceptors that respond to particular mechanical vibrations. Here, however, we will consider as ears only the sensory organs specialized for converting airborne sound waves into electrical signals, which are subsequently sent to the central nervous system (CNS) for information processing.

Insects live in almost every terrestrial habitat, and each group has faced unique challenges that drove the evolution of hearing. As a result, insect ears show tremendous diversity in their location and morphology (Table 1). Despite such diversity, insect ears can be categorized into two classes according to their sound-receiving structures and the physical properties of the sound they detect: velocity receivers and pressure receivers.


The Design of Ears

When a sound is produced, air particles near the source of the disturbance are displaced back and forth, transmitting the disturbance to neighboring air particles. In many cases the disturbance propagates as a wave of pressure. This form of sound is called far-field sound because the pressure amplitude decreases slowly with distance, allowing it to propagate far from the source. Under certain conditions, however, the displacement of air particles induces little pressure change; instead, a given volume of air simply streams back and forth around the source of the disturbance. This type of sound is called near-field sound. Because the sound-producing structure must be sufficiently large relative to the wavelength to radiate sound pressure efficiently, air disturbances with relatively low-frequency oscillations produce near-field sounds. Therefore, the frequency of near-field sounds is relatively low (usually below 1 kHz).
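The size-versus-wavelength argument above can be sketched numerically. The speed of sound used below (343 m/s in air at room temperature) is a standard value, and the example frequencies are taken from this chapter; the snippet is illustrative only.

```python
# Why low-frequency sources produce near-field sound: a source much smaller
# than the wavelength it generates cannot radiate pressure efficiently.
# Illustrative sketch; 343 m/s is the speed of sound in air at ~20 deg C.

SPEED_OF_SOUND = 343.0  # m/s

def wavelength(frequency_hz):
    """Wavelength (in meters) of sound of the given frequency in air."""
    return SPEED_OF_SOUND / frequency_hz

# A 150 Hz wasp flight tone (see the filiform-hair example in the text):
print(wavelength(150))    # about 2.3 m, vastly larger than any insect wing
# A 5 kHz cricket song pulse:
print(wavelength(5000))   # about 6.9 cm, comparable to larger insect bodies
```

Because an insect's sound-producing structures are millimeters to centimeters across, a sub-kilohertz oscillation is tiny relative to its wavelength and so moves air without radiating much pressure.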

Velocity receivers detect near-field sound by responding to the particle-velocity component of sound. Because they are sensitive to the flow of air particles, their sound-receiving structures are shaped like hairs or twig branches. The frequency range they can detect is usually sub-kilohertz. Another feature common to velocity receivers is that they are inherently directional, because particle velocity is a vector quantity. The auditory filiform hair is one example of a velocity receiver. Like other insect sensory bristles, each filiform hair is composed of a ciliated sensory neuron with three supporting cells (Keil, 1997). The filiform hairs located on the dorsal thorax of the caterpillar detect the near-field sounds produced by the wingbeats of a flying wasp at distances up to 70 cm (Tautz, 1977, 1978). The dominant frequency of the flight-generated vibration is about 150 Hz, which lies within the optimal frequency range (100–400 Hz) to which the hairs have adapted for detecting particle displacement (Tautz, 1978). Similar velocity-detecting filiform hairs are also found on the cerci of cockroaches and some Orthoptera, including crickets (see Römer & Tautz, 1992, for review). In cockroaches, the hairs elicit escape behavior in response to the wind puff generated by the tongue strike of a toad (Camhi & Tom, 1978). The cercal filiform hairs of crickets are known to be involved in intraspecific communication (Kämper, 1984). Most crickets also have another type of hearing organ, the tympanal ear, which detects sound pressure (see later for more details). A typical male cricket calling song is composed of repeated chirps, and each chirp is composed of 3–5 high-frequency pulses (~5 kHz) with fixed intervals. The pulse rate within a chirp varies from species to species, but it is usually very low (~30 Hz). The filiform hairs detect this low-frequency near-field component of the calling song, while the tympana detect the high-frequency sound pressure originating from the pulses (Kämper, 1984).

One of the most intensively studied velocity receivers is the antennal ear found in many insects, including mosquitoes, flies, and honeybees (Nadrowski et al., 2011). Honeybee workers not only perform the well-known waggle dance but also produce near-field sound with their wings to communicate information about food sources to their nestmates; this sound is detected by the workers' antennal ears (Michelsen et al., 1987; Dreller & Kirchner, 1993). Many flies and mosquitoes depend on their antennal ears to detect conspecific near-field courtship songs (Bennet-Clark & Ewing, 1969; Gopfert et al., 1999). The anatomy and physiology of the antennal ear have been studied intensively in the fruit fly Drosophila melanogaster; more details are given later.

The other class of insect hearing organs, termed "pressure receivers" or "tympanal ears," differs from the velocity receivers in many ways. Pressure receivers detect far-field sounds, which propagate as waves of changing pressure in air. Detecting pressure requires a membranous structure, called the "eardrum" or "tympanum." In addition, pressure receivers can reliably detect much higher frequencies than velocity receivers: Some insects can detect echolocating bat calls whose frequencies lie well beyond the upper limit of the human ear. Moreover, the pressure impinging on the surface of the eardrum is not directional, so directional hearing requires central processing that compares the differences in arrival time and level of sound between the two ears.

The tympanal ears have evolved independently in at least seven different orders of insects (Hoy & Robert, 1996). A typical tympanal ear contains three anatomical structures: the tympanum, an air-filled sac or tracheal expansion, and the auditory chordotonal organs. The tympanum arises through a stepwise thinning of the cuticle, and it senses sound by vibrating in response to the waves of air pressure impinging on its surface. Some tympana are exposed to the outside and directly receive sound waves, whereas others are located within an ear cavity. An air-filled sac typically backs the tympanum, so impedance-matching structures such as the middle ear of mammals are unnecessary. The chordotonal organ is an internal stretch receptor found in insects and crustaceans (Field & Matheson, 1998). The functional unit of a chordotonal organ, termed the "scolopidium," contains one to three chordotonal neurons with other supporting cells (see Fig. 1C). Despite their morphological diversity, most insect ears except the filiform hairs use chordotonal organs as the fundamental sensory unit. The chordotonal neurons transduce the tympanal vibrations into neuronal signals and send them to the CNS for further processing.

In many insects, such as crickets, katydids, and grasshoppers, the closed air sac is replaced with an acoustic tracheal tube that opens at the body wall. Sound pressure can thus also reach the back of the tympanum via the tracheal tube, and consequently the vibration of the tympanum depends on the phases of the sound waves impinging on both of its sides. Tympanal ears of this type are called pressure-difference or pressure-gradient receivers, and they are very efficient at locating sound sources (Michelsen & Larsen, 2008). Another type of modified pressure receiver specialized for sound localization, in which the two ears are mechanically coupled, is found in parasitoid flies (Robert et al., 1994). More details about these specialized pressure receivers are given later.

The rainforest katydid Copiphora gorgonensis has another special type of tympanal ear. Detailed analysis of its anatomy and mechanics revealed remarkably convergent auditory mechanisms between mammals and katydids (Montealegre et al., 2012). Unlike other tympanal ears, katydid scolopidia are not in direct contact with the tympanum; instead, they reside in a fluid-filled sac called the acoustic vesicle (AV) and consequently face the so-called impedance-matching problem caused by the difference in acoustic impedance between air and the body fluid. In terrestrial tetrapods, which face the same problem, the middle-ear structures function as the impedance converter. The katydid ear has an analogous device acting like a type I lever, through which the pressure impinging on the surface of the AV is amplified about 10 times (Montealegre et al., 2012). Another analogy to the vertebrate ear is the tonotopic organization of the frequency response along the covering of the AV. Owing to its simple mechanics, the covering of the AV, termed the "dorsal cuticle," also vibrates in response to sound stimuli. Unlike the tympanal membrane, the frequency response along the dorsal cuticle shows a hallmark of tonotopy: Low frequencies dominate at the proximal end, whereas high frequencies prevail distally (Montealegre et al., 2012). A similar tonotopic arrangement was also found in another katydid species, Mecopoda elongata (Palghat Udayashankar et al., 2012).
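The ~10x pressure gain of the katydid's lever system can be sketched as the product of an area ratio and a lever-arm ratio, as in the mammalian middle ear. The numbers below are hypothetical, chosen only to reproduce the reported overall gain; they are not measurements from Montealegre et al. (2012).

```python
# Impedance conversion by an ideal (lossless) lever, analogous to the katydid
# acoustic vesicle or the mammalian middle ear. All numbers are hypothetical.

def pressure_gain(collect_area, deliver_area, in_arm, out_arm):
    """Pressure amplification: (collecting/delivering area) * (lever-arm ratio)."""
    return (collect_area / deliver_area) * (in_arm / out_arm)

# Hypothetical geometry reproducing a ~10x gain:
print(pressure_gain(collect_area=0.5, deliver_area=0.1, in_arm=2.0, out_arm=1.0))
# -> 10.0
```

Concentrating the collected force onto a smaller area raises pressure at the cost of displacement, which is exactly the trade an impedance converter must make between low-impedance air and high-impedance fluid.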

The Antennal Ear of Drosophila

Like many other dipterans, Drosophila detects near-field courtship song using a pair of antennal ears. The Drosophila antenna consists of four segments. The most peripheral fourth segment (a4), or arista, is a feather-like structure that captures moving air particles and oscillates back and forth in response to particle displacement. The base of the arista is rigidly attached to the third segment (a3), so a3 rotates along with the oscillating arista (Gopfert & Robert, 2001). The a3 stalk extends upward to the second segment (a2) and joins it at the hook (Fig. 1A). The antennal chordotonal organ, Johnston's organ (JO), resides in the a2 and contains ~200 scolopidia (Yack, 2004). Each scolopidium is composed of two or three chordotonal neurons with other supporting cells and is linked to the hook of the a2/a3 joint. An extracellular matrix termed the "dendritic cap" serves as the link that physically connects the distal dendritic membrane to the hook of the a3 (Chung et al., 2001). Consequently, sound-induced mechanical displacement is directly transmitted to the JO neurons and deforms the cells. Because the scolopidia are arranged radially, with their longitudinal axes connecting perpendicularly to the hook, movement of the hook in a given direction stretches some neurons while compressing others. Therefore, JO neuron groups on opposite sides are alternately activated and inactivated in response to the oscillation of the arista induced by air particle displacement (Fig. 1B) (Kamikouchi et al., 2009; Pezier & Blagburn, 2013).

Figure 1 Drosophila antennal ear. (A) Schematic drawing of the antenna showing the arista, third segment (a3), and second segment (a2), which are physically linked. Distinct subgroups of JO neurons in the a2 send sensory information about either sound or wind/gravity to the brain. (B) Transverse sections of the antenna through the dotted line in A. The response of JO neurons to a given direction of antennal movement depends on their location. (C) Structure of a scolopidium showing chordotonal neurons and supporting structures (some supporting cells are omitted). The micrograph to the right shows the segregation of two TRP channels in distinct ciliary zones. a, anterior; d, dorsal; l, lateral; m, medial; p, posterior; v, ventral.

The axons of the JO neurons join the antennal nerve and terminate mainly in a pair of neuropil structures in the brain termed the "antennal mechanosensory and motor center" (AMMC). A small subset of JO axons projects to two other regions, the wedge (WED) and the gnathal ganglia (GNG), which lie dorsally and ventrally to the AMMC, respectively (Ishikawa & Kamikouchi, 2015). On the basis of their projection zones in the brain, JO neurons can be subdivided into five subgroups, A to E (Kamikouchi et al., 2006). Subgroup A neurons feed into zone A, which is distributed across three brain regions: the lateral side of the AMMC, the ventral side of the WED, and the dorsal part of the GNG (Kamikouchi et al., 2006). The other four zones are located within the AMMC. This zonal segregation of antennal input implies a functional segregation of the JO neuron subgroups. Indeed, functional analysis showed that they fall into two functional groups (Kamikouchi et al., 2009). Subgroup A and B neurons are sensitive to mechanical oscillations, whereas subgroup C and E neurons are sensitive to static deflection. This implies that the antennal ear can detect two different kinds of air particle movement: sound, as an oscillating particle displacement, and wind or gravity, as a static deflection of the antenna. Indeed, subgroup A and B neurons are required for both sound-evoked courtship behavior and antennal potentials, but they are dispensable for gravity sensing. The opposite is true for subgroup C and E neurons: They are required for gravity sensing but not for hearing (Kamikouchi et al., 2009; Sun et al., 2009).
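The functional split between the oscillation-sensitive (A/B) and deflection-sensitive (C/E) subgroups can be thought of as a decomposition of antennal displacement into an oscillatory (sound) and a static (wind/gravity) component. The sketch below is a hypothetical illustration of this idea, not a model from the cited studies.

```python
import math

# Decompose an antennal displacement trace into the static component that
# drives wind/gravity-sensing JO neurons (subgroups C/E) and the oscillatory
# component that drives sound-sensing neurons (subgroups A/B). Hypothetical.

def decompose(displacement):
    """Return (static offset, peak-to-peak oscillation) of a displacement trace."""
    static = sum(displacement) / len(displacement)       # DC: wind/gravity cue
    oscillation = max(displacement) - min(displacement)  # AC: sound cue
    return static, oscillation

# A 200 Hz "song" (10 kHz sampling) riding on a steady wind-induced deflection:
trace = [0.5 + 0.1 * math.sin(2 * math.pi * 200 * t / 10000) for t in range(10000)]
static, oscillation = decompose(trace)
print(round(static, 2), round(oscillation, 2))  # -> 0.5 0.2
```

The same physical signal thus carries both messages at once, and the two neuron classes act as complementary filters on it.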

Auditory Transduction and Amplification: Lessons From the Drosophila Auditory System

Mechano-electrical auditory transduction is believed to take place in the ciliated dendritic outer segment of the JO neuron. Indeed, many ciliogenesis-defective flies show impaired sound-evoked neuronal responses (Kernan, 2007). The ciliary compartment of the chordotonal neuron is structurally and functionally distinct from that of other ciliated sensory neurons and other primary cilia (Lee & Chung, 2015). In the middle of the chordotonal cilium, a prominent swelling termed the "ciliary dilation" (CD) divides the cilium into two distinct subcompartments: the proximal and distal zones (Fig. 1C). Moreover, the axonemal microtubules in the proximal zone bear dynein arms, which are otherwise seen only in motile cilia, suggesting that the chordotonal cilium may be motile (Yack, 2004). In addition to this structural difference, the two subcompartments differ in the localization of two TRP ion channels (TRPV and TRPN), which have distinct roles in auditory transduction (Gopfert et al., 2006; Kamikouchi et al., 2009; Lehnert et al., 2013). TRPV is localized exclusively in the proximal zone, whereas TRPN is seen only in the distal zone (Lee et al., 2010) (Fig. 1C). This finding provides a clue to the specific role of each subcompartment and TRP channel in auditory transduction and amplification.

Several lines of evidence suggest that auditory transduction and amplification in the antennal ear of Drosophila share many features with those of the vertebrate hearing organ. Regarding transduction, the fly's antennal sound receiver shows nonlinear gating compliance, a hallmark of gating-spring-mediated opening of the auditory transduction channel first documented in vertebrate auditory hair cells (Howard & Hudspeth, 1988). This finding provides the most compelling evidence for direct gating of the auditory transduction channel in Drosophila (Albert et al., 2007; Nadrowski et al., 2008). In addition, active amplification, which enhances hearing sensitivity, has also been documented in the Drosophila ear (Gopfert & Robert, 2003; Göpfert et al., 2005). Because mutations that disrupt the structure or function of JO neurons (btv, tilB, nompA, and nompC) abolish the proper control of active amplification, the JO neurons may actively control sound sensitivity through their own motility (Gopfert & Robert, 2003; Göpfert et al., 2005).
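The nonlinear gating compliance mentioned above follows from the gating-spring formalism of Howard and Hudspeth (1988): when transduction channels open, their gating swings relax the springs, so the ensemble stiffness dips around the point where half the channels are open. The sketch below uses that standard two-state formula with made-up parameter values; it is not a fit to Drosophila data.

```python
import math

# Gating-spring model (Howard & Hudspeth, 1988): receiver stiffness dips near
# the displacement at which half the transduction channels are open.
# All parameter values below are illustrative, not measured.

KT = 4.1e-21      # thermal energy at room temperature, J
N = 50            # number of transduction channels (hypothetical)
Z = 2.0e-13      # single-channel gating force, N (hypothetical)
K_INF = 2.0e-4   # asymptotic stiffness far from the gating region, N/m

def open_probability(x):
    """Boltzmann open probability of the channels at receiver displacement x (m)."""
    return 1.0 / (1.0 + math.exp(-Z * x / KT))

def stiffness(x):
    """Ensemble stiffness: linear spring minus the gating-compliance term."""
    p = open_probability(x)
    return K_INF - N * Z ** 2 * p * (1.0 - p) / KT

# The compliance maximum (stiffness minimum) sits at x = 0, where p = 0.5:
print(stiffness(0.0) < stiffness(50e-9) < K_INF)  # -> True
```

It is this displacement-dependent dip in stiffness, measured in the sound receiver itself, that argues for direct mechanical gating of the transduction channels.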

Two TRP channels found in the chordotonal cilia—TRPN, encoded by nompC, and TRPV, composed of two subunits encoded by iav and nan—are good candidates for the mechano-electrical transduction (MET) channel (Walker et al., 2000; Kim et al., 2003; Gong et al., 2004; Gopfert et al., 2006). Two models are currently under debate: the TRPN model and the TRPV model. In the TRPN model, the TRPN channel forms the integral part of the auditory transducer, whereas TRPV controls the gain of active amplification. The most compelling evidence for this model came from the finding that TRPN, but not TRPV, is required for active amplification (Gopfert et al., 2006). In TRPN-deficient flies, both the intensity-dependent nonlinear amplification and the self-sustained oscillations that provide mechanical feedback for active amplification were abolished, whereas both were actually increased in TRPV-deficient flies. Mutant flies lacking both TRP channels lost both nonlinear amplification and self-sustained oscillations, indicating that TRPN acts upstream of TRPV. This strongly supports the TRPN model, because transducers are essential for amplificatory feedback; loss of transducer activity will thus inevitably abolish amplification. The segregated localization of TRPN and TRPV in distinct ciliary zones is also consistent with this model (Lee et al., 2010). The TRPN-localized distal ciliary zone is the only place where the ciliary membrane physically links to the extracellular dendritic cap, and thus the place where MET may occur. Moreover, the structure of the TRPV-localized proximal zone is also consistent with the proposed function of TRPV. According to the TRPN model, TRPV negatively controls the gain of the mechanical feedback loop. As mentioned earlier, the proximal zone structurally resembles a motile cilium, making it an ideal site for a mechanical feedback loop operated through the neuron's own motility.
Further support for this model came from the finding that TRPN is essential for the function of the sound-sensing JO subgroups (A and B) but dispensable for that of the wind/gravity-sensing subgroups (C and E) (Kamikouchi et al., 2009; Effertz et al., 2011). In addition, TRPN has recently been shown to be a bona fide mechano-electrical transduction channel (Gong et al., 2013; Yan et al., 2013). Despite this body of evidence, however, the TRPN model faces some challenges.

The alternative model, the TRPV model, is mainly based on the finding that sound-evoked antennal potentials were reduced but not completely abolished in TRPN mutant flies, whereas TRPV mutations abolished them completely (Eberl et al., 2000; Kim et al., 2003; Gong et al., 2004). This suggests that the role of TRPN is limited to activating the mechanical feedback for active amplification, and TRPV is instead assigned the role of the MET channel that triggers spikes in the antennal afferent nerves. More direct evidence for this model came from intracellular single-cell recordings from the giant fiber neuron (GFN), which receives direct input from the antennal nerve through an electrical synapse and can therefore reveal the single-cell activities of JO neurons (Lehnert et al., 2013). The results showed that TRPV, but not TRPN, was essential for generating the MET currents that trigger action potential firing in response to sound stimuli. Although TRPN facilitated auditory transduction, TRPN-generated currents were not detectable (Lehnert et al., 2013). The main challenge for the TRPV model is that there is as yet no direct evidence for mechanical gating of this channel (Nesterov et al., 2015). Further experiments are certainly required to test both models.

Processing of Auditory Information

Tonotopy and Frequency Discrimination

In the mammalian ear, sound elicits traveling waves along the basilar membrane, and this mechanical response establishes a place code in which different locations of the membrane are maximally deflected depending on sound frequency. Such place-based decomposition of sound frequencies establishes the tonotopic map. Unlike mammalian ears, most insect ears have a limited number of auditory receptor neurons, so fine discrimination of sound components may not be easy (Fonseca et al., 2000). Nonetheless, most insect ears have at least some ability to discriminate frequencies. Moreover, as described earlier, some insects—katydids, locusts, and cicadas—have evolved tympanal ears in which sound frequencies are encoded by a place-based tonotopic map in a way similar to mammalian ears (Windmill et al., 2005; Sueur et al., 2006; Montealegre et al., 2012; Palghat Udayashankar et al., 2012). In the bushcricket Tettigonia viridissima, the tonotopic map is maintained in the auditory neuropil, where the axon terminals of sensory neurons synapse with the ascending interneurons (Römer, 1983). Antennal ears also show some degree of frequency decomposition. In Drosophila, distinct groups of auditory receptor neurons have different frequency preferences: Neurons of subgroups A, B, and D prefer, respectively, low frequencies (19–29 Hz), middle frequencies around 200 Hz, and middle to high frequencies up to 900 Hz (Kamikouchi et al., 2009; Matsuo et al., 2014). The basic frequency-discrimination ability of most insect ears, however, does not necessarily mean that insects rely on fine discrimination of sound frequency for frequency-tuned behavior. Instead, insect auditory systems are adapted for a few tasks, such as finding mates or hosts and avoiding predators.
Therefore, unlike the mammalian auditory system, where the tonotopic map is maintained along the auditory pathway up to the auditory cortex, almost all central auditory pathways in insects converge into a few auditory channels at an early stage of sound processing (Hildebrandt, 2014). Such categorical perception of frequency may be a common feature of auditory processing in insects. In crickets, for example, the characteristic frequencies of auditory receptor nerve fibers cluster around three frequency ranges: low frequency (≤5.5 kHz), mid frequency (10–12 kHz), and ultrasound (≥18 kHz) (Imaizumi & Pollack, 1999). Most receptor fibers of all three channels have one or more additional sensitivities to frequencies other than their characteristic frequency, enabling crickets to detect a wide range of frequencies. These three auditory channels, however, are centrally collapsed into two categorical labeled lines: an attractive line (<16 kHz) for conspecific songs and a repulsive line (>16 kHz) for predators (Wyttenbach et al., 1996). To sharpen auditory information into such categorical labeled lines, neural processing by lateral inhibition is widely used (Stumpner, 2002; Hildebrandt et al., 2015).
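The cricket example above amounts to a two-stage mapping from frequency to behavior, which can be sketched directly. The channel boundaries follow the figures quoted in the text (Imaizumi & Pollack, 1999; Wyttenbach et al., 1996); the code itself is only an illustration of the labeled-line idea.

```python
# Labeled-line (categorical) frequency processing in crickets: three
# peripheral channels collapse centrally into attractive vs. repulsive lines
# around a ~16 kHz boundary. Boundaries follow the values cited in the text.

def peripheral_channel(freq_hz):
    """Peripheral channel by characteristic frequency."""
    if freq_hz <= 5500:
        return "low"
    if 10000 <= freq_hz <= 12000:
        return "mid"
    if freq_hz >= 18000:
        return "ultrasound"
    return "intermediate"

def behavioral_line(freq_hz):
    """Central categorical line: approach conspecific song, avoid bat-like sound."""
    return "attractive" if freq_hz < 16000 else "repulsive"

print(peripheral_channel(5000), behavioral_line(5000))    # -> low attractive
print(peripheral_channel(20000), behavioral_line(20000))  # -> ultrasound repulsive
```

The point of the sketch is the many-to-few collapse: three (or more) peripheral channels feed only two behavioral categories, so fine frequency information is discarded early.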

One of the simplest labeled lines for avoidance behavior has been reported in noctuid moth species, in which the labeled line is established by intensity rather than frequency discrimination (Fullard et al., 2003). The moth's ear has evolved almost exclusively for bat detection. A moth encounters three auditory conditions depending on the bat's position: a no-bat condition, in which no bat is present; a far-bat condition, in which a bat is emitting a searching call from a distance; and a near-bat condition, in which a bat is very close and emitting the attack call. Moths can reliably discriminate these conditions using only one or two auditory receptor cells. The A1 cell, which is more sensitive than the other cell, A2 (or is the only cell when there is a single receptor neuron), encodes information about bat calls through intensity discrimination. The searching call (far-bat condition) has a relatively low intensity compared with the attack call (near-bat condition). The A1 cell fires with different patterns depending on the intensity of the bat call, thereby evoking different responses: the far-bat response (steering away from the direction of the searching call) and the near-bat response (last-chance escape-flight maneuvers) (Fullard et al., 2003). Although the function of the A2 cell is unclear in most moths, a plausible model was proposed from a sound-producing moth, Cycnia tenera. This moth emits trains of ultrasonic clicks when exposed to sound stimuli that resemble the terminal phase of a bat's attack call. Instead of escaping, the moth may try to disturb the bat's echolocation signals when the bat is too close for escape. The A2 cell responds to this very high-intensity terminal phase of the attack call and may help trigger the disturbing ultrasound just before the final moment of a bat attack (Fullard et al., 2003).
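The moth's intensity-based labeled line can likewise be sketched as a simple threshold scheme. The decibel thresholds below are entirely hypothetical placeholders; only the three-way categorization itself comes from Fullard et al. (2003).

```python
# Intensity-based labeled line in a noctuid moth: the sensitive A1 cell
# discriminates no-bat / far-bat / near-bat by call intensity alone.
# The dB thresholds below are hypothetical, for illustration only.

A1_THRESHOLD_DB = 40   # hypothetical faintest call the A1 cell responds to
NEAR_BAT_DB = 70       # hypothetical intensity of a close-range attack call

def moth_response(call_db):
    """Map received call intensity to the moth's behavioral category."""
    if call_db < A1_THRESHOLD_DB:
        return "no bat: continue flight"
    if call_db < NEAR_BAT_DB:
        return "far bat: steer away from the call"
    return "near bat: last-chance escape maneuver"

print(moth_response(30))  # -> no bat: continue flight
print(moth_response(55))  # -> far bat: steer away from the call
print(moth_response(85))  # -> near bat: last-chance escape maneuver
```

Because received intensity grows as the bat approaches, a single cell with graded firing can stand in for a range sensor, which is all this escape circuit needs.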

Sound Localization

Directional hearing is essential for phonotactic behavior in insects. Unlike antennal ears, which are inherently directional, tympanal ears face a challenge in sound localization. In this type of ear, identifying the direction of a sound requires binaural comparison of its intensity (interaural intensity difference, IID) and arrival time (interaural time difference, ITD). Most insects, however, are too small to generate sufficiently large IIDs or ITDs. IID is caused by sound diffraction, whose magnitude is determined by the ratio of body size to the wavelength of the sound. To obtain reasonably large IIDs, this ratio (body size/wavelength) should exceed about 0.1. This is indeed the case for a large noctuid moth: The level of the ultrasonic sound used by echolocating bats can differ by as much as 40 dB between its two ears (Payne et al., 1966). But most insects have small bodies relative to the wavelength of the sound signals they encounter, so the diffraction effect is minute. Furthermore, the small interaural distance—usually only a few millimeters—produces ITDs in the range of a few microseconds, which are difficult to process reliably. Despite these physical constraints, evolutionary adaptations enable some insects to perform precise sound localization (Mason et al., 2001; Hennig et al., 2004).
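The two physical constraints above—tiny ITDs and weak diffraction—can be checked with the underlying arithmetic. The speed of sound (343 m/s) and the example body and frequency values are assumptions for illustration.

```python
# Binaural cues available to a small insect: ITD scales with ear separation,
# and diffraction-based IIDs need body size / wavelength > ~0.1.
# Illustrative values only; 343 m/s is the speed of sound in air.

SPEED_OF_SOUND = 343.0  # m/s

def max_itd_us(ear_separation_m):
    """Maximum ITD (microseconds) for sound arriving along the interaural axis."""
    return ear_separation_m / SPEED_OF_SOUND * 1e6

def diffraction_ratio(body_size_m, frequency_hz):
    """Body size divided by wavelength; above ~0.1, useful IIDs arise."""
    return body_size_m / (SPEED_OF_SOUND / frequency_hz)

# A hypothetical insect with ears 2 mm apart hearing a 5 kHz song:
print(round(max_itd_us(0.002), 1))               # -> 5.8 (microseconds)
print(round(diffraction_ratio(0.005, 5000), 3))  # -> 0.073, too small for IIDs
```

A few microseconds of ITD and a sub-0.1 diffraction ratio are exactly the regime the text describes, which is why pressure-difference receivers evolved to convert the tiny ITD into a usable intensity cue.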

A sophisticated pressure-difference receiver is the principal evolutionary innovation for sound localization in some insects. In pressure-difference receivers, sound waves can act on both sides of the tympanum. In crickets and bushcrickets, the anatomical basis for the pressure-difference receiver is a complex tracheal system, which conducts sound waves to the back of the tympanum via an opening in the body wall (Larsen & Michelsen, 1978; Michelsen & Larsen, 1978). As the sound pressure acting on the back of the tympanum travels through the acoustic tracheal tube, it can be attenuated, amplified, or phase shifted to various degrees depending on the anatomy of the tracheal system. For the pressure-difference receiver to work properly, appropriate amplification and phase shifts are required. In the bushcricket tympanum, the amplification factor of the sound pressure acting on the internal side is close to 1 at low frequencies (1–3 kHz), so the hearing organ can act as an ideal pressure-difference receiver that detects the phase shift between the two sides of the tympanum (Michelsen & Larsen, 1978). The same trachea may confer a significantly higher gain at higher frequencies (>10 kHz), converting the tympanum into a simple pressure receiver (Michelsen & Larsen, 1978). A very similar situation was shown for the grasshopper hearing organ, in which each ear behaves like a pressure-difference receiver at 4 kHz but as a pressure receiver above 10 kHz (Michelsen & Rohrseitz, 1995; Schul et al., 1999).

The tympanal ear of crickets is one of the most complicated pressure-difference receivers. Crickets also have an acoustic trachea, which enables sound to reach the internal side of the tympanum. But the acoustic trachea of the cricket differs from that of the bushcricket in two ways. First, the external opening through which sound waves travel to the back of the tympanum is a spiracle, which the animal can open and close. Second, the tracheal tubes from each side are connected by a transverse tube, so the entire acoustic tracheal system is H-shaped. Moreover, the transverse connecting tube contains a double membrane at the midline, called the septum, which enhances the time delay of the internal sound transmission. Owing to these anatomical features, three important sound pressures impinge on the tympanal membrane: the external sound pressure, the internal sound pressure from the ipsilateral spiracle, and another internal sound pressure originating from the contralateral spiracle (Larsen & Michelsen, 1978). These three sound components must be summed with the proper amplitude and phase relationships to ensure correct sound localization. Because the internal sound conduction is greatly influenced by sound frequency, the proper phase relationships between the sound components are strongly frequency dependent, so the directionality of the cricket's ear is also strongly tuned to a narrow frequency range (Hill & Boyan, 1976; Michelsen & Lohe, 1995; Kostarakos et al., 2009; Schmidt et al., 2011). In Gryllus bimaculatus, one of the most intensively studied cricket species, the best frequency for directional hearing is 4.5 kHz, at which IIDs reach maximal values of 8–10 dB (Michelsen & Lohe, 1995; Kostarakos et al., 2009).
Because the septum enhances the time delay of the internal sound transmission, it influences the phase relationship between the internal sound components from the ipsi- and contralateral spiracles. Accordingly, disruption of the septum reduces the IID from 10 dB to 2 dB (Michelsen & Lohe, 1995).

Another evolutionary innovation for directional hearing is the pair of mechanically coupled eardrums found in the parasitoid fly Ormia ochracea. This small parasitoid fly acoustically locates and attacks its singing host, the field cricket (Cade, 1975). Because female flies must find hosts as a food source for their offspring, they have a remarkably precise ability to localize songs using auditory cues alone. In this fly, the pair of tympanal ears is located on the anterior thorax just above the first pair of legs (Robert et al., 1994). The two eardrums are so close together (only 520 μm apart) that they are joined by a cuticular structure acting as a mechanical bridge that couples their motion (Robert et al., 1994; Miles et al., 1995). This mechanical coupling of the eardrums' motion provides significant amplification of both ITD and IID: from 1.5 μs to 55 μs in ITD, and from <1 dB to 12 dB in IID, for sound sources at behaviorally relevant frequencies (~5 kHz) at 45°–90° azimuth (Robert et al., 1996; Robert, 2001).

Although the evolutionary innovations described earlier significantly amplify IIDs, ITDs, or both, the minimum IIDs and ITDs that can reliably evoke directional behavior are remarkably small. A behavioral study of grasshoppers under a quasi-dichotic stimulus paradigm showed that the resolution of the grasshopper's auditory system for IIDs is about 1–2 dB (von Helversen & Rheinlaender, 1988). When freely moving female bushcrickets were stimulated with a dichotic ear-stimulation device, they on average turned toward the more strongly stimulated side starting from a 1 dB difference between the two ears (Rheinlaender et al., 2006). Crickets are also hyperacute in directional hearing and phonotactic steering: One study showed that females walked precisely toward the sound source at angles of sound incidence starting 1°–2° from the midline (Schoneich & Hedwig, 2010). In neuronal processing, intensity differences are represented as differences in discharge latency in the receptor nerve fibers. In pioneering work, Mörchen et al. (1978) measured single-fiber activities from receptor nerve fibers in the locust and showed that as a sound source moved from the ipsilateral to the contralateral side, spike latencies increased proportionally while the number of spikes per stimulus decreased proportionally; that is, spike count correlates positively, and response latency negatively, with the intensity of the sound stimulus. These relationships are well conserved in the first-order ascending interneurons (Mörchen, 1980). These findings suggest that the directional information represented by IIDs at the tympanal membrane is coded by the receptor neurons as two parameters: spike count and response latency.
It is of interest to note that the latency shift was remarkable: It reached about 5 ms as the sound source moved from the ipsilateral to the contralateral side, exceeding the physical ITD by a factor of ~100 and thus providing further resolving power to discriminate interaural differences. A cricket study revealed similar physiological characteristics of auditory receptor fibers (Fig. 2A) (Imaizumi & Pollack, 2001). On the basis of these findings, peripheral processing of directional auditory information in insects with pressure-difference receivers can be summarized as follows: (1) The physical ITD, which is the only directional cue for sound localization in small insects, is converted into an IID by the pressure-difference receiver; (2) the physiological IID is then converted into spike count and response latency by the receptor neurons.
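This two-step peripheral scheme can be illustrated with a toy model. All constants, the linear rate/latency relations, and the function name below are illustrative assumptions for the sake of the sketch, not measured values:

```python
# Toy model of peripheral directional coding in a pressure-difference
# receiver: the physical ITD has already been converted into an
# interaural intensity difference (IID), and each receptor encodes its
# local intensity as spike count (positive slope) and response latency
# (negative slope). All constants are illustrative assumptions.

def receptor_response(intensity_db, rate_slope=0.5, base_rate=2.0,
                      latency_slope=-0.05, base_latency=12.0):
    """Return (spikes per stimulus, latency in ms) for a given intensity."""
    spikes = max(0.0, base_rate + rate_slope * intensity_db)
    latency = base_latency + latency_slope * intensity_db
    return spikes, latency

# A 12 dB IID (the amplified cue) around a 70 dB source:
ipsi = receptor_response(70 + 6)    # ear facing the sound source
contra = receptor_response(70 - 6)  # ear facing away

print(ipsi[0] - contra[0])   # the ipsilateral receptor fires more spikes
print(contra[1] - ipsi[1])   # ...and fires them earlier
```

The sign conventions mirror the locust data: Spike count rises and latency falls with intensity, so a single IID is simultaneously represented in both parameters.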

Figure 2 Direction-dependent physiology of auditory receptor nerves. (A) Neural representation of sound intensity in crickets (T. oceanicus). The relationship between sound intensity and spike firing rate (top) or response latency (bottom) derived from the responses of a single auditory receptor fiber with a characteristic frequency of 5 kHz (Reproduced from the Journal of the Acoustical Society of America. From Imaizumi & Pollack, 2001, with the permission of The Acoustical Society of America). (B and C) Neural representation of sound intensity in the parasitoid fly (O. ochracea). The latency shift of spikes elicited by stimuli of different intensities is evident in B. The dashed line in B represents a 10 dB IID, showing that the latency of an ipsilateral afferent to a 75 dB stimulus is equivalent to that of a contralateral afferent to an 85 dB stimulus. The cumulative histogram of the proportion of afferents above threshold in response to a 5 kHz sound stimulus at various intensities is shown in C. The range fractionation is evident.

In the parasitoid fly O. ochracea, the minimum ITD required for directional hearing is astonishing. When flies were tested under open-loop conditions on a trackball, they were not only able to discriminate sound angles as small as 2° from the midline but could also make precise turning movements toward the sound source (Mason et al., 2001). As the distance between the fly's two eardrums (~520 μm) yields a physical ITD of only 1.5 μs, a 2° angle of sound incidence represents an ITD cue at the tympanal membranes of only 50 ns. Although this 50 ns ITD can be amplified more than 30-fold by the mechanical coupling of the two eardrums, the resulting time difference (~1.7 μs) may still be too small to be reliably discriminated. To solve this problem, the receptor nerve fibers of O. ochracea discharge in a unique way: Most receptors respond to a sound pulse with only a single spike, independent of sound level or duration, which is very unusual compared to a typical rate-level response (Mason et al., 2001). The response parameter that encodes stimulus intensity is spike latency: The mean latency of a given receptor decreases proportionally as stimulus intensity increases. A systematic physiological analysis of auditory afferent nerves showed that spike occurrence was strictly time-locked to stimulus onset: The spike latencies of a given cell in response to the same stimulus were highly consistent within a very narrow time window (Oshinsky & Hoy, 2002). These physiological features indicate that the direction of a sound source can be coded by the response latency difference between the nerves on each side (Fig. 2B). Considering the flies' hyperacute directional hearing ability, however, response latency alone is not enough to encode directional information.
The latency shift in response to moving the sound source from the ipsilateral to the contralateral side (from 90° to –90° azimuth) was measured at 0.5–1 ms depending on stimulus intensity, far larger than the physical ITD (1.5 μs) (Oshinsky & Hoy, 2002). But for the smallest sound angle reliably discriminated by the flies (2°), the latency shift is merely ~7 μs, which is much smaller than the variation in response latency of an individual nerve fiber (~70 μs) (Mason et al., 2001; Oshinsky & Hoy, 2002). Hence, the timing accuracy of individual afferent nerves is much lower than the behavioral accuracy. To solve this problem, flies use a population coding scheme termed “range fractionation” (Fig. 2C). When the response thresholds of 5 kHz-tuned afferent nerves were analyzed, they were found to be distributed across a wide dynamic range (Oshinsky & Hoy, 2002). Hence, the proportion of actively firing afferents increases proportionally with stimulus intensity. In other words, the number of active afferents on the ipsilateral side will be greater than on the contralateral side, and more ipsilateral afferents will fire as the sound angle increases toward the ipsilateral side. Indeed, a 10 dB increase in sound level (from 75 dB to 85 dB), which is close to the maximum IID in O. ochracea, increased the proportion of actively firing afferents from 38% to 90% (Oshinsky & Hoy, 2002). In summary, the parasitoid fly O. ochracea uses both temporal and population coding schemes for sound localization. The flies steer toward the side whose afferents fire more, and more rapidly, than those of the other side; as the differences increase, the flies steer more laterally.
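The range fractionation idea can be sketched in a few lines of code. The number of afferents, the even 2 dB threshold spacing, and the test intensities below are illustrative assumptions, not the measured threshold distribution:

```python
# Sketch of "range fractionation": receptor thresholds are spread across
# a wide dynamic range, so the *proportion* of active afferents encodes
# stimulus intensity (and hence the side nearer the sound source).
# Afferent count and threshold spacing are illustrative assumptions.

def active_fraction(intensity_db, thresholds):
    """Fraction of afferents whose threshold is met or exceeded."""
    return sum(t <= intensity_db for t in thresholds) / len(thresholds)

# 20 afferents with thresholds evenly spread from 60 to 98 dB
thresholds = [60 + 2 * i for i in range(20)]

ipsi = active_fraction(85, thresholds)    # ear nearer the source
contra = active_fraction(75, thresholds)  # ear farther from the source
print(ipsi, contra)  # a larger fraction fires on the ipsilateral side
```

Even though each afferent gives only a single, binary spike per pulse, the population as a whole reports intensity, and thus direction, through how many afferents cross threshold.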

Pattern Recognition

In many insects, discrimination of conspecific songs in a sound-rich environment is essential for successful mating. For example, the calling song of the male cricket (G. bimaculatus) consists of chirps with stereotyped sequences of 3–5 pure-tone pulses. To show phonotactic behavior toward the calling male, females must recognize the pattern, or envelope, of the conspecific song. Indeed, female crickets are selectively tuned to the temporal features of the calling song (Thorson et al., 1982; Doherty, 1985). The analysis of the temporal pulse pattern requires processing by the central auditory circuit. Because neural capacity is limited, crickets rely on temporal pattern recognition rather than spectral analysis of the conspecific song. To explain the principles of the neural processing mechanisms underlying temporal pattern recognition, three models have been proposed: template matching, bandpass filtering, and delay-line coincidence detection (Hoy, 1978; Schildberger, 1984; Weber & Thorson, 1989; Hennig, 2003; Bush & Schul, 2006).

In the template matching, or cross-correlation, model, pattern recognition is achieved by comparing the incoming acoustic signal with an internal template, as documented in bat echolocation (Simmons et al., 1990). The finding that aggressive female crickets, although mute, also make wing movements corresponding to those of singing males suggested that an internal efference copy, or corollary discharge, generated by the singing motor system could serve as the internal template (Alexander, 1962; Huber, 1962). A study showing genetic coupling of song pattern generation and recognition also appeared to support this idea (Hoy et al., 1977). In that study, hybrid crickets were generated by crossing two different cricket species. F1 hybrid males produced a unique calling song that differed both from those of their parents and from that of the F1 hybrids of the reciprocal cross. When female hybrid crickets were given a phonotactic choice between the calling songs of sibling and reciprocal-cross hybrids, they preferred the calls of their sibling hybrids. On the basis of these findings, Hoy (1978) proposed a model for pattern recognition in which the temporal pattern of an acoustic signal is compared with an internal template derived from the singing central pattern generator (CPG), which might be active at a low level even in mute female crickets. The finding that a single corollary discharge interneuron is responsible for the pre- and postsynaptic inhibition of auditory interneurons in singing male crickets provided a link between the singing network and the auditory pathway at the neuronal level (Poulet & Hedwig, 2006). Beyond these findings, however, there is so far no direct evidence supporting the internal template model. Moreover, a study that tested the template matching mechanism in the oceanic field cricket (T. oceanicus) revealed that females showed positive phonotaxis to unusual song patterns in which the specific temporal order of pulse intervals was shuffled and randomized, which contradicts a pattern recognition mechanism based on cross-correlation (Pollack & Hoy, 1979).

Based on a systematic study of brain neurons in the cricket, an alternative model for pattern recognition was proposed by Schildberger (1984). In the bandpass filtering model, specific low-pass and high-pass neurons cooperate to shape the activity of bandpass neurons that produce the temporal bandpass tuning of female phonotaxis (only pulse rates within a narrow window elicit phonotaxis). In crickets, the auditory afferents originating from the receptor neurons located in the front legs terminate in the prothoracic ganglion. On each side of the CNS, auditory information is carried forward to the brain by a single ascending neuron called AN1. Schildberger analyzed the responses of AN1 and local brain neurons to sound stimuli with varied pulse periods. The results showed that AN1 reliably copies the sound pattern without selectivity, indicating that temporal filtering occurs at the brain level. In the anterior protocerebrum, where the ascending AN1 interneuron sends its axon terminal, four types of brain neurons were identified based on their response properties. BNC1a showed an unselective response like AN1, but the other three cell types showed distinctive selective responses. BNC1d responded more strongly to stimuli with lower pulse rates, whereas BNC2b preferred higher pulse rates, thus acting as a low-pass and a high-pass filter, respectively. Another cell, termed “BNC2a,” showed bandpass characteristics with maximal responses at the pulse rate of the calling song. On the basis of these findings, Schildberger concluded that the combined activity of BNC1d and BNC2b shapes the bandpass response of BNC2a, which may represent the mechanism of pattern recognition underlying phonotactic behavior (Schildberger, 1984).
Although this model is appealing, many open questions remain: (1) There is no anatomical evidence that BNC1d synapses with the bandpass neuron BNC2a; (2) BNC1d is not tuned to the male calling song; and (3) most importantly, BNC1d fires spikes only to the first pulse of a chirp, whereas BNC2a fires over the entire duration of the calling song.
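The core logic of the bandpass filtering model, a low-pass and a high-pass response combining into a bandpass response, can be sketched abstractly. The logistic filter shapes, cutoff values, and steepness parameters below are illustrative assumptions, not fits to the recorded BNC responses:

```python
# Abstract sketch of Schildberger's bandpass-filter idea: a low-pass
# cell (strong response below a cutoff pulse rate) and a high-pass cell
# (strong response above a cutoff) jointly shape a bandpass cell, here
# modeled simply as the product of the two. All parameters are
# illustrative assumptions.
import math

def lowpass(rate_hz, cutoff=35.0, k=0.3):
    # response falls off above the cutoff pulse rate
    return 1.0 / (1.0 + math.exp(k * (rate_hz - cutoff)))

def highpass(rate_hz, cutoff=20.0, k=0.3):
    # response falls off below the cutoff pulse rate
    return 1.0 / (1.0 + math.exp(-k * (rate_hz - cutoff)))

def bandpass(rate_hz):
    # combined activity of the low-pass and high-pass cells
    return lowpass(rate_hz) * highpass(rate_hz)

# The combined response peaks only for intermediate pulse rates
for rate in (10, 25, 50):
    print(rate, round(bandpass(rate), 3))
```

With these assumed cutoffs, the bandpass response peaks near 25 Hz and collapses at both low and high pulse rates, which is the qualitative behavior attributed to BNC2a.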

An alternative model proposed by Weber and Thorson (1989) suggests that pattern recognition is achieved by autocorrelation, mediated by a delay line and a coincidence detector in the CNS. According to the autocorrelation scheme, if the pulse interval of the sound stimulus matches the internal delay, the delayed response will coincide with the response from the direct pathway evoked by the next pulse and be spatially summated in the coincidence-detector neuron, producing a boosted response to the second pulse (Fig. 3A). Berthold Hedwig's group at Cambridge University's Department of Zoology recently identified an elegant brain circuit consisting of just five neurons that strongly supports the concept of autocorrelation mediated by a delay line and a coincidence detector (Kostarakos & Hedwig, 2012; Kostarakos & Hedwig, 2015; Schöneich et al., 2015). As described earlier, the incoming auditory information from each auditory organ is carried to the brain by the single interneuron AN1 located in the prothoracic ganglion. In the anterior protocerebrum, where AN1 terminates its axon, three spiking and one nonspiking local brain neurons that process the information from AN1 were identified (Kostarakos & Hedwig, 2012; Schöneich et al., 2015). Intracellular recordings from these cells showed the fundamental characteristics of a coincidence circuit (Fig. 3B) (Schöneich et al., 2015). The spike activities of the ascending neuron AN1 and of one of the spiking local brain neurons, LN2, reliably copied the pattern of the sound stimuli regardless of the pulse interval, suggesting that LN2 receives a direct input from AN1. However, the two other spiking local neurons, LN3 and LN4, were tuned to the temporal features of the calling song pattern. For a standard calling song pattern, LN3 responded more strongly to the second pulse than to the first, which is one of the principal properties proposed for a coincidence detector.
In LN4, the first pulse evoked an immediate hyperpolarization followed by a subsequent depolarization that rarely evoked spike activity, whereas the second pulse always evoked a stronger depolarization that reliably exceeded spike threshold. The responses of LN3 and LN4 to sounds with varied pulse intervals were revealing. When pulse intervals exceeded 35 ms, the boosted spike activity to the second pulse was abolished in LN3: All pulses evoked almost the same response instead. Moreover, a delayed subthreshold depolarization following the initial spike-generating depolarization was visible in LN3 in response to every pulse. This delayed subthreshold depolarization was also seen when only a single-pulse sound stimulus was given (Fig. 3B). A noteworthy feature of the delayed subthreshold depolarization is its strict coupling to the pulse offset with a constant latency of 40–45 ms, which closely matches the average response latency to the second pulse of the standard calling song. Thus, with a 20-ms pulse interval, the delayed subthreshold depolarization to the first pulse coincides with the spike-generating depolarization to the second pulse (20 ms of pulse duration + 20 ms of pulse interval = 40 ms of latency), inducing a more robust response to the second pulse via spatial summation of the two depolarizations. How is the delayed subthreshold depolarization generated? The answer came from the discovery of a nonspiking neuron, LN5. In response to each pulse, the membrane potential of LN5 alternated between hyperpolarization and subsequent depolarization via postinhibitory rebound (PIR), suggesting that it receives an inhibitory input from LN2. PIR is an intrinsic property that elicits rhythmic neuronal activity by depolarizing the membrane once a hyperpolarizing stimulus is released (Perkel & Mulloney, 1974).
Because the latency of the PIR depolarization depends on the intrinsic properties of the cell and is always coupled to the release of membrane hyperpolarization, it is almost constant in a given cell. In LN5, the PIR depolarization closely matched the timing of the delayed subthreshold depolarization in LN3, suggesting that LN5 forms the species-specific delay line and provides direct synaptic input to LN3. This also showed that the response properties of LN3 match those of a coincidence detector (detecting the coincidence of a direct input from AN1 and a delayed input through LN2–LN5). Compared to LN3, LN4 was more strongly tuned to the pulse rate (Fig. 3C). For sound stimuli with extended pulse intervals, LN4 responded with an immediate hyperpolarization followed by only a subthreshold depolarization. Only sound pulses with intervals of 15–25 ms, which correspond to the standard calling song, could drive LN4 to fire spikes to the second pulse, suggesting that LN4 serves as a feature detector by receiving a direct excitatory input from LN3 and an inhibitory input, probably from LN2. On the basis of these findings, Schöneich et al. (2015) proposed a simple but elegant neural circuit containing a species-specific delay line (LN2–LN5), a coincidence detector (LN3), and a feature detector (LN4) (Fig. 3).
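The timing arithmetic of this circuit can be captured in a minimal sketch. The 40-ms internal delay follows the arithmetic above (20 ms pulse duration + 20 ms conspecific pulse interval), but the ±5 ms coincidence window and the function name are illustrative assumptions chosen so that the selective range matches the reported 15–25 ms interval tuning:

```python
# Minimal sketch of the delay-line/coincidence-detection scheme in the
# cricket brain. Times are in ms, measured relative to the onset of the
# first pulse. The 40-ms internal delay follows the text's arithmetic;
# the coincidence window is an illustrative assumption.

PULSE_DURATION = 20.0   # ms, standard calling-song pulse
INTERNAL_DELAY = 40.0   # ms, LN5 rebound relative to pulse onset
WINDOW = 5.0            # ms, assumed tolerance for "coincidence"

def second_pulse_boosted(pulse_interval):
    """True if the delayed (LN5 rebound) input from pulse 1 coincides
    with the direct input evoked by pulse 2, boosting the coincidence
    detector's (LN3's) response."""
    rebound_time = INTERNAL_DELAY                  # delayed pathway
    direct_time = PULSE_DURATION + pulse_interval  # onset of pulse 2
    return abs(rebound_time - direct_time) <= WINDOW

# Only intervals near the conspecific value produce coincidence
for interval in (10, 20, 35, 50):
    print(interval, second_pulse_boosted(interval))
```

With these numbers, only intervals of 15–25 ms yield coincidence, reproducing the bandpass interval tuning reported for LN3 and LN4; in this reading, the PIR latency of LN5 is the single parameter that sets the preferred pulse interval.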

Figure 3 Pattern recognition based on a delay line and coincidence detector. (A) Flow diagram. The asterisk indicates the first sound pulse entering the auditory pathway. The color code is the same as in B. (B) Circuitry and processing mechanism of the auditory feature detector network. Shown on the left is a schematic drawing of the circuit. The neuronal responses to a single sound pulse (middle panel) show that the delayed depolarization in LN3 coincides with the PIR in LN5 (arrow). For a pair of sound pulses with a 20-ms interval, the delayed depolarization evoked by the first pulse coincides with the spike-generating depolarization evoked by the second pulse, greatly boosting the response to the second pulse in LN3 and evoking a spike in LN4 (arrows in right panel). (C) Relationship between the response tuning of the coincidence detector LN3, the feature detector LN4, and the phonotactic behavior. The response tuning of LN4 matches the behavior well.

(Copyright with permission from the American Association for the Advancement of Science. From Schöneich et al., 2015)

According to this model, the optimal pulse interval range for pattern recognition is fundamentally determined by the latency of the PIR in LN5. The narrow time window of PIR onset thus enables bandpass filtering of the pulse interval in LN3 and LN4. It is also likely that the phonotactic selectivity of different cricket species can be established through modifications of membrane conductances that alter the latency of the PIR.

To show positive phonotaxis to a conspecific male calling song, a female must discriminate both the pattern and the direction of the song. In a serial processing model, sound localization is achieved by comparing the outputs of two separate pattern recognition circuits, one on either side of the brain. To operate properly, delay-line-based pattern recognition requires at least two consecutive pulses; therefore, if the serial processing model were true, sound localization would also require at least two pulses. However, a single pulse is enough for female crickets to show a rapid steering response toward it, with a latency of only 55–60 ms (Hedwig & Poulet, 2004). Moreover, when an unattractive pulse pattern was presented simultaneously with an attractive one from the opposite side, the individual pulses of both patterns evoked steering toward them, suggesting that orientation is determined not by the better pattern but by the number of pulses presented to each side. These results indicate that auditory orientation is processed by a parallel, reflex-like pathway, while pattern recognition is processed by a separate, more complex autocorrelation pathway.

Conclusion

We have described our current understanding of insect auditory systems, which well represent two basic features of life: diversity and unity. Although space limitations prevented us from presenting every detail of the diverse insect auditory systems, the examples from the model insects described here demonstrate not only the astonishing efficiency and inventiveness of insect auditory systems but also remarkable similarities among diverse auditory systems, including those of vertebrates. Advances in our understanding of insect auditory systems will therefore contribute to a better understanding of the basic features of hearing in general, as well as to robotic applications for simple auditory orientation.

Michelsen, A., & Rohrseitz, K. (1995). Directional sound processing and interaural sound transmission in a small and a large grasshopper. Journal of Experimental Biology, 198, 1817–1827.