SEVEN METAPHORS FOR (MUSIC) LISTENING: DRAMaTIC

Abstract

This essay probes the nature of listening by refusing to pin it down to a single essence. The epistemological value of metaphor is explained in terms of the cognitive metaphor and embodied mind theories of Lakoff and Johnson as well as the philosophies of Rorty and Fiumara. Then the various natures of listening are explained via seven metaphors: (1) Digestion, (2) Recording, (3) Adaptation, (4) Meditation, (5) Transport, (6) Improvisation, and (7) Computation. As a positive example, this set of metaphors promotes recognition of the inherent plurality of listening by staking out distinct facets which cannot be reduced to one another. This irreducibility is made more vivid and conceptually manageable by associating each of these facets with more concrete activities that are literally irreducible, or indeed seemingly unrelated. Moreover, some of these metaphors suggest the complementary and sometimes interdependent nature of diverse aspects of listening.

Introduction

You might have guessed that the parentheses in my title do not imply that what I have to say is less applicable to music, nor even that it is tangential to it. Rather, at a time—our time, the early 21st century—when the ontological fences dividing music from the sonorous riff-raff and rarefied speech utterances of the rest of our lives have virtually crumbled, much of what pertains to music listening now pertains to all the rest. The parentheses suggest an expansive role for discourse about all kinds of listening, musical and non-musical, in whatever sense that distinction persists.

Those of us who dedicate most of our time to theorizing, composing, analyzing, improvising, critiquing, or performing music are not only listening but also listening to ourselves listen, and thus are tangled in a reflexive loop of auto-meta-listening. There is always the danger of incestuous solipsism, but at best, the looming claustrophobia of such focus heightens our appreciation not only for a plurality of ways to listen, but also for flexibility in how to conceptualize the act of listening, which is the surface I will start scratching here.

Metaphor

The idea that metaphors for listening are anything but beside the point perhaps requires some explanation, all the more so because I contend metaphors for listening are exactly the main point. There is no such thing as pure listening; all listening relates to some other activity, as does every activity. Is this still doubted? Epistemologically, metaphor has been on the back foot for more than two millennia. As Fiumara puts it: ‘Metaphor frequently inhabits the margins of discourse and its potential incivility generates concern for its management. There is a subliminal anxiety which results from the difficulty of maintaining the boundary between ‘proper’ terminology in the face of metaphorical boundary-crossers…’ (Fiumara 1995: 3). She goes on to note Thomas Hobbes’s disapproval of metaphoric expressions in Leviathan (Hobbes 1968); Hobbes complains that reasoning with metaphors is like ‘wandering amongst innumerable absurdities’ (Hobbes 1968: 116-17). The 20th century saw its own version of this as analytical philosopher Rudolf Carnap, icon of the logical positivist movement, sought to produce a scientific discourse cleansed of all metaphor so as to avoid the proliferation of Pseudoproblems in Philosophy (Carnap 1967).

Such skepticism about metaphor is directly or indirectly inspired by Plato’s famous contention that sensory experience is inherently deceptive, that all imagery is false, that only pure thought is accurate. Over such a length of time, Plato’s purity-of-thought trope has accumulated enough momentum to permeate almost every kind of discourse to such an extent that it goes unnoticed—like the taste of water. (For instance, in the spiritual realm, we see it in the doctrine of monotheistic religions that god should not be represented as an image.) An axiom underpinning Plato’s purity-of-thought trope is what Rorty calls mind as mirror (of nature), which Fiumara quotes Rorty to explain:

If ‘To know is to represent accurately what is outside of the mind’, to understand the nature of knowledge we must remain confined to the task of ascertaining the way in which the mind is able to construct such representation. Rorty suggests that, in fact, ‘The picture which holds traditional philosophy captive is that of the mind as a great mirror, containing various representations—some accurate, some not—and capable of being studied by pure, nonempirical methods. Without the notion of mind as mirror, the notion of knowledge as accuracy of representation would not have suggested itself’ (Rorty 1980: 6).

There are several reasons why this may apply—to advantage and detriment—to the act of listening, but I will address this soon enough. The point for now is that to the extent the mind is regarded as capable of mirroring the world, fidelity of representation is prioritized above all, such that negation-of-image serves as a modus operandi for achieving purity-thus-accuracy. This modus operandi obscures the epistemological value of metaphor, trying to serve as gate-keeper, standing in the way of metaphor as a legitimate mode of thought which serves the progress of knowledge.

Since metaphor involves mapping between distinct domains or systems of thought and experience, it is inherently antithetical to epistemological notions of accuracy based on fidelity, neutrality, purity, which demand demarcation and segregation of domains of knowledge as a means of control to prevent epistemological impurity. Fiumara characterizes the concern as one of containment and mastery, that ‘the very idea of transportability of words, notions and features could be a threat to the dignity of our mainstreams of philosophy, in the sense that certain ideas might not only be out of place but out of control’ (Fiumara 1995: 3).

The mind as mirror axiom is flawed (as Rorty [Rorty 1980], Fiumara [Fiumara 1995], and others now argue), because the notion of a true representation is inherently elusive. Consider the example of neurological systems of cephalopods as the basis for a thought experiment. Hanlon has identified and captured on film one of the most fascinating instances of spontaneous camouflage, in this case an octopus perfectly matching the color, shape, and texture of a specific coral cluster (Hanlon 2007).

Watching this footage, if we see the entire coral cluster as just coral, is our mind mirroring reality or not? If we see part of the coral cluster as octopus, are we not then failing to see reality? An important aspect of reality is its ecological dimension, which in this case is that the octopus’s color, shape, and texture perfectly match that of its surrounding coral. If it is not seen this way, the mind is failing to mirror reality; but if it is seen this way, the mind is still failing to mirror reality. Which reality do we prioritize when we ‘listen’ to what we see?

One might counter-argue that this is a special case involving deception. Yet, at what level of consciousness are cephalopods aware of or in control of their camouflage presentation? Consciousness aside, although their camouflage is meant (in evolutionary adaptive terms) to deceive predators, in what sense could it be regarded as deceiving non-predators? If the chromatophore displays of cephalopods were presented instead as sound patterns composed to blend with surroundings, would our minds be mirroring reality by hearing the whole blended ecological context or by picking out the composed sounds from surroundings? Would it matter if these composed sounds were created by a human composer or a different creature? And would it matter if they were composed to avoid predators, attract a mate, or for the sake of ‘pure’ artistry? Is it possible to distinguish these categorically? And what is the octopus representing through its own neurological system? Cephalopod camouflage is not simply a reflex, because it involves a complicated dynamic analysis of the scene. Cephalopod neurological systems do mirror reality in a sense. Yet it is not a true representation (not an honest mirror), but, rather, a kind of fooling. In a sense, however, if we are not fooled by their display, we are not fully ‘hearing’ what these remarkable creatures have to say. If we are fooled by their sophisticated camouflage, are we ‘hearing’ what they are saying? Or not? The notion of mind-as-mirror is foiled, or at least severely problematized, by the case of cephalopod camouflage; and the problems raised by viewing the intricate camouflage of the cephalopod pertain as well to our reception of the soundscapes around us.

The more profitable path is to regard the representational aspect of knowledge as part of a process of two-way communication, as well as one-way reception through the senses. On this view, knowledge is accumulated through lateral processes that are inflected by the embodied nature of our minds (Lakoff and Johnson 1999), rather than through a vertical purification of thought as Plato would have it. For music, this is no cavalier claim; for instance, the emphasis on negation of image as a path to purity, and thus accuracy, in Schoenberg’s opera Moses and Aron has been regarded as his statement on how best to understand aspects of his music (his 12-tone method). Here we see Plato’s negation of image, negation of lateral cross-reference, through the doctrine of Western monotheistic religion, projected onto our approaches to music, how to understand it, how to listen to it.

The notion that communication is ever neutral and transparent is thrown into doubt by Quine’s indeterminacy of translation doctrine, for instance (Quine 1960, 1969). Yet the notion that language neutrally and transparently transfers information persists, as is noticed by cognitive linguists. In an influential paper, Michael Reddy characterizes the folk theory of communication that English speakers typically use as based on a conduit metaphor, in which it is assumed that mental or emotional material can be physically moved from one person to another (almost as mental telepathy or clairvoyance) and that language is a medium, a conduit, through which this transfer takes place (Reddy 1993).

Reddy estimates that the conduit metaphor forms the basis for about 70% of the language used to talk about the English language. It ‘leads to a distinct viewpoint regarding communication problems’ (Reddy 1993: 167). A problem with communication, a failure of it, is framed by this distinct viewpoint. ‘One area of possible difficulty is then the [speaker’s] insertion process.’ Another is the listener’s extraction. One consequence of this is that it is easier to blame a communication failure on the speaker than on the listener, who is passive in this framework: after all, little effort or competence is required to find the contents of a package once opened. In general, Reddy argues, the conduit metaphor confuses our understanding of language by reinforcing the notion that language contains meaning and transfers it between people.[1] It skews our expectations of language.

My purpose is not only to problematize the act of listening by doubting its apparent passivity, but also to problematize the definition of listening by proposing that there is no such thing as pure listening, that probing its nature in some purist fashion is less productive than explaining its multiple facets through various apt metaphors.[2]

Rather than treating metaphor as exclusively an artistic device, the relatively recent cognitive metaphor and embodied mind theories (Lakoff and Johnson 1980, 1999) have prompted a broader conception of metaphor, arguing that it is the cognitive mechanism by which all abstract thought (from everyday speech to mathematical reasoning) arises from physical experience. The fact that Lakoff and Nunez (Lakoff and Nunez 2000) argue that mathematics—Plato’s pet example of pure form—is inherently metaphorical suggests just how radical an overhaul of epistemology is afoot. Cognitive metaphor and embodied mind theory entail a radical dehierarchizing of epistemology. In Plato’s view, form is the highest kind of knowledge, resulting from an ascent up and away from the physical world, a purification from the connection to and between physical experiences—in the 20th century the term structure has largely served as a rhetorical proxy for this. By contrast, according to cognitive metaphor and embodied mind theories, knowledge arises incrementally through productive cross-domain mappings between different kinds of physical experience. What this means is that knowledge, far from being a sterile purity, is rather a righteous fertile promiscuity. Listening and other forms of knowledge acquisition are multifaceted lateral processes, each devoid of an essence, devoid of any single essential nature.

What about pure reason and the pursuit of pure form? When Hobbes vehemently discourages metaphors as an impediment to truth, he echoes Plato’s fixation on attaining truth as pure form. As Plato presented it, form is a kind of negation, a purification from the defects of perceptual experience. More than ever before, the bankruptcy of this notion of form and formalism is becoming apparent, as I explained above. Form becomes empty and meaningless if divorced from context and process (which is itself a kind of context). For instance, metaphor involves mapping between and among forms and contexts, and so Fiumara warns that a retreat from metaphor risks the ‘threat of linguistic involution: a degradation which might jeopardize the development of a meaningful relation between nature and culture, world and language, deforming the relationship itself into a parasitic, destructive pattern’ (Fiumara 1995: 4). Shaviro remarks that through Whitehead’s philosophy we see that ‘It no longer makes sense to separate the theory of what we know from the theory of how we know’ (Shaviro 2009: 30). Knowledge is acquired through the medium of experience; epistemology and media theory fuse together. Contrary to Hobbes’s claim, there is virtually no reasoning without metaphors; reasoning instead occurs through what Fiumara (Fiumara 1995) calls an optimizing balance between the metaphorical and literal.

Not surprisingly, this is recognized by musicologists and music theorists who explain how metaphor is pervasive and often systematic in both formal and informal discourse about music, suggesting how it affords what we take to be musical meaning (Cook 1990, Zbikowski 2002, Spitzer 2004, Clarke 2005). This is more reasonable than expecting meaning to arise in a pure neutral way, because all perception occurs through the body, and metaphor is lateral mapping between different kinds of embodied experience; so such mapping always has an opportunity to occur and cannot help but occur. ‘The body is the pre-requisite for perception, …it impurifies perception, exactly because it enframes perception’ (Meelberg 2008: 65).

The Plurality of Listening

The idea that there are different ways to listen and indeed different kinds of listening has been discussed in various ways by music theorists. Huron describes a listening mode as ‘a distinctive attitude or approach that can be brought to bear on a listening experience’ and suggests a non-exhaustive list of 21 listening styles and strategies for music: distracted listening, tangential listening, metaphysical listening, signal listening, sing-along listening, and so forth (Huron 2002).

Morris (Morris 2002) identifies three ‘levels of attention’: (1) Ignoring music that is sounding, which is what happens at social functions for instance; (2) Intermittent attention, where the listener goes off on tangents that are suggested by the music; Morris makes an interesting point that musical experts see intermittence as problematic and blame it on the listener’s lack of appropriate knowledge or ability to use it, or attribute it to deficiencies of the composer or performer; (3) Complete, undivided attention, where one pays constant attention, never losing contact with the music; musicians are better at this, and following a score helps.

The flexibility and the conceptual, active, or agential dimensions of listening are brought even more to the fore in the writings of Lewin and Spitzer. (Both are inspired by Wittgenstein’s duck-rabbit, or ‘dubbit’, illusion.[3]) In his influential essay on ‘Music Theory, Phenomenology, and Modes of Perception,’ Lewin discusses at length how a heard event is not ontologically defined in a unique way, because it admits multiple perceptions, which vary according to the temporal context, that is, according to what has been heard subsequently, not to mention the influence of whatever theoretical apparatus is adopted as a lens through which to hear (Lewin 1986). Spitzer’s approach emphasizes the metaphorical nature of listening even more, suggesting that a hearing is a ‘hearing as’[4] (Spitzer 2004). This implies that there is not a privileged pure mode of hearing, but rather always a lateral referential mapping between sound and thought, between physical stimulus and its reception in the mind.

All of these inquiries are in the vein in which mine wishes to flow. For the purpose of making these pluralities more vivid, my device is to tentatively deny that listening is even defined as an act or action. I pursue this denial productively by viewing listening only through the lens of other acts and actions, each of which, through its particular affinities, brings different dimensions of listening to the fore. My strategy—however contrived it may seem—has the added benefit of suggesting parallels to other fields of inquiry, to which and from which productive discourse may flow.

Listening as Recording

Listening as recording is probably the most common and perhaps also the most troublesome way listening is conceptualized. It assumes that listening is a neutral and virtually effort-free activity. More than any other, this way of conceptualizing listening stresses fidelity. It assumes what Reddy (Reddy 1993) calls the conduit metaphor, the view that communication is a simple lossless transfer of information, as if moving a physical object from one location to another. Although there are situations in which listening may result in recall that meets the level of precision expected or demanded by the source, this is by no means usually the case, and in some ways may be beside the point, for instance if the source intends to deceive or if the source is not even a conscious agent.

Transcription and dictation tasks are the paradigms for listening as recording. Upon inspection, however, it becomes clear that every transcription-dictation task grounds itself in some or other set of ontological commitments, as is well known to ethnomusicologists (Nettl 2005: 74-91). Conventional melodic dictation assumes a straightforward categorization of pitch events into notes. Conventional harmonic dictation entails a categorization of sounds into chord labels; chord labels entail some or other theory of chords; not all such theories are interchangeable with each other. On the basis of my experience teaching ear training, I can testify to the fact that one chord ontology versus another can significantly influence what one hears, including what might be called the accuracy of such hearing. The fact that dictation is often used for learning foreign languages testifies to the fact that listening is not simply recording. For if listening were simply recording, dictation would have no pedagogical value for learning a foreign language.

One of the problems with viewing listening as recording (in the sense of transcription) is that it must very often be admitted as a failed effort, a failure.

This we witness when psychologists or skeptical musicians question whether serial or dodecaphonic ‘structures’ are ‘heard’—often designing and carrying out listening experiments to corroborate their skepticism. This is induced by their false impression that the calculation that goes into composing the music demands from the listener a detailed recognition of every sound so calculated in relation to the way it is calculated. Yet the same is not expected of a listener hearing simpler music. The fallacy of this impression is explained aptly by Scotto (Scotto 2004). This brings up the fact that differences of opinion on musical aesthetics are often tightly bound to the question of what ‘listening’ is, and how that might be, or should be, influenced or determined by what the music is supposed to do. Is it mere entertainment? Refined decoration? Communicative expression? Emotional engineering? Or something else?

Returning to the issue of listener recall and expectations, there is the importance of hearing information that is ostensibly peripheral to the message transmitted, as well as the case in which there is no message. When listening to tone of voice to determine the attitude of the speaker, one’s transcriptional accuracy is often rightly subordinated to assessing the sonic landscape holistically—an aspect of adaptation to be discussed further in the next section. In cases in which no message is transmitted, the ability to reproduce a heard sound to a sufficient level of precision might be taken as a gold standard for listening, as when comedians do impressions of other celebrities. The non-verbal case is even more interesting, and may involve sonic reproduction as well as diagnostic assessment based on such reproduction. Both of these are depicted, for instance, in an amusing series of AAMCO TV advertisements in which customers emphatically convey the sounds of their ailing cars, while an attentive auto-mechanic miraculously diagnoses an engine problem by ear.

What’s amusing in this case is that the customers have mentally ‘recorded’ the sounds of their cars’ defects, to such an extent that they can faithfully reproduce the sounds; it’s striking for being so unusual. And the auto mechanic’s diagnosis suggests a sonic version of a game of charades.

Listening as recording is probably the most prevalent way listening is conceptualized. Because sonic environments (which exhibit familiar and unfamiliar, natural and artificial, intended, unintended, and indifferent sounds) are potentially so rich and diverse, listening as recording sets a high standard. This is both good and bad. On the positive side, it stresses our capacity to enlist all our attention to the act of listening, with sometimes astonishing results. The downside is that it buries other significant facets of listening. For instance, when listening is regarded as recording, the medium on or through which the ‘recording’ takes place imposes a strong bias on what is heard. The discrete symbolic nature of transcription or dictation washes away all that falls outside the ontology of symbols, not to mention holistic facets or nuanced facets that defy transcription altogether. So we find that our concept of listening is severely flawed if it fails to account for the attitude or disposition toward listening that is adopted in the first place.

Listening as Adaptation

Listening as adaptation is one of the more important recent developments in theories of listening, involving an enlightened ecological perspective. John Cage and later sound artists such as Max Neuhaus, who were inspired by earlier sonic pioneers such as Varèse and Russolo, drew welcome attention to the virtues of hearing environmental sounds aesthetically. For instance, Neuhaus’s 1976 Listen poster conveys his desire to ‘give credence to…live street sounds’ by ‘taking the audience outside.’

The adaptive aspect of listening has become a significantly focused area of study since the 1970s. This is the ecological approach to the study of listening, which is covered comprehensively by Eric Clarke, who pays homage to earlier writings by James Gibson, Stephen Handel, and Albert Bregman, as well as the comprehensive body of work by W. Luke Windsor (Clarke 2005, Gibson 1966, 1979, Handel 1989, Bregman 1990, Windsor 1994, 1995, 1996a, 1996b, 1997, 2000, 2004). Clarke notes that ‘musical materials have the capacity to place a perceiver in a certain relationship with music — ironic, humorous, accepting, critical, alienated’ (Clarke 2005: abstract for chapter 4, pdf version). He also reviews the listening style typologies of Theodor Adorno and Pierre Schaeffer.

Not covered by Clarke is Oliveira and Oliveira’s ecological approach, which I find particularly helpful in its directness; it considers perception as it emerges from the interaction of the perceiver and its environment (Oliveira and Oliveira 2003). Organisms have no conscious control over the purely local parts of the perceptual system, such as the stapedius muscle of the ear. Yet many organisms do have the capacity to self-organize or ‘tune’ their focus in order to better interact with their surroundings and detect information, for instance, as it is important for survival. ‘Self-tuning’ is Oliveira and Oliveira’s term for this special ability displayed by the perceptual system as perceivers interact with their environment. ‘Self-tuning is a self-organizing process to better fit the perceptual system with adequate information’ (Oliveira and Oliveira 2003: 47). Some such self-tuning takes place in immediate response to stimuli, though more importantly some such self-tuning occurs gradually over a lifetime.

In ecological psychology, bits of information gathered through such self-tuning are deemed emergent properties and are called affordances (Gibson 1977), in that they afford the organism actions specific to their own wellbeing—actions superfluous to, and therefore information superfluous to, other organisms. Affordances are emergent in that they exist neither in the physical world nor in the organism, but rather emerge from the interaction of the organism with its physical or cultural environment, called a niche.

More recently, Thibaud discusses similar issues in terms of ‘tuning into ambiance,’ drawing attention to how the ecological perspective is more particular to sonic than visual experience: ‘[A]n ambiance can be specified by its ‘tone’ (an affective tonality), it involves our ability to be ‘in tune’ with the place, it has something to do with ‘sympathy’ and ‘harmony’. We speak sometimes of ‘a vibrant atmosphere’. While our everyday language is predominantly based on visual images, such is not the case with ambiance’ (Thibaud 2011).

Despite how it might seem, listening as adaptation pertains not just to survival-motivated information gathering, but also to aesthetic pleasure. For music, a good bit of attaining the proper mindset for aesthetic appreciation derives from a synthesis of conceptual and perceptual learning, which I have previously explained (Mailman 1996, 2010a) as being one of the aims of music theory and analysis, and which has been addressed more recently by Croom (Croom 2011). The process of getting into the apt listening mindset (as an aspect of music theory and analysis) is explained by Peles as using causal inference to systematically establish ‘initial conditions’ from which a reader-listener can appreciate a particular musical effect (Peles 2007):

Knowing something about the causal history of the effect enables us [theorist, analyst, teacher, writer] to induce it in those who wouldn’t otherwise have the experience; it allows us, in short, to change the initial conditions. We start at the earliest point in the causal history to which the subject responds, and move incrementally up the chain from there, effectively reading the causal history forward toward the effect, rather than backward from it as we did when we were constructing an explanation. In this respect music theory has two tasks, one explanatory and the other didactic (Peles 2007: 74).

That aesthetic appreciation of music varies so much is partly attributable to the fact that most people are habituated to certain musics to such an extent that their focus becomes not only conditioned but even entrenched, whereas the focus of others remains flexible; their ears are open to learn previously unheard nuanced sonic effects.

Though not exclusive to music listening, the ecological approach has drawn attention to what is essentially a bifurcation of listening orientations such as those just mentioned. These are, as Truax puts it, listening-in-search vs. listening-in-readiness (Truax 2001). In the first case, one has decided in advance specifically what to listen for, whereas in the second case one is attending holistically to the entire soundscape. Though not necessarily the same, these also parallel the distinction between attentive listening (focal attention) and diffused listening (global attention). More recently, industrial sound theorist Julian Treasure has called these listening positions, distinguishing them as reductive listening (listening for) and expansive listening (listening with) (Treasure 2007). Provocatively, Treasure furthermore notes a gender bias in regard to the listening positions: males often tend toward reductive listening (listening for) while females tend toward expansive listening (listening with).

In her book on the philosophy of listening, Fiumara distinguishes logos from legein, noting how the logico-metaphysical tradition has emphasized logos to the detriment of legein. Logos is grasping, mastering, using, whereas legein is letting lie together before using (Fiumara 1990: 15). In this respect legein is more like listening as recording, or in any event is less invasive than logos. As Fiumara explains it, logos has tended to dominate our concept of listening, making it too proactive, to the detriment of listening’s ecological potential, which is embodied in the concept of legein. Rather than enabling manipulation, as logos emphasizes, legein by contrast means keeping and preserving. ‘Keeping represents the essential quality of authentic listening and remembering. The kind of hearing that preserves may well be deserving of philosophical priority.’ Legein ‘aims at coexistence with rather than knowledge of’ (Fiumara 1990: 15). Otherwise we may, out of convenience, reject whatever is difficult to master. That is, listening-as-prompt-for-action (logos) is highly selective listening, whereas listening as legein—which is reminiscent of the exiled book-lovers who memorize and therefore memorialize entire books in Ray Bradbury’s Fahrenheit 451—is a more authentic kind of interpersonal engagement. The adaptive ecological aspect of Fiumara’s legein-oriented view of listening is strong.

To some degree, a listener’s adopting the most appropriate listening angle, channel, or vessel depends on how she may be situated within speech communities (Bakhtin 1986, Cumming 2000: 17), for instance as exemplified by Mead’s chapter title ‘“One Man’s Signal is Another Man’s Noise”: Personal Encounters with Post-Tonal Music’ (Mead 2004). There’s also the didactic or pedagogical element alluded to above in regard to aesthetic appreciation: how does a listener gain entrance to various musical ‘speech communities’? How is authentic legein to be promoted and developed?

Although it is not the approach Fiumara takes, I find that one way of promoting legein is actually through logos, deployed pluralistically, flexibly, sensitively. In listening to music, ‘tuning in’, getting to the right ‘initial conditions’, is often a matter of knowing what kinds of changes to focus on (as opposed to what remains invariant). For instance, in music that is constantly dissonant, the ebb and flow of consonance and dissonance is not an appropriate feature to tune in to. Cogan suggests the importance of choosing features carefully in music analysis based on modeling flux of intensity: ‘It is important...to understand which sonic features bear that potential charge of change in each piece’s specific context’ (Cogan 1984: 152).

Listening in readiness requires flexibility, but it often also requires specifically a flexibility of focus, which then becomes a specific listening-for. Thus logos can be harnessed for the purpose of legein, as one from among a vocabulary of listening focuses is called upon to optimize a particular listening situation. As I (Mailman 2010b) show in the first movement of Ruth Crawford Seeger’s Quartet 1931, the features that give rise to medium-range form may be different in different sections of a piece and may even be features that do not bear form in other pieces of music. After many years of expansive listening-in-readiness to (listening with) this piece and contemplating it, I developed a particular kind of diffuse global attention that I could then articulate and deploy in terms of logos, so that a productive legein can be communicated precisely to other listeners less familiar with the work. Listening in readiness can be promoted through a pluralistic vocabulary of specific ways of listening for. This is adaptive listening developed communally and promoted through logos. Specifically, the flexibility that legein demands may be promoted by the intersubjective communicability of logos.

The flexibility thesis proposed has two facets: On the one hand, as explained by the information theorist Abraham Moles, the transmission of form is achieved through ‘channels’, in which a message may be expressed as a temporal form, a contour of flux (Moles 1966). In other words, the reception of form and expression depends partly on the nature, the disposition, the configuration, of that through which it is transmitted. Previously (Mailman 2010a, 2010b, 2011) I generalized this to the concept of a vessel of form, which encompasses two reciprocal space-time metaphors for the time-lapsing of musical works. A vessel is any specified disposition or mechanism for information gathering that is apt for detecting appropriate flux of intensity of whatever quality is relevant to the situation. The concept of vessel encapsulates many of the adaptive aspects of listening just discussed. The relevance to expression is that form can itself serve as a vehicle for expression. That is, forms can serve as signals, which can act as vectors of transmission for feeling, as Whitehead puts it (Whitehead 1978). Or as I explain in my analyses (Mailman 2010a) of Medieval secular songs, a form can serve as a vessel of meaning. A vessel is a logos-fueled tool that works in the service of legein. Knowing there are various—in fact infinite—vessels promotes flexibility of listening which underlies authentic legein as described by Fiumara.

Listening as adaptation raises at least as many questions as it answers. Does a listener choose or is a listener chosen by a particular speech community? How does one develop, articulate, self-tune to a specific vessel? What time-scales are involved? What role does memory play? What is the appropriate balance of reductive listening (listening-in-search) and expansive listening (listening-in-readiness)? Is the adaptive dimension of listening a matter of compulsion, habit, or decision?

Listening as Improvisation

Thinking of listening as improvisation promotes not only the spontaneity of listening—which may be an openness in the sense of legein—but also the agency of the listener, including all the ethical ramifications of that. Listening as improvisation is not just the unscripted aspect of listening, but moreover its anti-scripted potential, determinable by the context as well as by the listener’s volition.

Consider that Reich describes process music (which is pre-determined) as the opposite of improvisation (Reich 1968). Transferring this to the listening act itself: if one has determined what to listen for (listening-in-search) or how to listen in advance, this is the opposite of improvisational listening.

George Lewis explains improvisation not merely as a set of musical practices but as an ethically charged ‘real world’ mode of behavior, quoting philosopher Gilbert Ryle on this point (Lewis 2007). Ryle remarks: if someone

[i]s not at once improvising and improvising warily, he is not engaging his somewhat trained wits in some momentarily live issue, but perhaps acting from sheer unthinking habit. So thinking, I now declare quite generally, is, at the least, the engaging of partly trained wits in a partly fresh situation. It is the pitting of an acquired competence or skill against an unprogrammed opportunity, obstacle or hazard (Ryle 1976: 77).

From this point of view, all life actions exist on a continuum between ritual or automation (on the one hand) and improvisation (on the other). Lewis also remarks that ‘[f]rom a musical improviser’s standpoint, composing, performing, and listening…come together in the practice of improvisation.’

In regard to legein vs. logos, listening as improvisation cuts two ways. On the one hand, improvisation entails a responsiveness to the unpredictable flux of a situation, being alive to the situation one encounters. On the other hand, it also entails the prerogative to self-determine one’s thoughts and action. As Lewis (Lewis 2007) puts it: ‘improvisative production of meaning and knowledge provides models for new forms of social mobilization that foreground agency, personality and difference.’ Improvisatory listening therefore suggests alertness to a situation as it unfolds, but it also implies that the listener has the ability to exercise choice in how to focus. Aspects of the idea of self-determination in listening are traced in Steege’s doctoral dissertation on Helmholtz, who bore a ‘reformist impulse to submit listening to a rigorous and virtuoso discipline of attentiveness to the radical particularity of sensation’ (Steege 2007). The important thing about listening as improvisation is that it is the alternative to a scripted listening approach, in which—as one might when listening to a fugue—one follows a conventional prescription for attending to events known in advance; instead, listening can be just as spontaneous as performing.

The legein and logos aspects of spontaneous listening do not necessarily conflict, as even Gibson’s (Gibson 1979) ecological theory of perception identifies affordances as ‘action possibilities’: actionable options revealed through sensitivity to context. In fact Lewis’s 1980s Voyager system (and the work of the same name) illustrates a computationally implemented effort to balance self-determination with contextual responsiveness in an unscripted situation.

Voyager is interactive software Lewis programmed to ‘listen’ to a live human improviser, choosing to ignore or respond to the human’s musical gestures by playing its own musical gestures or remaining silent. When it does respond, Voyager does so by spontaneously deciding what aspects of the human performer’s gestures to focus on, and whether to imitate or oppose them. Voyager therefore embodies agential aspects of listening.

Listening as Computing

Our routine ontology of sound attests to the fact that listening is partly computational.

That we ‘hear’ vibrations as higher or lower pitch is the result of automatic mental calculations. (Otherwise we would hear them as different speeds of pulse.) Parsing a stream of spoken sound into words, segregating a sonic landscape into multiple streams of information—these are information processing actions, involving networks of intricate systematic procedures.

The idea that listening is computation goes hand in hand with the fact that listening is cognitive and that cognition—even its involuntary perceptual aspects—can be modeled computationally. The human perceptual-cognitive system is capable of measuring or estimating quantities pertaining to its input, the stimuli of its environment, as the cognitive psychologist Lisa Feigenson explains. ‘Adults can represent approximate numbers of items independently of language. This approximate number system can discriminate and compare entities as varied as dots, sounds, or actions’ (Feigenson 2008: abstract). Consider music analysis in this context. If an interpretive analyst of music wants to tune his or her audience’s focus to a particular quantitative or ‘approximate number’ aspect of a musical work, Feigenson’s research suggests this might be done in part ‘independently of language,’ since evidently such quantitative cognition is partly non-verbal. Not only is the human perceptual-cognitive system capable of measuring or estimating quantities, it actually does so naturally, involuntarily—as when we estimate distance to catch a ball moving through the air, judge the temperature of our food for eating, water for swimming, weather for dressing, when we distinguish shades or hues of color in our visual field, make decisions by weighing risk, and when we determine the interval or relative duration between two pitches in music. All of these acts, though fallible and approximate, are essentially mental computations—and to conceptualize them as such is the basis for computational modeling of cognition (For neuroscience theories of the mind as computer or information processing system see for instance Churchland and Sejnowski’s Computational Brain [Churchland and Sejnowski 1992]).

When defined with appropriate breadth, computational modeling of music listener cognition can be traced back to the mythical discoveries of Pythagoras in ancient Greek civilization, more recently to the 16th- and 17th-century writings of Zarlino and Descartes, and later to Rameau and Weber in the 18th and 19th centuries. Zarlino and Descartes suggest intersubjective bases of consonance perception based on collections of intervals defined as mathematical ratios. As Moreno explains: Descartes examines the criteria for the construction of sound ‘in terms of measurement (calculation of identities and differences among intervals or temporal units in a composition against a common unit) and order (serially arranging the results of measurement according to degrees of complexity). The results of such analysis constitute the mark for certain, demonstrable, and intersubjective knowledge’ (Moreno 2004: 14). Computational modeling is pertinent to listening because intersubjective aspects of music are not always physical. Some musical entities occur in the listener’s consciousness but do not exist acoustically in the physical environment. (Such musical entities are mostly emergent qualities [Mailman 2010a].)
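The two steps Moreno attributes to Descartes, measurement against a common unit and ordering by degree of complexity, can be illustrated with a minimal sketch. The interval ratios below are the standard just-intonation values; the complexity score (numerator plus denominator of the reduced ratio) and the function names are assumptions made purely for illustration, not a reconstruction of Descartes’s or Zarlino’s actual criteria.

```python
from fractions import Fraction

# Measurement: each interval expressed as a frequency ratio against a
# common unit (standard just-intonation values).
INTERVALS = {
    "octave": Fraction(2, 1),
    "perfect fifth": Fraction(3, 2),
    "perfect fourth": Fraction(4, 3),
    "major third": Fraction(5, 4),
    "minor third": Fraction(6, 5),
    "major second": Fraction(9, 8),
}


def complexity(ratio: Fraction) -> int:
    """Hypothetical complexity score: numerator + denominator in lowest terms."""
    return ratio.numerator + ratio.denominator


def order_by_complexity(intervals: dict) -> list:
    """Order: arrange interval names from simplest to most complex ratio."""
    return sorted(intervals, key=lambda name: complexity(intervals[name]))


ranking = order_by_complexity(INTERVALS)
for name in ranking:
    print(name, INTERVALS[name], complexity(INTERVALS[name]))
```

Under this assumed score the octave comes first and the major second last; a different complexity measure would yield a different ordering, which is precisely the kind of choice such a model makes explicit and open to discussion.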

In the 18th century, Rameau’s concept of fundamental bass (comparable to what we now call chord root) again implies that some mental calculation is involved in the perception of harmonic progression; his notion of implied dissonance situates such dissonances in the mind, rather than in mere acoustical vibrations. Thinking of Rameau’s ‘implied dissonances… if the acoustical datum in a composition and/or in its performance is in a way deemed insufficient for our adequate comprehension of it, then the very ontology of sound within the theoretical category ‘harmony’ (i.e., the conditions that determine its being) is open to question, and the epistemological stakes placed on listening rise… An adequate understanding of [the] musical imaginary [what implied dissonances possibly are] depends upon an explanation of the function of the subject of human agent behind the implied notes’ (Moreno 2004: 15-16).

In this tradition, the most self-conscious instance of such subjective computational modeling prior to the 20th century was Weber’s popularization of Roman numeral analysis as a way to characterize the moment-to-moment perception of tonal key center and chord changes in relation to it. With each chord aurally characterized by a number corresponding to the intervallic relation between its root and the tonic pitch, we see a phenomenological relation of the listener-subject (noesis, or what Weber called ‘das Gehör’) and perceived object (noema) defined by a numerical calculation. In classical repertoire, such as the opening of Mozart’s ‘Dissonance’ Quartet, which Weber analyzes, a chord is not simply any notes sounding at the same time, but rather an entity with a subtler status. In a sense the chord is ontologized through the process of its being perceived, since the listener might be aware that the chord’s status as a chord arises from its being perceived as such. ‘A modern self-questioning subjectivity [is] manifest in the way [Weber’s] ‘das Gehör’ interprets what it hears, knows itself to do so, but doubts whether its experience can be represented’ (Moreno 2004: 19). It might be said: the noesis observes, contemplates, elaborates, and thereby creates its own noema, aware of the partially derived nature of that noema.
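The numerical calculation behind a Roman numeral can be sketched as follows. This is a deliberately reduced illustration, assuming a major key, natural-note names only, and no account of chord quality or inversion; the tables and the function name roman_numeral are hypothetical and do not reproduce Weber’s full notation.

```python
# Pitch classes of the natural notes (semitones above C).
PITCH_CLASSES = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

# Semitone distances of the seven major-scale degrees above the tonic.
MAJOR_SCALE_STEPS = [0, 2, 4, 5, 7, 9, 11]
NUMERALS = ["I", "II", "III", "IV", "V", "VI", "VII"]


def roman_numeral(root: str, tonic: str) -> str:
    """Label a chord by the interval from the tonic up to the chord root.

    Assumes a major key; raises ValueError for roots outside the scale.
    """
    interval = (PITCH_CLASSES[root] - PITCH_CLASSES[tonic]) % 12
    if interval not in MAJOR_SCALE_STEPS:
        raise ValueError(f"{root} is not a scale degree of {tonic} major")
    return NUMERALS[MAJOR_SCALE_STEPS.index(interval)]


print(roman_numeral("G", "C"))  # chord rooted on G heard in C major
print(roman_numeral("C", "G"))  # the same kind of chord on C, heard in G major
```

The same acoustic root receives a different numeral depending on which tonic the listener takes as the point of reference, which is one way of seeing how the label records a relation between hearer and heard rather than a property of the sound alone.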

Such computational modeling of cognition (whether in 16th-, 18th-, or 19th-century style or in the more modern making of computing machines that emulate mental processes) allows the interpretive analyst to sharpen her own observations and orient her listener-audience closer to her appreciative perspective. A computing machine can thus serve as an instrument of mediation, through which the interpretive analyst can communicate his or her ‘first hand’ experience of flux. The phrase ‘instrument of mediation’ is borrowed from Morgan and Morrison, who stress that for a model to serve as a mediator, it must operate autonomously (Morgan and Morrison 1999). I (Mailman 2010a) chose the metaphor of the computing machine because typically machines are thought of as operating autonomously, after initial activation.

I explain that such mediation is thus an experimental mode of phenomenology involving the concept of a computational machine working in a feedback system of informative communication, therefore called cybernetic phenomenology. Since the 1980s, some music theorists have sought to represent their own listening experiences through the development of computational models, for instance Lewin’s unrolling vector model of rhythmic patterning (Lewin 1981), Roeder’s calculus of accent model (Roeder 1995), and Quinn’s ‘fuzzy’ model of melodic contour similarity (Quinn 1997). Related to this, but not phenomenologically driven, is Rowe’s ‘machine listening’ software system Cypher, which computationally models how basic musical elements (pitch, duration, loudness) and conventional constructs (density, meter, tonal chords, keys, and phrase boundaries) are heard[5] (Rowe 1993).

Machines can serve as both simulations and extensions of the mind, in many contexts. It is primarily through its simulating capability that a machine extends the faculties of the mind. Typewriters, telephones, telescopes, microscopes, cameras, video recorders, air-brushes, blow-torches, satellites, MRI, the internet, as well as pianos, organs, and clarinets are all machines designed and built or used as tools for discovery or expression, or both. They are, in this way, extensions of the mind, and it is in this spirit that a computer (computing machine, software program, computational simulation) can be understood as a cyber-being, an extension of the mind that enables or enhances communication, in a fashion that is both similar to, and slightly different from, how music theories have traditionally addressed listening (which is explained in reference to Peles [Peles 2007] above).

Here is how the proposed cybernetic phenomenology advances aesthetically adaptive (critical-aesthetic) and epistemological goals. For aesthetic adaptation, what is needed is for the interpreter-analyst to ‘point at’ and ‘point out’, directing the reader-listener’s attention appropriately so that the flux can be experienced first hand. Temporal dynamic form theory (Mailman 2010a) proposes that this can be done by using the concept of a computing machine and its output to represent the interpreter’s cognition and thereby serve, like a demonstrative word (such as ‘this’, ‘that’, ‘these’, or ‘those’), to ‘point at’ and ‘point out’ features in music.

Two presentations may be provided by the interpreter-analyst: (1) explicit specification of the computing machine’s operation (how it processes its input) and (2) the computing machine’s output (which may be in the form of a series of numbers realized as a contour graph) as produced from the specific musical work as input. Awareness of the computing machine’s operations guides the reader-listener to tune his or her focus similarly to that of the interpreter-analyst; the computing machine’s output serves as a guide for the reader-listener to fine-tune or sharpen the focus of his or her cognition. These two presentations together prompt the reader-listener to mentally construct a procedure that simulates what the interpreter-analyst perceives as the mental processes that give rise to his or her own experience of expressive flux in the music. Thus linking the audible to the expressible, the computing machine and its output represent the interpreter’s cognition in an intersubjective space. By providing this information in an intersubjective space, the interpreter-analyst can give the reader-listener a tangible insight into how he or she hears flux, from which form and expression in the music can be experienced.

Figure 1 illustrates how this cybernetic phenomenology works.[6] (The diagram is read from right to left as indicated by the direction of its arrows.)

Figure 1: Diagram of cybernetic phenomenology

The top depicts the analyst-theorist[7] (noesis) listening to musical work W (noema), cognizing it, experiencing some sense of form-bearing flux F, which contributes to aesthetic appreciation A. (This simple model does not address anything like the complexity of all that goes on while listening to, cognizing, and appreciating music; rather it only addresses the aspect of aesthetic appreciation that arises from the experience of sensing some form-bearing flux during the act of listening to music.)

As depicted by the thick arrow running from the top to the middle, to sharpen his or her own mental processes, the theorist-analyst develops—or chooses appropriately from those already developed—a machine whose computations model the cognition that leads to experience of flux: F. The machine may be physical or metaphorical, an actual computer software program or merely a defined series of rules, an algorithm, or equation that produces an array of numbers F’ as output from an encoded or partially encoded version of the musical work W’s score as input. To the extent the theorist-analyst is self-regulating and successful in this activity, the machine’s output array F’ approximates the analyst-theorist’s experience of flux F, thus: F’ ≈ F. A beneficial side effect is that the output, array F’ (taken in the context of the algorithm that produced it) constitutes robust knowledge of the repertoire (work W in this case) that served as input. (In this way it contributes to the feedback processes of knowledge development.) Since this data relates, albeit indirectly, to the connection between a musical score and the thought processes of a listener focusing on form, it may be relevant to the development of new compositional strategies for projecting form. As depicted by the thick arrow from the bottom to the middle, because the analyst-theorist has specified the operations of the machine he or she has designed or chosen, a reader-listener can learn how the machine performs these computations.
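A toy version of such a machine can make the relation F’ ≈ F concrete. The sketch below is only an assumed example: the encoding of W as a list of note onsets in beats, the choice of event density per span as the gauged quality, and the names flux_machine and mean_abs_diff are all hypothetical, and the analyst’s reported contour is invented for illustration.

```python
def flux_machine(onsets, span_length, total_length):
    """Count note onsets in each successive span of the encoded work.

    The resulting list of counts plays the role of the output array F'.
    """
    n_spans = int(total_length // span_length)
    counts = [0] * n_spans
    for t in onsets:
        idx = int(t // span_length)
        if 0 <= idx < n_spans:
            counts[idx] += 1
    return counts


def mean_abs_diff(a, b):
    """Average absolute difference between two contours of equal length."""
    if len(a) != len(b):
        raise ValueError("contours must have the same number of spans")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


# Hypothetical encoded score W: onset times in beats.
W_onsets = [0, 1, 2, 3, 4, 4.5, 5, 5.5, 6, 6.5, 7, 7.5, 8, 10]

F_prime = flux_machine(W_onsets, span_length=4, total_length=12)

# Invented stand-in for the analyst's own sense of flux F, one value per span.
F_reported = [4, 7, 2]

print(F_prime, mean_abs_diff(F_prime, F_reported))
```

A small difference between the two contours corresponds to F’ ≈ F; a large one would prompt the analyst to revise either the machine or the account of what was heard, which is the self-regulating aspect of the activity.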

As depicted at the bottom, after having learned of the machine computations, the reader-listener (his or her ‘directive-self’) can try to roughly emulate or approximate the learned machine computations and attend to the resulting experience of flux; for this, the reader-listener may require use of the score and a contour graph of the computing machine’s output. To the extent the reader-listener is successful in this adaptive listening activity, his or her experience of flux F” approximates the machine’s output array F’ (thus F” ≈ F’) which in turn approximates the theorist-analyst’s experience of flux F, thus: F” ≈ F’ ≈ F. The reader-listener’s experience of flux F” contributes to some aesthetic appreciation A”. Insofar as the experience of flux F contributes to the analyst-theorist’s aesthetic appreciation A in the first place, and insofar as this connection is intersubjectively realizable, the reader-listener’s aesthetic appreciation A” then approximates the analyst-theorist’s aesthetic appreciation A, thus: A” ≈ A. Such a process, that which results in A” ≈ A, is none other than an instance of aesthetic adaptation: being guided on how to tune in appropriately to maximize aesthetic appreciation, which is what the philosopher of aesthetics Arnold Isenberg calls critical communication (Isenberg 1959).

Granted, such aesthetically directed adaptation (critical communication) is taking place indirectly through a network of asserted representations. Sometimes, however, this may be the optimal way for listening adaptation to be enhanced by discourse. As Reddy persuasively argues, communication is not as simple as just the transfer of information from one mind to another; communication is not an effort-free system, either on the transmitting or on the receiving end (Reddy 1993). The usually and tacitly assumed conduit metaphor for linguistic communication is inadequate.[8] The more accurate assessment, Reddy explains, is rather that language helps one person construct ‘out of his own mental stuff something like a replica, or copy, of someone else’s thoughts’ (Reddy 1993: 167). This is the toolmakers paradigm. ‘In talking to each other, we are more like people isolated in slightly different environments’ (Reddy 1993: 170). Participants (communicators) have different ‘repertoires’. Each person is as if permanently confined to a separate sector on a wheel. They cannot visit each other or exchange physical objects. Machinery connects each sector to every other sector. The machinery, when mastered, allows the inhabitants of the sectors to ‘exchange crude sets of instructions [blueprints] with one another—instructions for making things helpful in surviving, such as tools, perhaps, or shelters, or foods, and the like…The people only know of one another’s existence indirectly, by a cumulative series of inferences’ (Reddy 1993: 172). Reddy calls the mutual isolation ‘radical subjectivity’. He tells a story involving four people using the toolmakers paradigm. Through several iterations of exchanging and executing various sets of instructions about rakes and other related tools, they learn more and more not only about making tools but also about each other, each other’s environment, and each other’s past sets of instructions. Precisely specified representations of thought serve as tools and machines in communication, including communication about listening, such as how to listen-in-readiness, beyond one’s usual habits.

As Moreno puts it, in his explication of the music theories of Zarlino, Descartes, Rameau, and Weber: ‘Representation encompasses the link between the expressible and the audible, although...what “the audible” may be is itself constructed in representation.’ The intersubjective space of music is built on such learned conceptual representations. ‘Listening...may entail deciphering according to learned code signs intercepted in hearing, or it may point to the source of sound on the basis of which the listener develops an “intersubjective space”’ (Moreno 2004: 6-7). Thus the noema (the experienced) is partly molded by noeses (one or more experiencers). By definition the ‘intersubjective space’ is available to more than the noesis (experiencer). So it may be drawn upon as well as contributed to by any number of noeses. In this way noeses individually or jointly influence their own noema through the representations they develop.

The role of computation in representational tools may be less obvious. It is natural that many such tools are quantitative because not all aspects of critical communication can be achieved verbally. Feigenson’s (Feigenson 2008: abstract) cognitive psychological research finds: ‘Adults can represent approximate numbers of items independently of language. This approximate number system can discriminate and compare entities as varied as dots, sounds, or actions.’[9] Some of these mental representations might be those ‘whereof we cannot speak’, except awkwardly. In some contexts, such as those to which Feigenson refers, precise or estimated measurements might communicate even when words fail. As Whitehead (Whitehead 1978) observes: ‘plotting changes on a common scale helps surmount their privacy.’ As remarked above, the computing machine and its output represent the interpreter’s cognition in an intersubjective space, linking the audible to the expressible. Spitzer explains ‘hearing as’ as ‘a technical procedure that can be prompted’ (Spitzer 2004: 9).[10] A computational model is, among other things, a non-verbal (or partly verbal) mode of prompting. Thinking of listening as computing helps us negotiate communication about aspects of listening that are non-verbal; computing enables certain flexibilities of representation that are inaccessible through words alone.

Based on Whitehead’s pithy remark, Reddy’s toolmakers paradigm theory of communication, Feigenson’s empirical findings, and the ongoing project of cybernetic phenomenology, it should be clear that logos can work in the service of achieving legein, because it enables communal discourse to be bolstered by technical ingenuity. The broader goal of adaptation is thus served by the powerful flexibility of computational representations: legein through logos; listening in readiness enhanced by multiple ways of listening for.

Listening as Digestion

That listening is a kind of digestion derives from its inherent temporality, from the fact it is in a sense tactile—being a generalized form of touch involving the whole body, as Evelyn Glennie (Glennie 1998) calls it[11]—as well as its direct parallels with the processes of literal digestion, such as its filtering and nourishing functions.

Consider the inherent temporality of listening. Sound is ephemeral but its memory is not. Can listening be separated from its memory? Can it be equated with it? The elements of a visual scene may, prior to interpretation, be regarded in a paratactic sense (content without regard to order); one’s memory of a visual scene need not incorporate any sequential ordering information. By contrast, that which we listen to, sound and its content, is presented sequentially; only through interpretation can it be regarded paratactically. It is ephemeral, yet its qualities linger. What we listen to can only be ontologized (recognized as quality, entity, or process) through our memory of it. Listening is in a sense inseparable from its flow.

Yet also, during the time we are listening to some quality, entity, or process, there is a present, which we subsequently regard as the past moment in which it occurred. This is, as William James puts it, the specious present, the short duration of which we are immediately and incessantly sensible (James 1890). Ushenko calls it the protensive present, noting that in aesthetic experience, it expands to take up more natural time (Ushenko 1953: 120-63). This makes a given amount of natural time seem shorter when experienced aesthetically, perhaps because one’s absorption in the total aesthetic experience (as one, or a few, longer specious presents) dwarfs the sense that time is passing. This does not necessarily imply that ‘time flies’ in the usual sense. As suggested in Thomas Mann’s novel The Magic Mountain, a duration full of events can in retrospect seem longer than a relatively uneventful duration of equal length.[12] The point is that ‘the practically cognized present is no knife-edge, but a saddleback...’ as James puts it (James 1890: 609-10). This counters the naïve view of the ‘now’ as a mere point separating past from future. The ‘now’ is a duration of indefinite length. It is in the moving ‘now’ that listening occurs.

Beyond this, however, what we listen to stays with us, nourishing our consciousness. As we ingest food or drink, we are left with the taste for only an instant, and ingested material passes through us, though the memory and nutrients from them may linger or accumulate for an indefinitely longer time. That is, there are two ways in which what we listen to exceeds the ‘now’ in which we hear it. The first is in that it lingers, being gradually metabolized, fading in intensity and bulk as it recedes into the past. (This is accounted for in some respects by Husserl’s phenomenology of time and duration [Husserl 1964].) The second is that it may accumulate in our memories for an indefinite time.

All of the following characterize that which we listen to: (1) it is ephemeral in that it flows through in such a way that it is at some point ‘gone’, accessible only through the memory of it that is left behind; (2) at some point, some of it is in, on, or of our perception, taking up the ‘present’ of our consciousness as we attend to it; (3) aspects of it are left with us (in us), fading in intensity, as the moment of their physical vibration recedes into the past; (4) aspects of it are left with us (in us) indefinitely, accumulated, even long after its sound ceases as physical vibration; (5) whereas in some respects we ‘hear’ all sound that occurs as a continuous flow, in other respects we filter out sound that is beyond our hearing range or comprehension; we may group sounds into useful units (or sonic strokes, see Meelberg 2009) such that incompatible information is filtered out. The framing function of perception ‘…is a subjective act [in which] the body takes relevant precepts from the unfiltered flux of perception’ (Meelberg 2008: 64). In one way or another all of these five characterizations of listening’s temporality also characterize the temporality of biological metabolic systems, that is: digestion.[13]

Listening, like digestion, may be characterized by various flow systems, which account for the variety of ways we experience its temporality. Previously (Mailman 2010a) I have written about the role flow systems play in various vessels of dynamic form and expression. These suggest the digestive nature of listening. Often the perception of form and expression arises from the flux of qualities emerging somehow from all events within each span of time, that is, statistically from the totality of the span’s events. In other situations, however, only certain segments or elements (called occurrences) of sound contribute to form and expression—for instance insofar as form and expression arise from the status of a musical motive, or from imitation, or even from spoken words.

So, primarily there are two kinds of flow through a vessel: unfiltered and filtered, diagrammed in Figure 2.

Figure 2: Diagram of unfiltered and filtered listening flow systems

Unfiltered flow means the entirety of each span of sound is evaluated (Figure 2a). By contrast, filtered flow means only certain segments or elements (occurrences) within the flow are selected for evaluation (Figure 2b); the modeling of musical form and expression through filtered flow is necessarily selective; it involves the detection and selection of occurrences (referential segments or groupings of events) out of the total flow of all events in the stream; only the events within these selected occurrences contribute to the quality gauged and thus the meaning that arises from it. Similarly, with digestion, in certain respects anything can enter the process and pass through it (unfiltered flow); in another respect, however, certain aggregations enter as wholes and their wholeness is sustained throughout the process, while other entities—for instance those that are too large or which are unpalatable or poisonous to digestion—never enter the digestive process (filtered flow).
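The contrast between the two flows can be sketched in a few lines. Everything here is an assumption for illustration: events are represented as dictionaries with a label and a loudness value, mean loudness stands in for whatever quality is being gauged, and the function names are hypothetical.

```python
def unfiltered_quality(span):
    """Unfiltered flow: every event in the span contributes to the quality."""
    if not span:
        return None
    return sum(e["loudness"] for e in span) / len(span)


def filtered_quality(span, is_occurrence):
    """Filtered flow: only events selected as occurrences contribute."""
    selected = [e for e in span if is_occurrence(e)]
    if not selected:
        return None  # nothing in this span passed the filter
    return sum(e["loudness"] for e in selected) / len(selected)


# Hypothetical span of four events, two of which belong to a motive.
span = [
    {"label": "motive", "loudness": 8},
    {"label": "other", "loudness": 2},
    {"label": "motive", "loudness": 6},
    {"label": "other", "loudness": 4},
]

whole = unfiltered_quality(span)
motive_only = filtered_quality(span, lambda e: e["label"] == "motive")
print(whole, motive_only)
```

The same span yields different gauged values depending on whether the filter is applied, which is why the choice of flow system belongs to the specification of a vessel rather than to the sound itself.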

Another way to distinguish how the sonic stream flows in our consciousness is in the way its individual occasions are pushed out of the mental docket that we continually evaluate, or in other words, what prompts them to be pushed out of consciousness. They may be pushed out because of the pressure of new occasions going in (under the assumption that our evaluative docket has a limited capacity), or they may be pushed out on the basis of a specific condition detected in the stream (under the assumption that our evaluative docket has elastic capacity). In some respects, the continual presentation of new events in a sonic stream pushes older events out of consciousness, so that they eventually cease to influence the stream’s quality. In another respect, however, the continual presentation of new events may accumulate indefinitely (not ever forgotten) in the listener’s consciousness until something triggers their release from immediate consciousness; and in this respect they may cumulatively contribute to the gauging of quality until a condition obtains that relinquishes them. Automatic pressure relief outflow from the docket and detected condition triggered purging outflow from the docket correspond to these two aspects of listening.

This can be imagined in terms of valves controlling flow, which are akin to parts of our digestive tract that regulate the flow of digestion. For instance, the difference between automatic pressure relief outflow and detected condition purging outflow is diagrammed in Figure 3.

Figure 3: Diagram of three types of listening flow system outflow as controlled by valves

Automatic pressure relief means that for each occasion that enters the docket, the events of some earlier occasion are forced out; that is, the docket’s outflow is controlled by a pressure relief valve (shown in Figures 3a and b). By contrast, conditional purging means that each new event is retained in immediate memory, and thus on the docket, to contribute to each new gauging of quality, until some specific condition is met, at which point all events are relinquished all at once (shown in Figure 3c). This is by no means an exhaustive account of the variety of flow systems relevant to listening.[14] It merely suggests some of the ways listening is a kind of digestion, through its varieties of temporal flow and its qualitatively nourishing potential.
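
The two kinds of outflow can likewise be sketched in a few lines of Python. This is only an illustration under stated assumptions: each occasion is reduced to a single numeric event, the quality gauged is the mean of whatever is currently on the docket, the class names and the purge condition (a zero-valued event standing in for a silence) are hypothetical, and the event that triggers a purge is assumed to begin the new docket.

```python
from collections import deque
from statistics import mean


class PressureReliefDocket:
    """Limited capacity: each new event forces the oldest one out."""

    def __init__(self, capacity):
        # A deque with maxlen discards its oldest item once it is full.
        self.events = deque(maxlen=capacity)

    def admit(self, event):
        self.events.append(event)
        return mean(self.events)  # quality gauged on the current docket


class ConditionalPurgeDocket:
    """Elastic capacity: events accumulate until a condition purges them."""

    def __init__(self, purge_condition):
        self.events = []
        self.purge_condition = purge_condition

    def admit(self, event):
        if self.purge_condition(event):
            self.events.clear()  # all earlier events relinquished at once
        self.events.append(event)  # assumption: the trigger starts anew
        return mean(self.events)


stream = [4, 6, 8, 0, 2]

relief = PressureReliefDocket(capacity=2)
relief_qualities = [relief.admit(e) for e in stream]

purge = ConditionalPurgeDocket(purge_condition=lambda e: e == 0)
purge_qualities = [purge.admit(e) for e in stream]
```

With this toy stream the pressure-relief docket gauges 4, 5, 7, 4, 1, while the conditionally purged docket gauges 4, 5, 6, 0, 1. The third value differs because the first event has already been forced out of the limited docket but is still retained in the elastic one.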

Listening as Meditation

I will not dwell long on the meditative aspects of listening since they are well known and familiar to many through first-hand experience. That listening is a kind of meditation derives from the fact that it can be an object of intense focus and prolonged concentration. Recently, an incident at a New York Philharmonic concert brought heightened attention to listening’s meditative function. As occasionally happens, an audience member’s mobile phone started ringing (a loud marimba sound) during the performance. In this particular case it was near the end of Mahler’s Ninth Symphony, and the ringing, coming from the fourth row, persisted for five minutes, ultimately intruding upon one of the movement’s final quiet passages. So egregious was this that, in an unprecedented move, conductor Alan Gilbert halted the performance; audience members cheered him and went into an uproar, demanding the offender be ejected from the concert for ruining the listening experience for all the rest of the attendees. The incident was reported in national newspapers, radio, and web blogs for over a week following the event.

That concertgoers were ‘baying for blood’ attests to the value placed on focus and concentration in music listening; it is an intense experience of tuning out all else and a disappointment when that fails to be accomplished.

Previously, composers have drawn attention to the meditative aspect of music listening. Boretz for example has described ‘experiencing music [as] bringing into being a singular time-space identity, received from a singular perspective of location...The psychic time and space and occasion of a music experiencing are fully contingent upon the specific coincident physical times, and physical spaces and real-world occasions within which that music experiencing occurs’ (Boretz 2002: 142). Most famously Pauline Oliveros developed the interrelated concepts of deep listening, sonic meditation, and sonic awareness (Oliveros 1990) which are the basis for a meditatively immersive approach to improvisation, which Von Gunden describes as ‘a synthesis of the psychology of consciousness, the physiology of the martial arts, and the sociology of the feminist movement,’ involving both focal and global attention (Von Gunden 1983: 105-7). Morris describes the meditative aspect of listening as an attention to qualitative experience without reference to knowledge, calling it ‘suchness’. ‘Suchness is what we perceive when there are no thoughts about perception, before we recognize something as X.’ Beyond listening, he also draws parallels between the attention-focusing aspects of musical discipline (such as breath control, bowing, scales, and even counterpoint) and the capacity for pure attention developed to advance on the path to enlightenment as prescribed by various spiritual belief systems such as Buddhism (Morris 2002: 324).

Listening as Transport

The transportive nature of listening forms in some ways the opposite of its meditative capability but derives equally from its focal nature. Specifically it derives from the extremely powerful referentiality of sound, a referentiality that is exploited to an unprecedented degree as a result of the latest technologies of sound creation and reproduction. For instance, Ashby notes the way the iPod has utterly transformed sonic literacies and listening habits, arguing that ‘recordings are now the primary way we hear classical music, especially the more abstract styles of ‘absolute’ instrumental music... mechanical reproduction [recording technology] has transformed classical musical culture and the very act of listening, breaking down aesthetic and generational barriers and mixing classical music into the soundtrack of everyday life’ (Ashby 2010: abstract). He further argues that the depictive nature of, for instance, Mahler’s music is so imagistic that it rivals that of photographs. Mechanical sound reproduction allows the depictive power of sound to be deployed in physical situations that are utterly unrelated to that which is depicted, thus realizing the transportive potential of listening: thanks to the iPod, we can soak in the soundtrack of a sunset while surfing the subway.

Besides the affectively evocative nature of traditional classical music, such as Mahler’s, the latest musique concrète, sonic art, and sound sampling exploit the nuanced level of sonic literacy that is now arising, to great transportive effect. The ability to combine sampled sounds en masse has even been developed into algorithms called soundspotting, as Michael Casey explains (Casey 2009). The internet enables almost any sound to be available from any place at any time; and the opportunities to combine these into ‘mash ups’ and hear such sonic results are legion. It might be argued, therefore, that sonic literacy is gradually displacing older types of musical literacy which dominated, for instance, in the late 19th century.

As Landy puts it, sound-based music connects life to art (Landy 2009). Yet the way in which it does this is to use sound to transport the listener to a time and place in life that is different from the time and place one is currently in when listening: the listener is transported via the sound she hears. Robert Morris’s Thunder Spring Over Distant Mountains (1973) transports the listener on a futuristic tour through Asia by way of its processed samplings of Balinese, Japanese, Korean, and Tibetan musics.

Trevor Wishart’s Globalalia (2004), a ‘29 minute piece [which] uses syllables taken from 26 different languages, to create a series of elaborate variations on the sounds of language itself’ transports the listener to every corner inside the human mouth.

In Canto di Malavita, Red Carpet, Psalmus XIII, and Gotterdammerung by Noah Creshevsky, the virtuosically dense stream of samples provides a dizzying whirlwind tour of musical styles and genres, transporting the listener at the speed of light back and forth through a series of totally separate musical situations, by virtue of the extreme affective particularity of each sample. Creshevsky calls this hyperrealism. It transports the listener sonically in a way that would be physically impossible to achieve through actual physical transport.

Creshevsky’s Red Carpet.

Creshevsky’s Psalmus XIII.

Creshevsky’s Gotterdammerung.

Environmental sounds are used in music and sound art not only for their sonic richness but also for their transportive capability—though the two are not mutually exclusive. What is interesting is the way processes of environmental sounds suggest not only distant location but also the process of transporting through time or space. The gradual crescendo of crickets in Luc Ferrari’s Presque rien No.1 (1970) transports the listener closer and closer to the crickets or deeper and deeper into the night.

Ferrari’s Presque rien No.1.

In Mahamud Ali and the Crickets (2004), Alan Licht combines the crickets with various sampled urban sounds, transporting the listener to a highly specific hybrid soundscape utterly different from Ferrari’s.

Other tracks on Licht’s album New York Minute (2004) are soundscapes which deliberately evoke specific locales in New York City. Eno brings us Music for Airports (1978), but Licht brings us New York without the air travel. Such listener transport is not always achieved by presenting sounds sampled from the transport destination. A passage near the end of my own Heraclitean Dreams (2008) uses mostly computer generated sounds to evoke the ‘great buzzing confusion’ of being deep inside a swamp, alive with unfamiliar swarming creatures.

Different loudness levels of streams within the soundscape are meant to suggest the depth of space of a physical scene the listener might find herself in.

Listening can transport in time as well as in space. Licht (Licht 2007: 83) notes Wishart’s Viking Museum (which re-creates a lost Viking language), Hans Peter Kuhn’s installation at the closed steelworks Völklinger Hütte (with sounds recorded when it was still operating), and Ron Kuivila’s Mass MoCA installation (re-creating sounds of the factory it once housed). Brad Lubman’s electroacoustic composition I Herd Voices, 2 (2003) ‘suggests the nostalgia of old dusty records combined with the suspense of hearing depth charges from inside a submarine’ (Mailman 2003).

Thus, self-referentially, listening can even transport one specifically to listening situations of the past.

One of the most fascinating aspects of listening as transport is the way its potential is increasingly exploited both as a result and a cause of the enhanced referential sonic literacy enabled by audio technology. As evolutionary biologist Mark Pagel explains, scientists now distinguish between physiological evolution (which operates by principles of gene selection) and cumulative cultural evolution; as an example of the latter he cites language, a ‘social technology’ that allows humans to engage in social learning.[15]

Social learning enables the evolution of ideas, which persist beyond the lifetime of the individuals that produce them. The ease of use, the high fidelity, and the transportability of mechanical sound reproduction now enable the listener to partake in a kind of cumulative cultural adaptation involving sound and sonic literacy. It allows the details of sounds to persist beyond the individual lifetimes of human minds and now also beyond the obstacles of geographic proximity. We may find that mechanical sound reproduction is to musical evolution what ideas are for human evolution, because a composer or sound artist a hundred years from now will be able to employ sound samples exclusive to the early 21st century, and her early 22nd century listeners will likely recognize them in all their specificity because of their own highly developed sonic literacy. They may be able to transport sonically to our time in a way we cannot do in relation to the early 20th century.

Conclusion

Sound may be a physical phenomenon; hearing may be regarded as a physiological process involving sound. Listening, however, is an abstraction. Like any other abstraction, listening has no essence. It is best understood pluralistically, in terms of its facets. These facets are not reducible to each other. In fact, though no systematic procedures governed the selection of these particular metaphors, the complementarity of some of them was a consideration. The meditative and transportive facets of listening in some ways complement each other, by stressing immersion on the one hand and escape on the other, though each in some ways also entails the other. The digestive and recording facets, by no means mutually exclusive, stress the ephemeral versus persistent effects of listening, while neither denies the reality of the other. The adaptive and improvisatory seem to complement each other by stressing either subjugation of the will to circumstance or imposing of the will on circumstance. Yet both suggest responsiveness to circumstance. The computational nature of listening, though seemingly rigid, actually underlies, enables, and enhances the adaptive and improvisatory, while it also may operate according to either or both contrasting temporalities of the digestive and recording facets. A future study might explore such intricate complementarities and interdependencies, which are not crucial to my present purpose.

Especially in a discursive landscape that sometimes favors reductionism, what is crucial is to keep the various irreducible facets of listening in mind, to maximize listening’s experiential value. To this end, the seven metaphors encourage an actively pursued flexibility of listening, prompted by a plurality of ways to conceptualize it. Consider once again the facets of listening as discussed above: Its particular temporality and flow, its ephemerality and persistence in our consciousness reveal it as a kind of Digestion. That it is a way of precisely preserving what happened reveals it as a kind of Recording. That it involves adjustments to our thinking in order to absorb what is happening reveals it as a kind of Adaptation. That it is a way of tuning out and focusing attention demonstrates it as a kind of Meditation. That the referentiality of sound directs our consciousness to locations and times other than the ones we are in reveals how listening is a kind of Transport. That it demands a spontaneous readiness and permits agency of interpretation shows it as a kind of Improvisation. That it may involve systematic processes of parsing and calculation reveals it as a kind of Computation. Listening may be yet much else, but much of it is DRAMaTIC.

Notes

1. Reddy proposes there are two versions of the framework: ‘The major framework sees ideas as existing either within human heads or, at least within words uttered by humans. The ‘minor’ framework overlooks words as containers and allows ideas and feelings to flow, unfettered and completely disembodied, into a kind of ambient space between human heads. In this case the conduit of language becomes, not sealed pipelines from person to person, but rather individual pipes which allow mental content to escape into, or enter from, this ambient space’ (Reddy 1993: 170).

2. The urgency of metaphors for listening is not merely that of an open concept such as game, which Wittgenstein argues cannot be defined by necessary and sufficient conditions (Wittgenstein 1953). Rather, the need for metaphors for listening is demanded by the fact that it is not an externally observable phenomenon like a physical action, a ritual, or a game, but instead is an internal cognitive activity: an abstraction.

3. The picture, drawn from Jastrow’s (Jastrow 1900) Fact and Fable in Psychology, can be seen as either a duck or rabbit, depending on how the viewer is prompted (or prompts him or herself).

4. With regard to music, Spitzer takes his cue from Scruton (Scruton 1999: 78) who argues that musical listening is ‘hearing sounds as music’.

6. The ears in these diagrams should not be taken too literally; music enters the mind not only as sound waves through the ear, but also through sight and physical vibrations. Consider, for instance, the deaf percussionist Evelyn Glennie, for whom ‘hearing is basically a specialized form of touch.’ See Glennie, Evelyn (2008). ‘Hearing Essay,’ from http://www.evelyn.co.uk/Evelyn_old/live/hearing_essay.htm .

7. The interpreter-analyst is not necessarily distinct from the analyst-theorist or theorist-analyst, as all three activities are interdependent. For instance, the theorist-analyst and analyst-theorist are potentially the same person, at different times focusing on theorizing for the sake of analysis or analysis for the sake of theorizing. Likewise, analysis is pursued for the sake of interpretation and vice versa.

8. Reddy also remarks: the more that conduit metaphor frames are already ingrained, the more one resists change to alternative frames. In light of this we might consider how ingrained is the notion that the neutral, objective, or abstract aspect of music is its ‘structure.’

9. ‘In psychological measurement, the individual is the measuring device; he plays the role of the pan balance, the meter stick, or the thermometer,’ as Coombs puts it in his essay on psychology and mathematics (Coombs 1983).

10. Spitzer’s discussion actually pertains not to hearing alone, but rather more generally: ‘That perception might be based on the ability to execute a technique was the burden of Wittgenstein’s (1953) famous rabbit/duck illusion...[For instance] seeing as…is a technical procedure that can be prompted. One can decide, or be instructed, to see the drawing in a particular way’ (Spitzer 2004: 9). Seeing, Wittgenstein argues, is an amalgam of seeing and thinking.

11. See also Meelberg (Meelberg 2008), who remarks that, of all the senses, hearing is most closely related to touch.

12. See also Pearsall’s account of discursive and non-discursive time in his ‘Anti-Teleological Art: Articulating Meaning through Silence’ (Pearsall 2006).

13. More literal affiliations between music and gastronomy are documented light-heartedly by Braus (Braus 2007).

Mailman, Joshua B. (2010a). Temporal Dynamic Form in Music: Atonal, Tonal, and Other (PhD dissertation). Rochester, NY: University of Rochester.

Mailman, Joshua B. (2010b). Emergent Flux Projecting Form in Ruth Crawford Seeger’s Quartet (1931). Paper presented at the Conference of the Society for Music Theory, Indianapolis.

Mailman, Joshua B. (2011). Duality of metaphor for time and music: applications to computational-phenomenological analysis of musical form and expression. Plenary session roundtable on Metaphor at the Seventh International Conference on Music Since 1900 / Society for Music Analysis Conference, Lancaster.

Meelberg, Vincent (2009). “Sonic Strokes and Musical Gestures: The Difference between Musical Affect and Musical Emotion.” In Jukka Louhivuori, Tuomas Eerola, Suvi Saarikallio, Tommi Himberg, and Päivi-Sisko Eerola (eds.), Proceedings of the 7th Triennial Conference of the European Society for the Cognitive Sciences of Music (ESCOM) (pp. 324-327). Jyväskylä: University of Jyväskylä.

Joshua Banks Mailman graduated from the University of Chicago and the Eastman School of Music, where he completed a music theory Ph.D. dissertation on temporal dynamic form. His publications appear in Psychology of Music, Music Theory Online, and Music Analysis, which gave him its 25th Anniversary award for his article on Carter’s Scrivo in vento. He has presented papers on music of Carter, Crawford Seeger, Ligeti, Brahms, Babbitt, Schoenberg, and topics such as temporal dynamic form, narrative, electro-acoustic music, binary-state Generalized Interval Systems, and octave-equivalence with atonal melodies in the context of long-term memory. He is also active in sound art and music technology design. He has taught at the Eastman School, University of Rochester, the University of Maryland, and Hunter College, CUNY. He now teaches at New York University and Columbia University. Visit www.joshuabanksmailman.com.