Playing by eye

Logic will get you from A to B. Imagination will take you everywhere.
(Albert Einstein)

Reason is a whore, surviving by simulation, versatility, and shamelessness.
(E. M. Cioran)

Introduction

This dissertation deals with the culturally informed relationships
between the visual and the aural in music making, with particular attention
to digital technologies. Using programs like sequencers, virtual instruments
and wave editors, music makers are faced with a specific representation of sonic
events. Starting from the premise that music software relies to a great extent
on visual indices, I investigate the mechanisms that stem from the relationship
between this perceptual organization and music production, focusing on the role
of digital instruments as means for a supposed form of "democratization".

Historically and culturally derived hierarchies of the senses

First of all I want to show how the aural and the visual are
interrelated in ways which reflect the cultural meanings and functions assigned
to these faculties. As some anthropological research illustrates,
our hierarchy of the senses, with sight at the top and hearing trailing behind,
can be considered neither a natural predisposition nor a cultural universal.
Steven Feld (1990), Paul Stoller (1989) and Alfred Gell (1995), among others,
contrast the Western under-rating of the sense of hearing with its importance
in Kaluli, Songhay and Umeda cultures respectively. The main point is that there’s
a link between the environment and the organization of sensibility, with consequences
in the domain of cognition and therefore in the way the world is perceived and
events are described. Each sensory faculty is accorded a specific value
in relation to the foundation of experience and the social construction of knowledge.
For instance, the Suya of the Brazilian Mato Grosso "deem keen hearing
to be the mark of the fully socialized individual. The Suya term ‘to hear’ […]
also means to understand, while the expression ‘it is in my ear’ is used by
the Suya to indicate that they have learned something, even something visual
such as a weaving pattern. Sight, in fact, is considered by the Suya to be an
anti-social sense, cultivated only by witches" (Classen 1993, p. 9).

On the other hand, our society can brag about essays like psychoanalyst
Imre Hermann’s Perversion and hearing world (Hermann 1970), where we
are taught that most of "those individuals whose psyche is governed by
the aural" are predictably exhibitionists, fetishists, voyeurs, kleptomaniacs,
pederasts, and so on.

Besides variation between cultures, we can also
notice diachronic change within a single civilization. Just think about the
shifting connotation of the dictum "Verba volant, scripta manent":
in the past it used to mean that the written word is sterile and dead as a stone,
while the spoken word can spread its wings and fly. Nowadays that connotation
has been radically reversed: the spoken word – that is, the oral/aural – is
ephemeral, while the written word – the written/visual – is stable and substantial
(see Borges 1978). This is in tune with the founding role we assign to written
words, charts, photographs, clothing, numbers, fingerprints, etc. in the definition
of reality.

*** *** ***

Nevertheless, no musical or visual event is "pure"
enough to exclude the other four senses and, above all, to exclude the thick network
of linguistic and behavioural codes which interweave every human activity
(Fabbri 1996, p. 180). At this point a question arises: is there any
specificity in aural perception, as opposed to visual perception?

Adorno and Eisler (1969) stress the fact that the ear, unlike
the eye, is always open. This means that it is a vulnerable receptor, both
"passive" (it cannot decide when and what to perceive) and, at the
same time, always "active" (or better, "activated"). Besides,
the ear receives stimuli from every direction, and doesn’t need, as opposed
to the eye, to be focussed towards the source of information; hence the ear
keeps an "archaic sense of participation" with the surrounding environment,
rather than trying to control it by means of zooming in on specific objects.

Though revealing, what the two scholars argue should be stripped
of its implicit claims to universality, and instead be contextualized within
a certain culture and a certain time. As we have seen, aural perception varies
according to culturally constructed norms and practices; moreover, as shown
by social psychology, perception is also an active process of selection (see,
for example, the distinction between sound and noise). In brief, what Adorno
and Eisler bring forward is that, in our society, the ear, being less directional,
is inclined to be also less selective and less subjectively organized than the
eye (ibidem).

Conversely, the eye enjoys classifying and systematizing qualities
which are attuned to the most distinctive instances of Western rationalism;
furthermore, directionality also implies an intention, and thus a will to control
the environment.

This said, if music is also about organized sounds, we assume
that there’s a need for a system of reference when making music. This
system doesn’t necessarily refer only to sound qualities, because the latter
are always structured within cultural patterns, which also include perceptual
inputs other than aural ones, as well as mental representations in general.

A visual representation of sound

Music software confronts us with a specific representation of
sonic events, where the signs we can see on the screen don’t exactly "look
like" the signs that we can hear from the speakers. As we have seen, this
representation is not necessary at all, but culturally derived, all the more so
since we could imagine different ones; in fact we have different ones, like the
musical score.

It is very likely that this representation has some effect
on which parameters are emphasized. Even at first glance, music software gives
prominence to certain aspects like wave shape, sound treatment, dynamics, rhythmic
interplay between voices, accents, looping and repetition, texture, timbre,
etc. Moreover, we get a subtler view of details like phase, clips and clicks,
fades, frequency spectrum and panning. Then, we can open plug-ins which consist
of virtual instruments, effects and sound wave analyzers. Finally, we have MIDI,
which reduces a musical event to some of its measurable traits: pitch, duration,
velocity, amount of expression pedal, etc.
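
By way of illustration, the reduction just described can be sketched as a small
data structure. This is only a sketch under stated assumptions: the class
NoteEvent and its field names are hypothetical and belong to no particular
sequencer; an actual MIDI stream encodes duration indirectly, as a pair of
note-on and note-off messages. The frequency formula is the usual equal-tempered
convention with note 69 tuned to 440 Hz.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NoteEvent:
    """A note reduced to a few measurable traits (hypothetical structure)."""
    pitch: int             # MIDI note number, 0-127 (69 = A above middle C)
    velocity: int          # key-strike intensity, 0-127
    start_beats: float     # onset in beats (illustrative field name)
    duration_beats: float  # length in beats (illustrative field name)

    def __post_init__(self):
        # MIDI data bytes carry 7 bits, hence the 0..127 range.
        if not 0 <= self.pitch <= 127:
            raise ValueError("pitch must be in 0..127")
        if not 0 <= self.velocity <= 127:
            raise ValueError("velocity must be in 0..127")


def note_to_hz(pitch: int, a4_hz: float = 440.0) -> float:
    """Equal-tempered frequency of a MIDI note number (A4 = note 69)."""
    return a4_hz * 2.0 ** ((pitch - 69) / 12.0)
```

Everything that such a record does not list (attack noise, breath, the weight
of a gesture) simply has no place in it, which is the point made above.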

If we open an audio file in a wave editor, we can observe an
image which looks familiar to us, that is, a Cartesian coordinate system. This
graph represents the digitalized sound wave as a mathematical function, with time
on the axis of abscissas and dynamics (measured in dB) on the axis of ordinates.
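
The two axes can be made concrete with a minimal sketch. It assumes samples
normalized to the range -1.0 to 1.0 and the common convention of decibels
relative to full scale; the function names are illustrative, and actual editors
may equally plot linear amplitude rather than dB.

```python
import math


def amplitude_to_dbfs(sample: float) -> float:
    """Level of one sample in dB relative to full scale (|sample| = 1.0)."""
    magnitude = abs(sample)
    if magnitude == 0.0:
        return float("-inf")  # digital silence has no finite dB value
    return 20.0 * math.log10(magnitude)


def waveform_points(samples, sample_rate):
    """Pair each sample with its time in seconds: (abscissa, ordinate)."""
    return [(n / sample_rate, s) for n, s in enumerate(samples)]
```

Halving the amplitude lowers the level by roughly 6 dB, which is why the
vertical axis of such a display is not proportional to what the screen
suggests at first sight.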

Sequencers have a similar approach; on the abscissas again
we have time, while on the ordinates we find discrete events called audio and
MIDI tracks. Every audio track contains another Cartesian system inside, that
is, the sound wave. We can notice that this display highlights the interplay
between events.

In a sequencer, time is divided into equal sections, consisting
of multiples or fractions of a unit, the beat, whose rate is measured in b.p.m.
(beats per minute), so that a grid spatially organizes the domain of the song.
An audio-MIDI project looks like an empty space to be filled with audio samples
or MIDI phrases. Sampling Timothy Warner (2003, p. 26), "the visual nature of
the computer screen presents musical material as simple blocks and, as a result,
encourages the production of pieces with additive, rather than organic
structures". Hence, "most change is produced by addition or subtraction of
timbres, not through organic growth or musical development" (ibidem, p. 45),
and, I would add, in an analytic fashion, as expressed in particular by the
cut & paste technique.
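
The arithmetic behind the grid is elementary, and a short sketch may help to
show how it works on musical time. The function names and the default of four
steps per beat are assumptions made for illustration, not the behaviour of any
specific sequencer.

```python
def beat_seconds(bpm: float) -> float:
    """Duration of one beat in seconds at the given tempo."""
    return 60.0 / bpm


def quantize(time_s: float, bpm: float, steps_per_beat: int = 4) -> float:
    """Snap a time in seconds to the nearest grid line.

    The grid step is one beat divided into `steps_per_beat` equal parts.
    """
    step = beat_seconds(bpm) / steps_per_beat
    return round(time_s / step) * step
```

At 120 b.p.m. a beat lasts half a second, so with four steps per beat an event
played at 0.26 seconds is moved to 0.25: whatever falls between two grid lines
is pulled onto one of them.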

Continuity, again, is obtained through repetition of events
(or blocks), which in turn can be single samples (e.g. bass or snare
drum, etc.), loops (e.g. drum groove) or entire sections (e.g. verse or chorus).

This "space" is virtual and can be conceived, quoting
Pierre Lévy (1995), as a process of transformation from one mode of
being into another. What I want to bring to attention is that this virtual
domain is an abstraction, and that sight is our primary way to read this space
and move within it.

For instance, dynamic cues are fundamental organizing factors
of visual perception; they are recognized as hints in building loops and in
beat-matching, which is fundamental in order to put events in time or to use
samples from other records. Thus, in a drum loop, dynamic peaks are instant
cues for bass and snare drum, and consequently for recognizing beats and measures.
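
What the eye does here at a glance can be approximated by a very simple
procedure. The sketch below assumes an amplitude envelope given as a list of
non-negative numbers and a threshold chosen by hand; it is an illustration of
the principle, far cruder than the onset detection found in actual software.

```python
def find_peaks(envelope, threshold):
    """Indices where the envelope is a local maximum at or above threshold."""
    peaks = []
    for i in range(1, len(envelope) - 1):
        rises = envelope[i] > envelope[i - 1]    # louder than the point before
        falls = envelope[i] >= envelope[i + 1]   # not quieter than the point after
        if envelope[i] >= threshold and rises and falls:
            peaks.append(i)
    return peaks


# A toy drum loop: three loud hits separated by quieter material.
loop = [0.0, 0.9, 0.2, 0.1, 0.8, 0.3, 0.05, 0.95, 0.1]
hits = find_peaks(loop, threshold=0.5)
```

The spacing between the returned indices is what the user reads on the screen
as beats and measures.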

This phenomenon of formalization reaches its utmost in a MIDI
sequencer, where musical events are reduced to rectangular bricks, which might
remind us of those counting rods we used in our first encounters with mathematics.
Moreover, MIDI sequencers are generally controlled through a keyboard, which
works both as a hardware device and as a virtual interface; needless to say,
this fact, notwithstanding the pitch wheel, encourages a particular organization
of pitch events, the same one that Max Weber considered to be one of the
characteristics of Western music rationalization (Weber 1921).

Metalinguistic functions of visuality

Now I would like to examine some of the functions of visuality
as a metalanguage in music making.

Music makers and consumers need means of storing
and transmitting musical knowledge in order to be able to share it. This becomes
crucial on some occasions:

in educational practices: even if many popular musicians say that they’ve
learnt to play by ear, they’ve also probably made use of visual tools like
tablatures, pictures, diagrams, fingerboard stamps, live bands, videos, etc.
- not counting written instructions (see Bennett 1980);

in the documentation of performances and sound recordings.

Both verbal language and visual communication can serve these
purposes. Here again arises the issue of the "relative reliability we
ascribe to aural information compared to visual" (Thorn 1996): in order
to establish that something is «real» or «true», we need to see it.

For instance, in the description of their activities, musicians,
producers and engineers often use metaphors belonging to the visual arts:

"I’ve always described my job as painting
a picture with sounds; I think of microphones as lenses. Engineers is such
a wrong term for music mixing, really" (Geoff Emerick, in Massey 2000,
p. 84).

"The more you know, the greater the palette
of colors you have to choose from" (Nile Rodgers, ibidem, p. 185).

In this case, verbal communication serves as a metalanguage
on a second level: it speaks about the visual, which in turn "speaks about"
music. In this way we have a meta-metalanguage.

Mediation

Taking for granted that music making is generally a collective
process (see in particular Sorce Keller 2003), implying a more or less hierarchical
division of labour, successful communication between the people involved is
a primary condition of existence for teamwork, at least from a functional
point of view. This task requires a language that can be understood by people
who have different competences and training (that is, musicians in the strict
sense, engineers, producers, A&R, etc.); hence it’s not surprising that this
language is very often borrowed from visuality, for at least two reasons: (a)
this kind of discourse is shared at a more general level of competence (see
Stefani 1987) and (b) it has a strong, even though metaphorical, structuring
quality. Moreover, this metaphorical feature can become a further stimulus
for creation.

One of the most common complaints about digital technologies
is that they are so powerful and versatile that they leave you alone with the
entire burden of choice on your shoulders. In effect, infinite copying with
no deterioration of quality and non-destructive, software-based editing are
some of the characteristics of digital production (see Warner 2003), which also
means fewer physical limitations on production: "it’s not what you can do
that counts, but what you choose to do" (Eno 1996, p. 394). In this regard,
Leonard B. Meyer (1989) points out the role of constraints, whether explicit
or not, as a conceptual prerequisite for creation: "without cultural constraints,
memory is emasculated by the momentary" (ibidem, p. 349). Therefore,
ideas or theories coming from extra-musical fields can also help the music maker
build solid creative strategies within a guiding and inspiring symbolic network,
supplying values and suggesting cultural topics and instruments.

Potential and complexity

In fact there seems to be a direct relationship between the
potential of an instrument and its complexity for the user. For instance, George
Martin said that the Beatles’ Sgt. Pepper wouldn’t have been as good had it
been recorded on 24-track: as the saying goes, "necessity is the mother
of invention". […] Similarly, Tony Visconti stresses that extrinsic limitations
of the options available can help: "David Bowie and I have discussed this
many times, that having less options in mixing is a positive thing. […] It would
keep us going in that direction – we wouldn’t deviate from that sound"
(in Massey 2000, p. 143).

At the same time, software manufacturers are concerned with
providing powerful tools with user-friendly interfaces, continuously mediating
between completeness of features and accessibility. What emerges is a concern for
functionality: in order to render the gear practical for the users’ needs,
manufacturers have to anticipate and provide for its possible uses; this way the
resulting tool, in a certain sense, also takes into account the consumers’ needs.

However, a deterministic view, asserting that technology unidirectionally
moulds its users, is untenable, as instruments are also created (or
re-created) by users in the act of playing them: "the ability of the consumer
to define, at least partially, meaning and use of technology is an essential
assumption and theoretical point of departure" (Théberge 1997, p.
160). This happens because every planned structure implies a certain degree
of choice among different alternatives (see Middleton 1990). Still, the planned
uses and in particular the structuring character of the tool, referring to more
general cultural issues, have effects that can’t be overlooked.

About the psychology of the computer/user interface, Richard
Thorn (1996) points out that "Icon-based, rather than word-based interaction
is believed to be more 'user-friendly', permitting immediate, almost intuitive
response. But technology and [visual] metaphors reflect only one cultural view
of the world, and why should intuitive response be limited to a visual stimulus?"

On the other hand, if for instance we consider Csound, this
program is said to be capable of producing any sound, symbolizing Western abstraction
at its extreme. It’s not a coincidence that text-based programs are more widespread
among academic musicians. Nonetheless, the potential isn’t just something that
belongs to the machine, as it arises only from the relationship with its user.

Storing sounds

Another function of visual representation is documenting the
operations required to produce specific sounds. Indeed it is sometimes necessary,
or just practical, to fix sound-making procedures, in order to be able to replicate
them at will. There are lots of technical or didactic books which make use of
pictures, drawings and diagrams which show the correct posture of the body while
playing a certain instrument, how to place microphones in order to record a
given instrument and to obtain a certain nuance from it, the right position
of monitors in a recording studio, equalization-curves that illustrate how to
get a particular filtering effect, stylized sketches of scratch techniques,
and so on. Musicians themselves use these visual means. I can quote, by way
of illustration, Tony Visconti: "I take photographs of the mic placement.
I have a series of photos from over the years – they’re like photographs of
my drum sounds" (Massey 2000, p. 149).

Irreducible musical conducts

Some specific musical conducts, however, seem to be irreducible
either to visual terms or to verbal or mathematical language; I could mention
concepts like "feel", "swing" or "groove", whose
content is fuzzy in theory and yet clear in practice, at least for people acquainted
with a certain musical style. It’s like when Saint Augustine said, referring
to time: "What is it? If nobody asks me, I know. If I’m asked, I do not
know".

In the domain of artificial intelligence, software programmers
are concerned with providing computers with all the abilities needed to make
them play like a human – or to reproduce every human gesture. This purpose requires
a formalization of these empirical conducts, in order to digitalize them. Those
aspects which haven’t been properly digitalized yet, and that can therefore
be considered, though only at the present time, irreducible, are some of the
ones which actually make the difference between the human presence and the machine,
especially in live performances (see Warner 2003, pp. 41-43).

An apparent paradox is that these "irreducible" characters,
which I’ve been defining as intrinsically musical and which should therefore lead us
towards a more specialized musical knowledge, instead seem to refer to a more general
competence, precisely to what Gino Stefani would call a General Code, that
is "perceptive and logical schemes, anthropological behaviors, basic conventions
according to which we perceive and interpret every experience, hence also aural
ones" (Stefani 1987, p. 18). Mistaking musical ability for performing
skill and theoretical knowledge rests on a misunderstanding and underrates the
importance of listening. As John Blacking wrote: "Latent ability is rarely
recognized or nurtured, [while] the creation and performance of most music is
generated first and foremost by the human capacity to discover patterns of sound
and to identify them on subsequent occasions" (Blacking 1973, p. 9).

From another point of view, we could say that a properly musical
event, when it is irreducible to spoken language or to logically organized
languages (e.g. binary code), is one of the few conducts in our culture
which preserve a marked pre-logical character. I can add that this character
is strictly connected with body movement and the flow of time.

About the importance of the body, from a poietic perspective,
just consider the role and implications of gesture while playing an instrument,
compared to the abstraction which connotes programming a machine.

About the flow of time, I’m not going to approach here such
a problematic topic, but just give some cues. Time is analogue by definition:
digitalization presupposes divisibility into a discrete quantity of intervals,
or samples, which is in contrast with the idea of an uninterrupted dimension,
whether linear or circular. But if we don’t consider time as a continuous
flow, we risk getting trapped in William James’s version of Zeno’s paradox,
according to which the flow of time is impossible. For instance, if we imagine
the present as a point, that is, without extension, it doesn’t exist (Borges 1978).
Hence the present is only an abstraction and not a spontaneous actuality of
our consciousness, while what we experience is a continuum, as opposed to
discrete sampling.
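
The contrast between the continuum and its sampled counterpart can be shown
with a few lines. This is a minimal sketch whose function names are
illustrative only: a digital recording knows time solely at the instants n
divided by the sample rate, and any instant falling between two of them can
only be assigned to the nearest one.

```python
def sample_times(sample_rate: int, count: int):
    """The only instants that exist for a recording: n / sample_rate."""
    return [n / sample_rate for n in range(count)]


def nearest_sample_index(time_s: float, sample_rate: int) -> int:
    """Index of the sample closest to a given continuous instant."""
    return round(time_s * sample_rate)
```

At four samples per second, the instants 0.26 and 0.30 both fall on sample
number 1: what separates them in experience has no representation in the
sampled domain. Raising the sample rate narrows the intervals but never
abolishes them.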

The point is that, according to the scheme I’ve drawn thus
far, both logical thinking and these archetypal conducts seem to refer
to the same cultural level of competence. This convergence should call attention
to a contradiction at the heart of our culture, between rationality and "something
else", which is the theme of a permanent debate (see e.g. Touraine 1992).
To a certain extent, this unmarked region could be identified with what we call
"the body", as it is implicit in those irreducible musical events,
which are in turn often connected with bodily gestures and with a presence
in space. So, a current definition of «body», conceived more in terms of "experience
of sensuousness" (see Peter Wicke 1987) than of a physiological configuration,
could be «the human that can’t be digitalized».

This brings to mind Philip Tagg’s laboratory experiment on
TV music, which highlights the contradiction between the progressive connotation
of the verbal and visual contents and the stereotypes carried by music: "it
appears that music in our culture, its digital technology notwithstanding, can
categorize shared subjective experience of and relation to our social and natural
environment at deeper, possibly more ‘archaic’, levels of consciousness than
visual and, more notably, verbal symbols. […] Such asylums of nonverbal symbolism
may be psychosocial necessities in a culture whose ideology of knowledge so
one-sidedly invests certain symbolic systems, notably numbers and words, with
great power and status as legitimate carriers of knowledge while banishing others
to the freaky realms of ‘Art’ or ‘entertainment’ — the fact that this presentation
about music is mostly words illustrates that point quite clearly"
(Tagg 1989, p. 17)

Summing up, visualization, notably through digitalization,
represents an attempt to reduce sonic events to a logically organized structure.
In this perspective, those musical elements which preserve their irreducibility
can be considered a sort of archetypal residue, or stronghold, withstanding
the supremacy of logical thinking.

Democratic claims

Now I’d like to examine how these digital technologies partly
redraw the social definition of musicianship, in connection with their presumed
democratic function.

First of all, we can detect a wide spread of computer hardware
and software dedicated to music making. In more detail, three trends are documented:
(a) a progressive increase in the consumption of computers, also for private
use; (b) file-sharing practices, through which anyone who owns a computer and
a fast Internet connection can download a great variety of software without
any expense; (c) the relatively low cost of basic home recording studio equipment.
In general, we can note a widespread familiarity
with computers, especially among the younger generations.

A significant facet is the progressive hybridization between
production and consumption: producers of cultural objects, in the processes
of music-making, consume (a) gear, that is, the means of production, (b)
educational tools and services, (c) techniques, (d) mass media, etc. This aspect
is parallel to what happens with devices like samplers, synthesizers and the
like, that is, when musicians also become consumers of pre-recorded sounds.
Quoting Paul Théberge, recent innovations "alter the structure of
musical practice and […] place musicians and musical practice in a new relationship
with consumer practices and with consumer society as a whole" (Théberge
1997, p. 3).

One of the most important aspects is that all these processes
pass through the same medium, that is, the computer. Not incidentally, this fact
is often invoked to support communitarian claims, as it offers a common basis
in terms of sub-cultural capital (see Thornton 1995).

In this way, in a certain sense we have an audience made of music
makers. Even if not professional, people regard themselves less as "the
audience" than as equal to "the artists" and as taking part in the
same community. Moreover, they also manipulate music and discourses about music,
by editing compilations with WinAmp or VideoLan, mixing tracks with
Traktor DJ Studio or Atomic Virtual DJ, sampling and editing loops with Steinberg
WaveLab or Sony Sound Forge, composing songs with Acid or Cakewalk, sharing
music files over the Internet through Kazaa or eMule, and talking about music in
forums and user groups as experienced musicians rather than just as consumers.

Getting back to the main argument of this paper, about the claims that digital
instruments are means for a form of ‘democratization’, it’s time to ask
ourselves whether it’s precisely this reliance on visuality that actually makes
music making more accessible in a society where the eye is more trained than the ear.

The popularity of music software with a graphical interface could be
viewed as the result of a dialectic relationship between two contrasting forces.
As I’ve illustrated, there’s a connection between visuality and accessibility.
Someone could raise the objection that music scores also rely on visuality; the
difference between the two is that sequencers and the like can be used drawing on
a general competence rather than on a specifically musical one. Moreover, this
software emphasizes parameters whose conscious use needs less music-specific
abstraction, like dynamics or rhythm patterns rather than harmony. As a
consequence, these visual indices can help people who haven’t been taught to play
an instrument. Finally, and most of all, digital tools can sound: even if you
start making choices according to what you see, you can always test these choices
in real time through the speakers. This feature is very relevant, as listening
practice, though subject to distinctions (see Bourdieu 1979, Thornton 1995), is
certainly more widespread
than performing abilities. As John Blacking wrote, our "society claims
that only a limited number of people are musical, and yet it behaves as if all
people possessed the basic capacity without which no musical tradition can exist
- the capacity to listen and to distinguish patterns of sound" (Blacking
1973, p. 8).

Hence we have a particular combination of aural and visual
qualities: on one side, visuality as abstractive and organizational skills;
on the other, aural comprehension of a style through a listening practice:

VISUAL              AURAL

abstract            empirical
deductive           inductive
poietic             aesthesic
analytic            synthetic
aware               involved
(de)structuring     functional

Apparently incompatible, the two terms of this dichotomy can instead
help us to explain the wide spread of these technologies and their claimed
democratic nature: they weave together the threads of creation and listening
and at the same time they enhance two qualities of popular competence: the
structuring skills of sight and the ability to listen.

Classen, Constance (1993) Worlds of Sense: exploring the
senses in history and across cultures (London: Routledge)

Eno, Brian (1996) A Year With Swollen Appendices (London:
Faber and Faber)

Fabbri, Franco (1996) Il suono in cui viviamo (Milano:
Feltrinelli)

Feld, Steven (1990, 2nd ed.) Sound and Sentiment: Birds,
Weeping, Poetics, and Song in Kaluli Expression (Philadelphia: University
of Pennsylvania Press)

Feld, Steven (1994) From Ethnomusicology to Echo-muse-ecology:
Reading R. Murray Schafer in the Papua New Guinea Rainforest, in The Soundscape
Newsletter, 8:4-6

Foucault, Michel (1966) Les mots et les choses (Paris:
Gallimard)

Hermann, Imre (1970) Perversion und Hörwelt, Psyche,
Stuttgart

Gell, Alfred (1995) The language of the Forest: Landscape
and Phonological Iconism in Umeda, in Hirsch, E. & M. O'Hanlon (ed.),
The Anthropology of Landscape: Perspectives on Place and Space (Oxford:
Clarendon Press)

Williams, Alan (2004) Science Fiction Double Feature: The
impact of the computer monitor on the process of the digital audio workstation,
paper for On the Right Track/Sur la bonne piste, Carleton University,
Ottawa, Sunday, 16th of May 2004