"Generic" Music Representation for Aura

Roger B. Dannenberg
Carnegie Mellon University

Introduction

Aura is a real-time object system with a focus on interactive music. It evolved from
earlier systems, including the CMU Midi Toolkit, which included software for Midi
processing. Aura also supports Midi data, but a goal of Aura was to go beyond Midi,
enabling richer and more flexible control. To that end, Aura introduced the idea of
general messages that contain attribute/value pairs. With Aura, you can dynamically create
an instrument, a note, an object that modifies a note (for example, a vibrato generator or
an envelope), and you can send any of these a stream of timestamped, attribute/value pairs
containing high precision control information.

It was thought that this would be an improvement over Midi: more flexibility, none of
the limitations. However, experience has shown that unlimited flexibility does not make
something better in all ways. One of the advantages of Midi is that you can use standard
tools to capture, store, and play Midi data. The constrained representation and the
conventions make it easy to work with. For example, if you are not sure a keyboard is
working, you can plug it into almost any synthesizer for a simple test.

With Aura, I found myself inventing semantics and a protocol every time I designed a
new instrument. The flexibility was there, and I was able to do things that would be
difficult with Midi, but the overhead of designing interfaces and remembering them so I
could use them was too much. This document describes a fairly generic representation to be
used in the context of Aura to represent notes, sounds, and control information. Using the
conventions described here, it should be possible to make some general tools to assist
with music processing in Aura.

Multiple Representations

I want to support three representations:

A text representation for communication with various bits of software. (I learned that one
of the most popular features of the CMU Midi Toolkit was the simple text language Adagio,
which many composers generated using the programming language of their choice.)

Aura messages. The representation should translate to an attribute/value pair message
stream.

Midi. There should be a natural way to translate Midi into this representation and to at
least map a subset of this representation into Midi.

Resources and Instances

One of the most critical aspects of a representation is to decide what exactly is being
represented. I want to be able to represent sounds of various kinds and to be able to
update the sounds over time. The limitations of Midi might help make this clear. In Midi,
control changes apply to channels, so there are really only 16 objects or resources
(channels in this case) to which updates can be directed. There are a few updates that
apply to particular keys or note numbers, but even here, you are limited to 128 note
numbers. I want to be able to associate each sound with its own identifier so that the
sound can receive individual updates.

Is a sound structured? A sound can have many parameters. Notes usually have pitch
and loudness, but there are many other possibilities. When sounds get complex, there is a
tendency to describe them hierarchically, e.g. a note can have a pitch vibrato and an
amplitude vibrato. Each vibrato can have a phase, frequency, and amplitude. This approach
can lead to a hierarchical naming scheme as in Open Sound Control, such as
"note/ampvib/freq" or "note/freqvib/freq". In Aura, vibrato objects
can be considered as separate entities and named directly. In fact, a collection of notes
can share a vibrato object. The variations are endless.

Alternatively, sounds can be "closed" abstractions. All updates are sent to
the sound, and it is the sound's responsibility to forward the updates as it sees fit.
Continuing with the example, you might set the "ampvibfreq" attribute and the
sound would in turn set the "frequency" attribute of its amplitude vibrato
object. This object might be an internal object managed by the sound or a shared object
calculating vibrato for many sounds.

My leaning right now is toward the closed abstraction approach. This eliminates the
complexities of a hierarchical name space and the danger of exposing the internals of
sounds to updates.
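
A small sketch of the closed approach (Python, with invented class and attribute names; this is not Aura code) shows a sound forwarding an "ampvibfreq" update to its amplitude vibrato object, which may be private to the sound or shared by several sounds:

class Vibrato:
    """Vibrato generator; may be private to one sound or shared by many."""
    def __init__(self):
        self.freq = 5.0      # Hz
        self.depth = 0.0

    def set(self, attr, value):
        setattr(self, attr, value)

class Sound:
    """A 'closed' abstraction: clients set attributes only on the sound;
    the sound decides how to forward them to internal objects."""
    def __init__(self, amp_vibrato=None):
        self.amp_vibrato = amp_vibrato if amp_vibrato else Vibrato()

    def set(self, attr, value):
        if attr == 'ampvibfreq':
            self.amp_vibrato.set('freq', value)    # forward to the vibrato
        elif attr == 'ampvibdepth':
            self.amp_vibrato.set('depth', value)
        else:
            setattr(self, attr, value)             # ordinary attributes

shared = Vibrato()
s1, s2 = Sound(shared), Sound(shared)
s1.set('ampvibfreq', 6.0)    # both sounds now share a 6 Hz amplitude vibrato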

Multiple Parameters

Another issue is the problem of multiple parameters for sounds, given that Aura
messages typically convey one attribute/value pair. Open Sound Control sends packets of
atomic updates, and Aura had this feature in a previous version, but it turned out to be
very difficult for clients to construct packets through any kind of simple interface, and
packets make filters and mappers more complex.

The alternative is to simply send sequences of updates in the form of attribute/value
pairs. It helps to have some sort of delimiters, particularly because we typically want
updates to apply to a particular sound, yet attribute/value pairs do not contain a
"tag" or target field that would say which sound is to be updated. The way in
which a sequence of updates is bracketed by other messages is an important convention in
the representation.

Synchronization and Atomicity

Since Aura messages set a single attribute to a simple value (typically a float,
integer, string, or Aura object reference), an important question is how to make sure that
groups of attributes are updated simultaneously. The classic version of this problem is to
ensure that filter coefficients are updated simultaneously to avoid unstable regions of
the parameter space. There are at least three ways to handle this problem:

Typically a sound or note is created using a sequence of attribute updates. By
convention, the last of these is the "gate" attribute which actually starts the
sound. Updates do not really need to be synchronous and atomic until the sound begins (in
most applications).

Timestamps allow the sender to specify synchronous updates, and the Aura scheduler
cooperates by delivering all messages with a given timestamp atomically. What this means
for synthesis is that audio generation is stopped, messages are delivered, and then audio
generation continues. The implementation is quite elegant and simple: audio generation
itself is scheduled by timestamped messages. The Aura scheduler ensures that processing
occurs in timestamp order. The only exception is that if a message arrives from another
process and its timestamp has expired, the message is delivered immediately.

Within a process, messages are processed synchronously and non-preemptively. Thus, if a
bank of filters is being controlled by a single object located in the same Aura zone
(process), then that object can update all filter coefficients atomically without any
special precautions. A client in another zone wishing to deliver a set of updates
atomically can create a proxy in the destination zone and use the proxy to deliver the set
of messages. This may require some extra programming to create the proxy.
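
The toy sketch below (Python; invented names, not the Aura implementation) illustrates the timestamp approach described above: messages are queued in timestamp order, and every message sharing a timestamp is delivered before control returns, so the receiver sees the group as a single atomic update. In the real system, audio computation is itself driven by timestamped messages, so such a group is delivered between audio blocks.

import heapq

def deliver(msg):
    print("deliver:", msg)       # stand-in for sending the message to its object

class ToyScheduler:
    def __init__(self):
        self.queue = []          # (timestamp, sequence number, message)
        self.count = 0           # sequence number keeps ordering stable

    def send(self, timestamp, message):
        heapq.heappush(self.queue, (timestamp, self.count, message))
        self.count += 1

    def run_until(self, now):
        """Deliver pending messages in timestamp order; all messages with
        an equal timestamp are delivered as one group."""
        while self.queue and self.queue[0][0] <= now:
            t = self.queue[0][0]
            while self.queue and self.queue[0][0] == t:
                deliver(heapq.heappop(self.queue)[2])

sched = ToyScheduler()
for pair in [('a1', 0.5), ('a2', -0.3), ('b0', 0.9)]:  # filter coefficients
    sched.send(2.0, pair)        # same timestamp => atomic update
sched.run_until(2.0)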

The Aura Message Representation

Channels

Music information can exist in many parallel streams representing Midi channels,
instruments, voices, sections, etc. We could simply direct each stream to a different
object, but ultimately we want to be able to store streams in a single file or direct them
to a single object, so we need a representation for multiple streams. The "chan"
attribute serves to name a stream. The value is an integer (32 bits), allowing a large
number of channels.

Whenever the channel attribute is set (i.e., a "set 'chan' to value" message
is sent), the following attribute/value pairs apply to the channel or to a specific sound
associated with the channel. Channels can have attributes. By convention, setting an
attribute for a channel sets that attribute for all sounds currently active on the
channel. The attribute may or may not apply to future sounds created on that channel. (It
is also up to the channel whether to do anything with the attribute/value pair, and it is
up to the sounds to decide whether to do anything if they receive the pair, so it does not
seem wise to try to control the semantics of attribute/value updates too rigidly.)

Keys

Within a channel, sounds are allocated and named by setting the "key"
attribute. The name comes from the notion of keyboards, but there is not necessarily a
one-to-one mapping from key number to pitch. Instead, the key numbers 0 through 127 act as
Midi keys which imply pitch, but key numbers above 127 are simply tags used to identify
sounds. In this way, we can have 32 bits to name sounds within a channel. This is enough
to allocate a separate name for each sound or note on the channel in all but the most
extreme cases.

By convention, setting the "key" attribute allocates a sound on the current
channel. Successive attribute/value pairs apply to the newly allocated sound or note.
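
A minimal dispatcher sketch (Python, with invented names; not Aura code) captures both conventions: "chan" selects the current channel, "key" allocates or re-selects a sound on that channel, subsequent pairs update the selected sound, and pairs arriving with no key selected apply to every active sound on the channel:

class Stream:
    def __init__(self):
        self.channels = {}   # channel number -> {key number -> attribute dict}
        self.chan = 0
        self.key = None

    def set(self, attr, value):
        if attr == 'chan':
            self.chan = value
            self.key = None
        elif attr == 'key':
            self.key = value
            sound = self.channels.setdefault(self.chan, {}).setdefault(value, {})
            if 0 <= value <= 127:
                sound.setdefault('pitch', float(value))  # Midi-like keys imply pitch
        elif self.key is not None:
            self.channels[self.chan][self.key][attr] = value
        else:
            # channel-level update: apply to all currently active sounds
            for sound in self.channels.get(self.chan, {}).values():
                sound[attr] = value

s = Stream()
s.set('chan', 10)
s.set('key', 1205)       # key above 127 is just a tag; no pitch is implied
s.set('pitch', 60.1)
s.set('gate', 95)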

Gates

In Midi, the keydown message that allocates a note also starts it playing. In Aura,
setting the "key" attribute only allocates a sound or note. To make it play, you
set the "gate" attribute, which normally is a floating point number in
[0...127], representing a Midi-like velocity or amplitude. If the gate value is less than
or equal to zero, the message is roughly equivalent to a Midi noteoff message. In other
words, the note or sound begins to decay and eventually stops sounding. The gate may be
changed to any positive value to accomplish volume control changes, but sounds may choose
to ignore these changes. (Otherwise, every sound would have to include some additional
logic to detect changes, route them to an envelope and use the envelope to control gain in
some fashion.)

Note that the "gate" is neither in dB nor linear. Is this a bad idea?
I'm not sure. Instruments should call a conversion function to make it easy to change the
interpretation of the gate value.
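
One possible conversion function (Python; the particular curve is an arbitrary illustration, not part of the specification) maps a Midi-like gate in [0, 127] to a linear gain, so that changing the interpretation later means editing one function rather than every instrument:

def gate_to_gain(gate):
    """Map a gate value in [0, 127] to a linear gain in [0, 1].
    The square law below is only one plausible choice."""
    if gate <= 0:
        return 0.0
    g = min(gate, 127.0) / 127.0
    return g * g

# gate_to_gain(100) -> about 0.62; gate_to_gain(0) -> 0.0 (note off)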

Duration

To accommodate the notelist style of score specification where notes are given a
duration attribute at the beginning rather than a noteoff update, you can set the
"dur" attribute to a floating point value representing seconds of duration. It
is up to the note whether the duration is interpreted as the point at which decay begins
or the point at which the note becomes silent, but the convention will be that of Music
N, that is, when the note becomes silent. If the "dur" attribute is set,
there is no need to eventually set the gate to zero. Notes and sounds that do not
otherwise know what to do with duration can simply schedule a "set 'gate' to 0 after duration"
message to accomplish the effect.
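
As a sketch of that fallback (Python; threading.Timer is only a stand-in for Aura's timestamped scheduling), a note with no special handling for duration can simply arrange for its gate to be set to zero after "dur" seconds:

import threading

def apply_dur(note, dur_seconds):
    # Schedule "set 'gate' to 0" after the duration; in Aura this would be
    # a timestamped message rather than a timer thread.
    threading.Timer(dur_seconds, lambda: note.set('gate', 0.0)).start()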

Pitch

Pitch is specified using a floating point number of half steps corresponding to Midi
integer key numbers. In other words, 60 is middle C, and 60.5 is a quarter tone above
middle C.

Other Attributes

Any number of other attributes can be implemented. For example, "bend" might
be an additive offset to "pitch" to facilitate pitch bend specification.
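
Because pitch is a fractional Midi key number, conversion to frequency follows the usual equal-tempered formula, with any "bend" offset added before converting; a small sketch in Python:

def pitch_to_hz(pitch, bend=0.0):
    """Convert a fractional Midi key number (60 = middle C, 69 = A440),
    plus an additive 'bend' in half steps, to frequency in Hz."""
    return 440.0 * 2.0 ** ((pitch + bend - 69.0) / 12.0)

# pitch_to_hz(60.0) -> about 261.63 Hz (middle C)
# pitch_to_hz(60.5) -> a quarter tone above middle C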

Examples

A typical sequence of messages to turn on a note in a Midi-like fashion is the
following:

set 'chan' to 1
set 'key' to 60
set 'gate' to 100

To play this same note for a known duration, use the following:

set 'chan' to 1
set 'key' to 60
set 'dur' to 0.85
set 'gate' to 100

A more advanced sequence where the "key" attribute serves as a tag is the
following:

set 'chan' to 10
set 'key' to 1205
set 'pitch' to 60.1 --10 cents sharp
set 'pan' to 0.5 --pan to the middle
set 'brightness' to 0.3 --set any number of additional parameters
set 'gate' to 95 --and finally turn on the note

To modify the note, you might send additional updates, for example:

set 'chan' to 10 --only necessary if 'chan' was set to another value
set 'key' to 1205 --only necessary if 'key' was set to another value
set 'pan' to 0.6 --now change as many attributes as you want

To end the note, set the "gate" attribute to zero:

set 'chan' to 10
set 'key' to 1205
set 'gate' to 0
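
To suggest how a subset of this representation maps back to Midi (one of the goals stated earlier), here is a rough sketch (Python; the names and the channel numbering are assumptions made for illustration). Keys 0 through 127 become Midi note numbers, a positive gate becomes a note-on with the gate as velocity, and a gate of zero or less becomes a note-off; keys above 127 and attributes such as "pan" or "brightness" have no direct Midi equivalent and are simply dropped here:

def to_midi(messages):
    """messages: a list of (attribute, value) pairs in stream order.
    Returns (status, data1, data2) tuples for the Midi-expressible subset."""
    midi = []
    chan, key = 0, None
    for attr, value in messages:
        if attr == 'chan':
            chan = (int(value) - 1) % 16    # assume channels are numbered from 1
            key = None
        elif attr == 'key':
            key = int(value)
        elif attr == 'gate' and key is not None and 0 <= key <= 127:
            if value > 0:
                midi.append((0x90 | chan, key, int(value)))   # note-on
            else:
                midi.append((0x80 | chan, key, 0))            # note-off
        # 'dur', 'pan', 'brightness', and keys above 127 are ignored here
    return midi

# The Midi-like note-on sequence above:
to_midi([('chan', 1), ('key', 60), ('gate', 100)])   # -> [(0x90, 60, 100)]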

The Text Representation

The text representation is based on "Adagio," the text-based score language
in the CMU Midi Toolkit. The basic idea is that each sound event or note is represented by
a line of text (some extensions allow multiple events per line). There are some
abbreviations for common attributes. Some examples follow, corresponding to the
message-based examples above. Note that in the text form, notes are always specified using
durations to avoid having to match up note beginnings with note endings. In Adagio,
various letters are used to indicate pitch ("A" through "G") and
duration ("S", "I", "Q", "H", "W"),
leaving the rest of the alphabet to indicate attributes, including "V" for voice
(channel), "T" for time, and "L" for loudness.

For example, an Adagio line specifying channel 1, pitch C4 (key 60), a duration of 0.85
seconds, and a loudness of 100 translates to: "Using channel 1, and key 60, and a duration
of 0.85 seconds, set the gate to 100."

Adagio was mostly limited to Midi, so there was no need for an extended set of
attributes. For Aura, the syntax is extended to allow attributes as follows:

V1 C4 Q Lmf -pan:0.5 -brightness:0.3

In this example, the "pan" attribute is set to 0.5. The syntax is simple: a
leading dash ("-") and trailing colon (":") serve to delimit the
attribute name, and the value follows. This example uses standard Adagio syntax for pitch
(C4), duration (Q), and loudness (Lmf).
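
Parsing the extended attribute fields is simple; a small sketch (Python, not the actual Adagio parser) extracts every "-name:value" field from a line:

def parse_attributes(line):
    """Return (attribute, value) pairs for each '-name:value' field, e.g.
    'V1 C4 Q Lmf -pan:0.5 -brightness:0.3' -> [('pan', 0.5), ('brightness', 0.3)]."""
    pairs = []
    for field in line.split():
        if field.startswith('-') and ':' in field:
            name, value = field[1:].split(':', 1)
            try:
                value = float(value)
            except ValueError:
                pass                 # keep non-numeric values as strings
            pairs.append((name, value))
    return pairs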

If the "key" attribute is not specified, the line is a channel update
message, e.g.

V3 -bend:0.21 T21.1

sends a "bend" attribute to channel 3 at time 21.1.

Tempo and Beat Representation

(This section is not yet complete)

Tempo and beats are important aspects of music representation. My earlier work
more-or-less ignored this problem, but this makes interfacing to notation programs and
sequencers difficult. It should be possible to:

Encode beats in a stream of Aura messages (for synchronization to external Midi devices
and for interfacing to notation programs)

Express a time map to be applied to a stream (for text-based notation and composition)

Encode both tempo and beats (to provide for tempo variation control in real time)

The attribute "beat" will encode beat position as a floating point value,
allowing arbitrary subdivisions of the beat as opposed to the fixed divisions of the Midi
Clock message. The attribute "tempo" will specify a new tempo. Aura messages
carry timestamps. Together with "beat" and "tempo" attributes,
timestamps can be used to recover a tempo map from a sequence of messages.

Text-based scores will ordinarily treat time as beats, which means that timestamps of
messages must be mapped into beat values for the scores, and a separate section of the
score file will allow the specification of a tempo map. It is expected that the internal
representation of a score will use beats for times and durations, and these will be
translated on-the-fly into Aura timestamps using the Aura scheduling mechanisms.
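
As an illustration of the beat-to-time translation (Python; the tempo map format here is an assumption, not part of the representation), a piecewise-constant tempo map given as (beat, beats-per-minute) changes can be used to compute the time of any beat position:

def beat_to_time(beat, tempo_map):
    """tempo_map: list of (beat, bpm) pairs sorted by beat, starting at beat 0.
    Tempo is assumed constant between changes."""
    time = 0.0
    for i, (b, bpm) in enumerate(tempo_map):
        next_b = tempo_map[i + 1][0] if i + 1 < len(tempo_map) else None
        if next_b is None or beat <= next_b:
            return time + (beat - b) * 60.0 / bpm
        time += (next_b - b) * 60.0 / bpm

# beat_to_time(4.0, [(0.0, 120.0)])              -> 2.0 seconds
# beat_to_time(4.0, [(0.0, 120.0), (2.0, 60.0)]) -> 3.0 seconds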

Different channels might use different tempo maps, but I am not sure how to indicate
this. We do not want to replicate a tempo map for each channel.