"Linux Gazette...making Linux just a little more fun!"

Audio Processing Pipelines

For decades experienced Unix users have employed many text processing
tools to make document editing tasks much easier. Console utilities
such as sed, awk, cut, paste, and
join, though useful in isolation, only realise their full
potential when combined through the use of pipes.

Recently Linux has been used for more than just processing of ASCII
text. The growing popularity of various multimedia formats, in the
form of images and audio data, has spurred on the development of tools
to deal with such files. Many of these tools have graphical user
interfaces and cannot operate in the absence of user interaction. There
are, however, a growing number of tools which can be operated in batch
mode with their interfaces disabled. Some tools are even designed to
be used from the command prompt or within shell scripts.

It is this class of tools that this article will explore. Complex
media manipulation functions can often be effected by combining simple
tools together using techniques normally applied to text processing
filters. The focus will be on audio stream processing, as audio
formats work particularly well with the Unix filter pipeline paradigm.

Sound Sample Translator

There are a multitude of sound file formats and converting between
them is a frequent operation. The sound exchange utility sox
fulfills this role and is invoked at the command prompt:

sox sample.wav sample.aiff

The above command will convert a WAV file to AIFF format. One can
also change the sample rate, bits per sample (8 or 16), and number of
channels:

sox sample.aiff -r 8000 -b -c 1 low.aiff

low.aiff will be at 8000 single byte samples per second in a
single channel.

sox sample.aiff -r 44100 -w -c 2 high.aiff

high.aiff will be at 44100 16-bit samples per second in stereo.

When sox cannot guess the destination format from the file
extension it is necessary to specify this explicitly:

sox sample.wav -t aiff sample.000

The "-t raw" option indicates a special headerless format that
contains only raw sample data:

sox sample.wav -t raw -r 11025 -sw -c 2 sample.000

As the file has no header specifying the sample rate, bits per sample,
channels, etc., it is a good idea to set these explicitly at the command
line. Doing so is required when converting from the raw format, since
sox has no other way of knowing how to interpret the data:

sox -t raw -r 11025 -sw -c 2 sample.000 sample.aiff

One need not use the "-t raw" option if the file
extension is .raw. The option is essential, however, when the
raw samples are coming from standard input or being sent to standard
output. To read or write a standard stream, use "-" in place of the
file name:

sox -t raw -r 11025 -sw -c 2 - sample.aiff < sample.raw

sox sample.aiff -t raw -r 11025 -sw -c 2 - > sample.raw

Why would we want to do this? This usage style allows sox to
be used as a filter in a command pipeline.

Play It Faster/Slower

Normally sox uses interpolation to adjust the sample frequency
without altering the pitch or tempo of any sounds. By
piping the output of one sox to the input of another and
using unequal sample rates, we can bypass the interpolation and
effectively slow down a sound sample.
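
The command that belongs at this point appears to be missing from this
copy of the article. What follows is a minimal sketch of the technique
just described, not the author's original command. The function names,
the file names in the usage line, and the second rate of 32000 are all
assumptions made for illustration. The first sox writes raw samples at
one rate; the second is told that the same bytes were recorded at a
lower rate, so no interpolation takes place and the sound is stretched
in time.

```shell
# Sketch only: names and the 32000 rate are illustrative assumptions.
# Newer SoX releases are believed to spell "-sw" as
# "-e signed-integer -b 16"; adjust to suit the installed version.

# slow_down SRC DST [WRITE_RATE] [READ_RATE]
# Writes raw samples at WRITE_RATE, then reads them back as though they
# had been recorded at the lower READ_RATE.
slow_down() {
    sox "$1" -t raw -r "${3:-44100}" -sw -c 2 - |
        sox -t raw -r "${4:-32000}" -sw -c 2 - "$2"
}

# stretch_factor WRITE_RATE READ_RATE
# Prints how many times longer the result lasts, to two decimal places.
stretch_factor() {
    awk -v w="$1" -v r="$2" 'BEGIN { printf "%.2f\n", w / r }'
}

# Usage: slow_down sample.aiff slow.aiff
```

Swapping the two rates should speed the sample up instead, raising the
pitch along with the tempo.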

Pipelines also allow simple editing, such as keeping only the first
two seconds of a sample. The input file sample.aiff is converted to raw
44.1kHz samples, each two bytes, in two channels. Two seconds of sound
is thus represented in 44100x2x2x2 = 352800 bytes of data, which are
taken from the front of the stream using "head -c 352800". The result
is then converted back to AIFF format and stored in twosecs.aiff.
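
The command this paragraph describes seems to have been lost here. A
sketch consistent with the description might look like the following.
The helper names bytes_for and first_seconds are hypothetical; the byte
count is simply seconds x rate x bytes per sample x channels.

```shell
# bytes_for SECONDS [RATE] [BYTES_PER_SAMPLE] [CHANNELS]
# Prints the number of bytes occupied by SECONDS of raw audio.
# Defaults match the 44.1kHz, 16-bit, stereo format used in the text.
bytes_for() {
    echo $(( $1 * ${2:-44100} * ${3:-2} * ${4:-2} ))
}

# first_seconds SECONDS SRC DST
# Decodes SRC to raw samples, keeps the leading SECONDS of data, and
# re-encodes the result into DST (format guessed from its extension).
first_seconds() {
    sox "$2" -t raw -r 44100 -sw -c 2 - |
        head -c "$(bytes_for "$1")" |
        sox -t raw -r 44100 -sw -c 2 - "$3"
}

# Usage: first_seconds 2 sample.aiff twosecs.aiff
```

Computing the count rather than typing it keeps the byte total in step
with the format flags if the rate or channel count is changed.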

Parts of different samples can be joined in a similar way. Here we
invoke a child shell that writes raw samples from two different files
to standard output, one after the other. This is piped to a sox
process executing in the parent shell which creates the resulting
file.
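
The command itself is absent from this copy. A sketch of the sub-shell
arrangement described above could be written as follows. The function
names, the default of one second (176400 bytes) taken from each file,
and the file names in the usage line are assumptions.

```shell
# raw_head FILE BYTES
# Prints the first BYTES bytes of FILE as raw 44.1kHz 16-bit stereo.
raw_head() {
    sox "$1" -t raw -r 44100 -sw -c 2 - | head -c "$2"
}

# join_starts FIRST SECOND DST [BYTES]
# The parentheses start a child shell whose two commands write to the
# same standard output in turn; the sox in the parent shell reads the
# combined stream and creates DST.
join_starts() {
    bytes=${4:-176400}
    ( raw_head "$1" "$bytes"; raw_head "$2" "$bytes" ) |
        sox -t raw -r 44100 -sw -c 2 - "$3"
}

# Usage: join_starts sample-1.aiff sample-2.aiff newsample.aiff
```

Because raw data has no header, the two streams can be concatenated
byte for byte, provided both are produced with identical format flags.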

Desktop Sound Output

Sounds can be sent to the OSS (Open Sound System) device /dev/dsp
with the "-t ossdsp" option:

sox sample.aiff -t ossdsp /dev/dsp

The sox package usually includes a platform-independent
script play that invokes sox with the appropriate
options. The previous command could be invoked simply as:

play sample.aiff

Audio samples played this way monopolise the output hardware. Another
sound-capable application must wait until the audio device is freed
before attempting to play more samples. Desktop environments such as
GNOME and KDE provide facilities to play more than one audio sample
simultaneously. Samples may be issued by different applications at
any time without having to wait, although not every audio application
knows how to do this for each of the various desktops. sox
is one such program that lacks this capability. However, with a
little investigation of the audio media services provided by GNOME and
KDE, one can devise ways to overcome this shortcoming.

There are quite a few packages that allow audio device sharing. One
common strategy is to run a background server to which client
applications must send their samples to be played. The server then
grabs control of the sound device and forwards the audio data to it.
Should more than one client send samples at the same time the server
mixes them together and sends a single combined stream to the output
device.

The Enlightened Sound Daemon (ESD) uses this method. The server,
esd, can often be found running in the background of GNOME
desktops. The ESD package goes by the name esound on most
distributions and includes a few simple client applications such as:

esdplay - plays sound samples stored in one of the
more popular file formats (WAV, AU, or AIFF)

esdcat - submits raw sound samples to the server.
This tool is a natural fit for terminating a pipeline of sound
filters.

This command will play the first second of a sample via ESD:

sox sample.aiff -t raw -r 44100 -sw -c 2 - | head -c 176400 | esdcat

One can also arrange to play samples stored in formats
that ESD does not understand but can be read by sox:

sox sample.cdr -t raw -r 44100 -sw -c 2 - | esdcat

In some cases samples can sound better when played this way. Some
versions of ESD introduce significant distortion and noise when given
sounds recorded at a low sample rate.

The Analog RealTime Synthesizer (ARtS) is similar to ESD but is often used
with KDE. The background server is artsd with the
corresponding client programs, artsplay and artscat.
To play the last two seconds of a sample:

sox sample.cdr -t raw -r 44100 -sw -c 2 - | tail -c 352800 | artscat

Neither ESD nor ARtS depends on any one particular desktop
environment. With some work, one could in theory use ESD with KDE and
ARtS with GNOME. Each can even be used within a console login
session. Thus one can mix samples, encoded in a plethora of formats,
with or without the graphical desktop interface.

Music as a Sample Source

Having covered what goes on the end of an audio pipeline, we should
consider what can be placed at the start. Sometimes one would like to
manipulate samples extracted from music files in MP3, MIDI, or module
(MOD, XM, S3M, etc) format. Command line tools exist for each of
these formats that will output raw samples to standard output.

For MP3 music one can use "maplay -s":

maplay -s music.mp3 | artscat

The file music.mp3 must be encoded at 44.1kHz stereo to play
properly. For any other encoding, artscat or esdcat must be
told the actual sample rate and number of channels.
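
The example for this case is missing from this copy. The sketch below
assumes an MP3 encoded at 22050 Hz mono and assumes that esdcat accepts
"-r" (rate) and "-m" (mono) and that artscat accepts "-r" (rate) and
"-c" (channels). Check "esdcat -h" and "artscat -h" on your own system
before relying on these flags. The function names are hypothetical.

```shell
# Assumed input: an MP3 file encoded at 22050 Hz, one channel.

# play_mono_esd FILE : decode with maplay, play through the ESD server
play_mono_esd() {
    maplay -s "$1" | esdcat -r 22050 -m
}

# play_mono_arts FILE : decode with maplay, play through the ARtS server
play_mono_arts() {
    maplay -s "$1" | artscat -r 22050 -c 1
}

# Usage: play_mono_arts mono22khz.mp3
```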

Alternatively one can use "mpg123 -s". Additional
arguments ensure that the output is at the required rate and number of
channels:

mpg123 -s -r 44100 --stereo lowfi.mp3 | artscat

Users of Ogg Vorbis may use the following:

ogg123 -d raw -f - music.ogg | artscat

Piping is not really necessary here since ogg123 has built-in
ESD and ARtS output drivers. Nevertheless, it is still useful to have
access to a raw stream of sample data which one can feed through a
pipeline.

Music files also can be obtained in MIDI format. If (like me) you
have an old sound card with poor sequencer hardware, you may find that
timidity can work wonders. Normally this package converts
MIDI files into sound samples for direct output to the sound device.
Carefully chosen command line options can redirect this output:

timidity -Or1sl -o - -s 44100 music.mid | artscat

The "-o -" sends sample data to standard
output, "-Or1sl" ensures that the samples
are 16-bit signed format, and "-s 44100"
sets the sample rate appropriately.

If you're a fan of the demo scene you might want to play a few music
modules on your desktop. Fortunately mikmod can play most of
the common module formats. The application can also output directly
to the sound device or via ESD. The current stable version of
libmikmod, 3.1.9, does not seem to be ARtS aware yet. One can
remedy this using a command pipeline:

mikmod -d stdout -q -f 44100 music.mod | artscat

The -q is needed to turn off the curses interface
which also uses standard output. If you still want access to this
interface you should try the following:

mikmod -d pipe,pipe=artscat -f 44100 music.mod

Only the later versions of mikmod know how to create their
own output pipelines.

Effects Filters

Let us return to the pipeline-friendly sox. In addition to
its format conversion capabilities, it includes a small library of
effects filters, among them echo, chorus, flanger, and phaser.
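
The effect examples themselves do not appear in this copy of the
article. As a stand-in, here is a small sketch of how an effect can be
added to a sox pipeline stage: the effect name and its arguments follow
the output file on the command line. The wrapper name with_effect is
hypothetical, and the parameter values are the sort shown in the sox
manual page rather than the author's own choices.

```shell
# with_effect SRC EFFECT [ARGS...]
# Decodes SRC to raw 44.1kHz 16-bit stereo on standard output, applying
# one sox effect on the way. Everything after SRC is passed to sox as
# the effect name and its arguments.
with_effect() {
    src=$1
    shift
    sox "$src" -t raw -r 44100 -sw -c 2 - "$@"
}

# Usage (effect arguments may differ between sox versions):
#   with_effect sample.aiff echo 0.8 0.88 60 0.4 | esdcat
#   with_effect sample.aiff chorus 0.7 0.9 55 0.4 0.25 2 -t | artscat
```

Since the output is a raw stream, it can be trimmed with head or tail
before it reaches esdcat or artscat.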

Hopefully these examples hint at what can be accomplished with the
pipeline technique. One cannot argue against using interactive
applications with elaborate graphical user interfaces. They often can
perform much more complicated tasks while saving the user from having
to memorise pages of argument flags. There will always be instances,
however, where command pipelines are more suitable. Converting a large
number of sound samples will require some form of scripting.
Interactive programs cannot be invoked as part of an at or
cron job.
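
As a small illustration of the scripting case, the following sketch
converts every WAV file in a directory to AIFF. The function names and
the directory in the usage line are hypothetical; the sox invocation is
the plain file-to-file form shown at the start of this article.

```shell
# aiff_name FILE.wav : print the matching .aiff file name
aiff_name() {
    printf '%s\n' "${1%.wav}.aiff"
}

# convert_dir DIR : convert each WAV file in DIR, writing the AIFF
# version next to the original
convert_dir() {
    for f in "$1"/*.wav; do
        [ -e "$f" ] || continue   # no matches: the pattern stays literal
        sox "$f" "$(aiff_name "$f")"
    done
}

# Usage: convert_dir /home/user/samples
```

Saved in a script, a call like this needs no terminal or display, so it
can be scheduled with at or cron.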

Audio pipelines can also be used to save disk space. One need not
store a dozen copies of what is essentially the same sample with
different modifications applied. Instead, create a dozen scripts each
with a different pipeline of filters. These can be invoked when the
modified version of the sound sample is called for. The altered sound
is generated on demand.

I encourage you to experiment with the tools described in this
article. Try combining them together in increasingly elaborate
sequences. Most importantly, remember to have fun while
doing so.

Adrian J Chung

When not teaching undergraduate computing at the University of the West
Indies, Trinidad, Adrian writes system level scripts to manage a network
of Linux boxes and conducts experiments in interfacing various scripting
environments with home-brew computer graphics renderers and data visualization
libraries.