
Creating Digital Audio with ALSA

Sound in Linux is a complicated beast. A whole stack of software (including things like JACK and PulseAudio) is responsible for receiving and mixing all the different audio streams and playing them through your speakers. At the bottom of this stack, interfacing with your sound card, is ALSA.

ALSA stands for Advanced Linux Sound Architecture. At its core it provides the Linux kernel drivers that talk to your computer’s sound card, but that definition doesn’t fully do it justice. ALSA has a raft of features and capabilities, including virtual sound card devices that emulate Linux’s legacy sound system, OSS. ALSA also provides a very rich development API (the alsa-lib user-space library) that you can use to control your sound card.

Creating your own custom digital sounds, and using ALSA to play them, is the topic of this post.

To follow along, all you’ll need is some exposure to C programming, your own installation of Linux, and a little bit of patience. I’m not assuming any prior knowledge of digital audio.

Digital Sound Background

Sound is how we humans perceive time-varying changes in air pressure, called sound waves. How much the air pressure varies is the amplitude of the sound wave, which we perceive as the volume of the sound. How quickly the pressure varies is the frequency of the wave, which we perceive as the sound’s pitch. Consider the open A string on a violin, whose pitch is colloquially known as A 440. It has this name because the wave oscillates 440 times per second, so we say it has a frequency of 440 Hz. The figure below shows how the sound wave for this note varies with time.

440 Hz sound wave

Mathematically, this wave can be represented by the equation below,

$$ f(t) = \sin(2\pi \cdot 440\, t) $$

A key observation here is how smoothly the wave varies with time. It’s as if the wave were infinitely divisible in both time and amplitude (signal processing folks would call this a continuous-time signal). This could also be referred to as an analog wave. This post, however, deals with digital audio, not analog, and the thing about digital audio is that you can’t get this infinite divisibility. The good news is that this limitation doesn’t prevent us from approximating the analog wave pretty well. We do this by sampling the wave.

Think of sampling as taking a series of snapshots of the analog wave’s value at discrete, evenly spaced instants in time. A time instant and the value of the wave at that instant together are referred to as one sample. We can sample our analog wave many times each second to produce a digital approximation of the wave. The signal processing community would call the resulting sequence of samples a discrete-time signal.

For reference, the Nyquist-Shannon sampling theorem says that the sampling rate must be more than twice the highest frequency you want to reproduce. Humans can’t really hear pitches above approximately 20,000 Hz, or 20 kHz, so CD audio uses a sampling rate of 44.1 kHz. Since the frequencies that matter most for understanding speech lie below roughly 3.4 kHz, our voices are sampled at only 8 kHz when we talk on the phone or use a digital walkie-talkie.

Let’s say, for example, that I sampled the wave in the figure above 4400 times each second. Since 4400 / 440 = 10, this would give me 10 samples for each period of the wave. The frequency with which I sample my wave is called the sampling rate, so in this case my sampling rate is 4400 Hz. The figure below shows the samples I would get (in pink) from the sound wave (in blue).

Sound Wave, 440 Hz, sampled

Consider the mathematics behind sampling the wave:

Each sample is identified by an integer index, n, and is taken at a time t that is a whole multiple of the sampling period τ. The sampling period is the reciprocal of the sampling rate, fs:

$$ \tau = \frac{1}{f_s} $$

$$ t = n\tau, \qquad n = 0, 1, 2, \ldots $$

If we substitute this back into our equation for the analog sound wave, writing f_0 for the wave’s frequency (440 Hz in our example), we get the following. Note that I’m using square brackets to denote a discrete-time signal, versus round brackets for a continuous-time signal.

$$ f[n] = f(n\tau) = \sin(2\pi f_0 n \tau) = \sin\left(2\pi \frac{f_0}{f_s} n\right) $$

If our sampling rate is 4400 Hz, then τ = 1/4400 s. Also remember that the frequency of our continuous-time signal is 440 Hz. Substituting these values into our equation, we get

$$ f[n] = \sin\left(2\pi \frac{440}{4400} n\right) = \sin\left(2\pi \frac{1}{10} n\right) $$

This is the equation for our discrete-time signal. Note that its discrete-time frequency is 1/10 cycle per sample, which means it goes through one full period of oscillation every 10 samples.
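As a worked check on that equation, here are the first few sample values, along with the reason the second half of each period mirrors the first and the whole pattern repeats every 10 samples:

```latex
\begin{aligned}
f[0] &= \sin(0) = 0 \\
f[1] &= \sin(\pi/5) \approx 0.588 \\
f[2] &= \sin(2\pi/5) \approx 0.951 \\
f[3] &= \sin(3\pi/5) \approx 0.951 \\
f[4] &= \sin(4\pi/5) \approx 0.588 \\
f[5] &= \sin(\pi) = 0 \\
f[n+5] &= \sin\left(\tfrac{2\pi n}{10} + \pi\right) = -f[n] \\
f[n+10] &= \sin\left(\tfrac{2\pi n}{10} + 2\pi\right) = f[n]
\end{aligned}
```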

Now the trick becomes: ‘how do I create these samples?’ and, once I have them, ‘how do I play them?’

It turns out that creating the samples for a simple 440 Hz sine wave is pretty easy. Check out the program below:

Your sound card can have more than one channel. Most commonly it has two channels, left and right (for stereo sound). In ALSA, a frame is the set of samples for a single sampling instant, one sample per channel. The way samples are laid out in the buffer is configurable, but generally you will interleave them (left sample, right sample, left sample, right sample, and so on).

ALSA transfers frames to the hardware buffer in fixed-size chunks called periods. The length of a period is configurable, either as a duration or as a number of frames (the two quantities are related by your sampling rate).

Consider the example below. We’re going to use a signed 16-bit integer to represent each sample (just like CD audio). A left sample and a right sample together make a frame. A set of frames makes a period, and ALSA transfers each period of samples to the end of a hardware buffer.

Organization of samples to be passed

First ALSA Program

Here, I’ll try to create the Hello World of ALSA programs. The goal is to produce one second’s worth of a 440 Hz sound wave. I’ll show the source code below. I’ve tried to add lots of helpful comments to make it a little more obvious what is going on.

What’s Next?

Check out the ALSA API documentation for a more in-depth look at the functions available to you. Try creating more interesting sounds by changing the frequency of the wave as you produce samples (a technique called frequency modulation), or try implementing the Karplus-Strong algorithm as an easy way to get a half-decent plucked-string sound.