First Steps

If you’re new to audio programming, you might want to read up on some of the basics of digital audio first. Check out this blog post for an introduction.

Basic Concepts

The WaveFile gem lets you both read and write wave files. Reading is done using the Reader class, and writing is done using the Writer class.

The Buffer class represents a collection of samples in a given sample format (e.g. stereo 16-bit PCM samples at a 44,100Hz sample rate). When samples are read using Reader they are returned in Buffer instances. Samples to be written are given to Writer wrapped in Buffer instances as well.

A Buffer consists of two parts: an array of samples, and a Format instance that describes the sample format (since it might not be possible to determine just by looking at the raw samples). For example, the sample array in a Buffer read out of a mono 8-bit PCM file (in which each sample is an integer between 0 and 255) might look like this:

[45, 192, 13, 231, 201, 101, 15, ...etc...]

When there is more than one channel, each sample frame will be represented by a sub-array. For example, a set of stereo floating point samples (in which each sample is between -1.0 and 1.0) might look like this:
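As an illustration of that shape, here is a small stereo sample array in plain Ruby. The values are made up for this sketch (they don’t come from a real file), and the variable names are only illustrative. Each inner array is one sample frame holding the left and right channel samples.

```ruby
# Hypothetical stereo floating point samples (values invented for illustration).
# Each sub-array is one sample frame: [left, right].
stereo_samples = [
  [0.2, -0.5],
  [0.3, -0.4],
  [0.4, -0.3],
  [0.5, -0.2],
]

# Pull out a single channel by taking the same position from every frame.
left_channel  = stereo_samples.map { |frame| frame[0] }
right_channel = stereo_samples.map { |frame| frame[1] }
```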

When writing a program that creates sound, you would generate an array like this with the sample data, then wrap it in a Buffer, and then use Writer to write the samples in the Buffer to disk.

Buffers have the ability to convert their samples to any other format this gem supports. This means you can read samples from a file in whatever format you like, regardless of the actual sample format in the file (e.g. read a file with 8-bit samples and get 16-bit samples back). You can also do the same with Writer – for example, rather than remember the sample range of a PCM format (was it 32,767? or 32,768?) you can just generate floating point samples between -1.0 and 1.0, and transparently write them out as PCM samples.
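To give a feel for what such a conversion involves, here is a rough plain-Ruby sketch of turning floating point samples into 16-bit PCM samples. It is not the gem’s own code: `float_to_pcm_16` is a hypothetical helper, and the gem’s exact scaling and rounding may differ. It does answer the question above, though: signed 16-bit samples range from -32,768 to 32,767.

```ruby
# Signed 16-bit PCM samples range from -32,768 to 32,767.
PCM_16_MAX = 32_767

# Hypothetical helper for illustration only; not part of the WaveFile API.
def float_to_pcm_16(sample)
  clamped = sample.clamp(-1.0, 1.0)  # guard against out-of-range input
  (clamped * PCM_16_MAX).round       # scale to the integer range
end

float_samples = [0.0, 0.5, -0.5, 1.0, -1.0]
pcm_samples = float_samples.map { |sample| float_to_pcm_16(sample) }
```

With the gem you never need to write this yourself; the point is only that generating floats and letting the conversion happen for you avoids having to remember these ranges.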

Creating a New Wave File

Let’s write a simple tone to a wave file. A square wave is about the simplest way to create a sound, so let’s do that. A square wave consists of some repeated samples, followed by the same number of repeated samples at the opposite amplitude. For example:

[0.3, 0.3, 0.3, 0.3, -0.3, -0.3, -0.3, -0.3, ...and repeated...]

We’ll write some code to generate these samples, wrap them in a Buffer, and then write these Buffers to a file using Writer.

The samples we’ll generate will be in float format, which means they should be between -1.0 and 1.0. The larger the magnitude of each sample value, the higher the amplitude (i.e. how loud it is). The faster we alternate between the positive and negative samples, the higher the frequency of the tone (e.g. [0.2, 0.2, -0.2, -0.2] will have a higher pitch than [0.2, 0.2, 0.2, -0.2, -0.2, -0.2]). (This is a really simplified explanation of things.)
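The relationship between cycle length and pitch can be sketched with a little arithmetic. This assumes a sample rate of 44,100Hz (the rate used elsewhere in this tutorial), and `square_wave_frequency` is a hypothetical helper written just for this illustration.

```ruby
# Hypothetical helper: frequency (in Hz) of a square wave whose full cycle
# (positive half plus negative half) is `samples_per_cycle` samples long.
def square_wave_frequency(samples_per_cycle, sample_rate)
  sample_rate.to_f / samples_per_cycle
end

sample_rate = 44_100  # assumed sample rate

short_cycle = [0.2, 0.2, -0.2, -0.2]             # 4 samples per cycle
long_cycle  = [0.2, 0.2, 0.2, -0.2, -0.2, -0.2]  # 6 samples per cycle

short_frequency = square_wave_frequency(short_cycle.length, sample_rate)  # 11025.0 Hz
long_frequency  = square_wave_frequency(long_cycle.length, sample_rate)   # 7350.0 Hz
```

The shorter cycle repeats more often per second, so it has the higher frequency.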

50 positive samples followed by 50 negative samples will produce approximately the same pitch as middle A on a piano (440Hz) when the sample rate is 44,100Hz, since a 100-sample cycle repeats 441 times per second. Let’s use that and generate our sample array:

Notice that we used the Format class to identify the sample format. The Format constructor takes 3 arguments: the number of channels, the format of each sample, and the sample rate. You’re on the honor system to use the correct format here; weird stuff could happen if you use the wrong format.

Now let’s write the buffer to a file called "square.wav" in the current working directory:

Notice that we gave the Writer a Format as well. This determines the format the samples will be written in. Notice that the sample format (:pcm_16) is different from that of the Buffer we created – the gem will handle the necessary translation behind the scenes.

All of the code to write the samples is done inside a block. When the block exits the file will automatically be closed. (If you want more manual control over when the file is closed, you can do that as well by not passing a block and manually calling close().)

When you run this program it should create a file called "square.wav" in the current working directory. If you play this file (for example on a Mac using afplay square.wav from the command line) it should sound like this:

...which... doesn’t sound like anything! The reason is that we didn’t generate enough samples. At the sample rate we’re using, 44,100Hz, you’ll need 44,100 samples for 1 second of sound. We only generated 100 samples, or 1/441 of a second. No problem, we can easily fix this by repeating our cycle more times:

Reading a Wave File

Let’s now read the file we just wrote. We can use the Reader class for that.

require 'wavefile'
include WaveFile # So we don't have to prefix all classes with 'WaveFile::'

Reader.new("square.wav").each_buffer do |buffer|
  puts "Buffer number of channels: #{buffer.channels}"
  puts "Buffer bits per sample: #{buffer.bits_per_sample}"
  puts "Number of samples in buffer: #{buffer.samples.length}"
  puts "First 10 samples in buffer: #{buffer.samples[0...10].inspect}"
  puts "--------------------------------------------------------------"
end

After constructing the Reader we call the each_buffer method. This method is useful when you want to read an entire file. It reads successive buffers of a given size, and passes each to the given block. When all buffers have been read, the file is automatically closed. (You can also manually control what to read and when to close the file; see the examples page for more info.)

When you run the program, it should print out repeated output similar to the following:

Notice how these samples are integers, rather than the floats that we generated in the last section. This is because when we saved that file, we indicated (via the Format instance we gave) that the samples should be written as 16-bit PCM samples instead of floating point.

Also notice that we only read 4,096 samples at a time, instead of trying to read the whole file. It’s generally a good idea to read a larger number of smaller buffers, rather than one giant buffer. For this file it probably doesn’t matter, but longer files can have millions of samples, and Ruby can have trouble with arrays this large.

OK, well that’s cool, but let’s say we want to read this file so we can do some transformation on it. It will be easier to work with if the samples are in floating point format and are stereo (since we want to combine it with some other files that are stereo). No problem: when constructing the Reader, we just need to pass a Format instance that describes our desired sample format.