Powerful Multimedia Command-Line Tools, Part I - SoX

SoX is a power-packed command-line tool for various types of audio
processing.
It's very useful as an audio format converter, and it can be used for
resampling audio files, converting between endianness and audio
encodings, and modifying other attributes of common audio file formats.

Its main power, however, is its effect plugins. It can apply various
effects to audio in the same way a digital audio workstation does.
You can add echoes, filter frequencies, reduce or increase volume, remove
noise and do various other advanced digital signal processing on sound
samples.

Its companion program, play, can be used to test what a particular audio
effect does before writing the output to a file with SoX.
However, play does not understand MP3 and Ogg Vorbis files.
You have to use one of the supported formats; the best bet is the
uncompressed wav PCM format. SoX also supports audio mixing with its companion tool, soxmix.

Very sophisticated filtering and resampling algorithms make it a useful
tool in its own right for audio manipulation. However, some of the advanced
features of a professional digital audio workstation are missing.

Audacity, a user-friendly graphical audio editor, has several of the
same effects that SoX has. But because it's a
command-line tool, SoX lends itself to easy scripting, which makes it invaluable when working with hundreds or thousands of sound files.
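
As a rough sketch of that kind of scripting, the following shell function applies one effect chain to every wav file in a directory. The function name batch_effect and the processed subdirectory are hypothetical names chosen for illustration, not anything SoX provides:

```shell
# Apply the same SoX effect chain to every .wav file in a directory.
# batch_effect and the "processed" subdirectory are hypothetical names.
batch_effect() {
    src_dir=$1
    shift                        # remaining arguments are the effect chain
    mkdir -p "$src_dir/processed" || return 1
    for f in "$src_dir"/*.wav; do
        [ -e "$f" ] || continue  # the glob matched nothing
        sox "$f" "$src_dir/processed/$(basename "$f")" "$@" || return 1
    done
}
```

Under these assumptions, calling batch_effect ~/music fade 5 would write faded copies to ~/music/processed and leave the originals untouched.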

Producing audio effects is difficult, because it is as much an art as it is a science. You often have to tweak the input values until satisfied with the result. And, you have to use different values for different files, because their frequency spectra differ based on whether the sound file contains high-fidelity music, speech or silence and also whether it is classical music or rock music, and so on.

You also can create a 5.1 channel audio file from a matrix-encoded source
using a combination of SoX and another companion program called multimux, written by Panteltje.

SoX can be used for recording FM radio or audio from television
using the v4l2 driver in Linux, or it can record sound directly from the
/dev/dsp sound card input using the ossdsp SoX input.
Be aware that sound cards have limitations on sampling rates. You
can't expect your sound card to be able to play audio at any sampling
rate.

For downmixing from stereo to mono, combining multiple audio tracks
and removing silence at the beginning of audio tracks, this is the
application you need.
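
A minimal sketch of the first two tasks follows. The function names to_mono and concat_tracks are hypothetical; the sketch relies only on the -c option for the output channel count and on SoX concatenating multiple input files in the order given. Mixing tracks on top of each other, rather than joining them end to end, is the job of soxmix and is not shown here:

```shell
# Downmix to mono by requesting a single output channel.
to_mono() {
    sox "$1" -c 1 "$2"
}

# Join several tracks end to end. The first argument is the output
# file; the remaining arguments are the inputs, in playing order.
concat_tracks() {
    dst=$1
    shift
    sox "$@" "$dst"
}
```

For instance, concat_tracks album.wav track1.wav track2.wav would produce one file containing both tracks, assuming they share the same sampling rate and channel count.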

Figure 1. SoX, the Command-Line DAW

SoX Effects

First, let's look at the interesting echo effect:

$ play foo.wav echo 0.7 0.6 50 0.2

You will need to experiment with different values for the gain, delay
and decay parameters.
Most effects take time values as input in seconds, although the echo
effects take their delays in milliseconds. The man page is not
very clear about the ranges of values and other finer details, but it
should not be too difficult to figure out what values work.

Also note that the echo effect can be distracting in certain
circumstances, although I have found it adds a certain degree of liveliness to
some speeches.

There is also an echos effect. It functions similarly but is more
complex:

$ play foo.wav echos 0.4 0.6 900.0 0.25 900.0 0.3

You also can specify a large delay to the echo effect to make it sound
eerie:

$ play foo.wav echo 0.7 0.89 1000.0 0.1

Try this with different values (in place of 1000) for the delay,
until you arrive at a value that you like.

Songs often have some silence at the beginning, which
can be a distraction on playlists. Silence is fine for a couple of
seconds, but for more than that, it becomes annoying. You can delete
periods of silence in your music collection with SoX using the
trim effect plugin.

If you don't have any wav files and if all your music
is in MP3, Ogg, aac or ac3 formats, don't
despair; FFmpeg can fix this for you:

$ ffmpeg -i foo.mp3 foo.wav

You can convert it back after SoX processing using the same
command but reversing the arguments.
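
The round trip can be wrapped in a small shell function. This is only a sketch: process_mp3 is a hypothetical name, it assumes your FFmpeg build can encode the format implied by the output file's extension, and everything after the two filenames is handed to SoX as the effect chain:

```shell
# Decode with FFmpeg, process with SoX, then re-encode with FFmpeg.
# process_mp3 is a hypothetical helper, not part of either tool.
process_mp3() {
    src=$1
    dst=$2
    shift 2                      # remaining arguments are SoX effects
    tmp=$(mktemp -d) || return 1
    ffmpeg -i "$src" "$tmp/in.wav" &&
        sox "$tmp/in.wav" "$tmp/out.wav" "$@" &&
        ffmpeg -i "$tmp/out.wav" "$dst"
    status=$?
    rm -rf "$tmp"                # remove the intermediate wav files
    return $status
}
```

Keep in mind that decoding and re-encoding a lossy format costs some quality each time, so it is worth doing all the SoX processing in a single pass.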

Doing the following:

$ sox foo.wav trimmed.wav trim 10

removes the first ten seconds of audio in foo.wav and writes the rest
to trimmed.wav. Note that the output file comes before the effect name. You can figure
out what value to use instead of ten by observing the time counter in
XMMS or whatever player you use to listen to music.

SoX can do better, of course. It can figure out the amount of silence for you
by using the silence effect plugin. Check out the man page for
details. You also can specify the threshold of what you consider silence,
because noise levels interfere with silence processing.
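
As a starting point, the silence effect can be wrapped like this. The function name strip_leading_silence is hypothetical, and the 0.1-second duration and 1% threshold are guesses that you will likely need to tune for your recordings:

```shell
# Discard leading audio until the level stays above the threshold
# for at least 0.1 seconds. The threshold defaults to 1% of full scale.
strip_leading_silence() {
    sox "$1" "$2" silence 1 0.1 "${3:-1%}"
}
```

A noisy recording may need a higher threshold, for example strip_leading_silence foo.wav trimmed.wav 3%, because background hiss would otherwise count as sound.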

Speaking of noise, you can filter noise patterns that have a fixed
spectrum easily with SoX. Typically, noise in audio files
comes from static sources and is not too hard to remove. Well, that's not
always the case, but once you figure out how to remove noise from one input
file, and if all input files were recorded from the same source,
you can bank on using that strategy for the other files as well.

Other types of
noise removal, however, are not easy at all. They often require several
experiments, and most of the time the attempt backfires and removes the signal along
with the noise.

In such a situation, you would be better off using high-fidelity recording
equipment.
As for ambient noise, again that depends: if it's
someone talking, it's difficult; if it's a constant hum, it's not. Doing:

$ sox foo.wav -t nul /dev/null trim 0 0.5 noiseprof profile

will derive a noise profile from the first half second of the input
file, which is assumed to contain only background noise.
Later, you can see whether this removed the noise:

$ play foo.wav noisered profile

However, that didn't work for me the first time.
I was remarkably successful once—I could convert a very
noisy DAT tape recording into crystal clear audio with ease. But,
other times I had trouble.
You will need to do some tweaking and a lot of experimentation.
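
To make that experimentation easier, the two steps can be combined in one function with the reduction amount as a parameter. This is a sketch under a few assumptions: denoise is a hypothetical name, the first half second of the input is taken to be pure noise, and -n is the null-file shorthand found in current SoX releases in place of the -t nul /dev/null form shown above:

```shell
# Build a noise profile from the first half second, then apply it.
# The third argument is the noisered amount, between 0 and 1.
denoise() {
    src=$1
    dst=$2
    amount=${3:-0.5}
    profile=$(mktemp) || return 1
    sox "$src" -n trim 0 0.5 noiseprof "$profile" &&
        sox "$src" "$dst" noisered "$profile" "$amount"
    status=$?
    rm -f "$profile"             # the profile is only needed once
    return $status
}
```

Smaller amounts such as 0.2 or 0.3 remove less noise but are also less likely to take the signal along with it, so they are a reasonable place to start.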

Let's move on to the chorus effect. A typical chorus has
many voices (both human and instrumental) that are slightly out
of phase.
The phase usually remains constant, as the singers try
to perform with a fixed lag. They may attempt to correct this, but for the
most part, chorus singers don't
sing in perfect unison.
The chorus effect reproduces this beautifully.
Its -s and -t arguments specify sinusoidal and triangular modulation
patterns for the filters.
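
A chorus invocation of this kind, modeled on the examples in the SoX man page rather than taken from this article, might look like the sketch below. The function name chorus_preview is hypothetical. After the two gain values, each voice is described by a delay in milliseconds, a decay, a modulation speed in Hz, a depth in milliseconds and then -s or -t:

```shell
# Preview a two-voice chorus: the first voice uses triangular (-t)
# modulation, the second uses sinusoidal (-s) modulation.
chorus_preview() {
    play "$1" chorus 0.6 0.9 50 0.4 0.25 2 -t 60 0.32 0.4 1.3 -s
}
```

Adding more groups of five values adds more voices, so this is a convenient place to experiment.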

SoX makes good use of mathematics for its DSP work, and you can
specify which primitive to use for a particular effect.
You also can set up a SoX pipeline by using - as the output filename, which sends the audio to standard output.
And you can specify multiple effects on the command line by listing them one after another. A single effect looks like this:

$ play foo.wav fade 5

will fade in over five seconds, gradually raising the volume from
silence. You can do a fade out with the same effect.
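
Chaining effects and piping between SoX processes could be sketched as follows. The function names are hypothetical. When - stands in for a filename, SoX cannot guess the format from an extension, so the sketch states it explicitly with -t wav:

```shell
# Two effects in one invocation: fade in over 5 seconds, then add echo.
fade_and_echo() {
    sox "$1" "$2" fade 5 echo 0.7 0.6 50 0.2
}

# Two SoX processes joined by a pipe: the first drops the first ten
# seconds and writes wav data to stdout, the second reads it from
# stdin and reverses it.
trim_then_reverse() {
    sox "$1" -t wav - trim 10 | sox -t wav - "$2" reverse
}
```

The pipeline here could just as well be a single command with two effects; the pipe form earns its keep when another program sits in the middle or at either end.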

The following command will let you hear devil music (play a
song backward):

$ play foo.wav reverse

SoX has
several highly advanced resampling algorithms, and there
are several effects I have not covered in this article, so you should spend
some time exploring SoX for yourself.
On its own, SoX is very powerful, and if you use it in concert with
other tools, command-line or graphical, it provides even more power.
Its ability to accept input from standard input and spit out
the processed file to standard output comes in handy for setting
up an audio processing pipeline.

Girish Venkatachalam is an open-source hacker deeply interested in
UNIX. In his free time, he likes to cook vegetarian dishes and actually
eat them. He can be contacted at girish1729@gmail.com.
