Eaten by the Python.

Read and Write Audio Files in Python Using FFMPEG

This article shows how easy it is to read or write audio files in a few lines of Python, by calling the external software FFMPEG through pipes. If you want a battle-tested and more sophisticated version, check out my module MoviePy. See also my other article, which does the same with video files.

Before we start, you must have FFMPEG installed on your computer and you must know the name (or path) of the FFMPEG binary on your computer. It should be one of the following:

    FFMPEG_BIN = "ffmpeg"      # on Linux
    FFMPEG_BIN = "ffmpeg.exe"  # on Windows
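If you are not sure which of the two names applies, you can let Python look for the binary on the PATH. This is only a minimal sketch: the helper name find_ffmpeg is made up for this example, and it assumes FFMPEG was installed somewhere the PATH covers.

```python
import shutil

def find_ffmpeg(candidates=("ffmpeg", "ffmpeg.exe")):
    """Return the full path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None

FFMPEG_BIN = find_ffmpeg()
if FFMPEG_BIN is None:
    # Nothing found: fall back to setting FFMPEG_BIN to an explicit path.
    print("FFMPEG was not found on PATH; set FFMPEG_BIN by hand.")
```

If FFMPEG lives outside the PATH, just set FFMPEG_BIN to its full path instead.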

Reading

To read the audio file “mySong.mp3” we first ask FFMPEG to open this file and to direct its output to Python:

    import subprocess as sp

    command = [FFMPEG_BIN,
               '-i', 'mySong.mp3',
               '-f', 's16le',
               '-acodec', 'pcm_s16le',
               '-ar', '44100',  # output will have 44100 Hz
               '-ac', '2',      # stereo (set to '1' for mono)
               '-']
    pipe = sp.Popen(command, stdout=sp.PIPE, bufsize=10**8)

In the code above, -i mySong.mp3 indicates the input file, while s16le/pcm_s16le asks for raw 16-bit sound output. The - at the end tells FFMPEG to send its output to standard output instead of a file, which is where our program picks it up through the pipe. In sp.Popen, the bufsize parameter must be bigger than the biggest chunk of data that you will want to read (see below). It can be omitted most of the time in Python 2 but not in Python 3, where its default value is pretty small.
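If you need other sample rates or mono output, the same command can be built by a small function. This is just a sketch: build_read_command and its parameters are names invented for this example, not part of FFMPEG or MoviePy. Note that every argument passed to sp.Popen must be a string, hence the str() calls.

```python
def build_read_command(path, sample_rate=44100, channels=2, ffmpeg_bin="ffmpeg"):
    """Build the FFMPEG argument list that decodes `path` to raw 16-bit PCM."""
    return [
        ffmpeg_bin,
        "-i", path,               # input file
        "-f", "s16le",            # raw signed 16-bit little-endian container
        "-acodec", "pcm_s16le",   # raw 16-bit PCM samples
        "-ar", str(sample_rate),  # output sample rate in Hz
        "-ac", str(channels),     # number of output channels
        "-",                      # write to standard output
    ]

command = build_read_command("mySong.mp3", sample_rate=22050, channels=1)
print(" ".join(command))
# ffmpeg -i mySong.mp3 -f s16le -acodec pcm_s16le -ar 22050 -ac 1 -
```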

Now you just have to read the output of FFMPEG. In our case we have two channels (stereo sound), so one frame of our output will be represented by a pair of integers, each coded on 16 bits (2 bytes). Therefore one frame will be 4 bytes long. To read 88200 audio frames (2 seconds of sound in our case) we will write:

    raw_audio = pipe.stdout.read(88200*4)

    # Reorganize raw_audio as a Numpy array with two columns (1 per channel)
    import numpy
    audio_array = numpy.frombuffer(raw_audio, dtype="int16")
    audio_array = audio_array.reshape((len(audio_array)//2, 2))
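The byte counts above generalize easily, and the same read can be repeated until FFMPEG has nothing left to send. Here is a minimal sketch of both ideas. The helper names (frame_size, bytes_for_duration, iter_chunks) are made up for this example, and an io.BytesIO object stands in for pipe.stdout so that the snippet runs without FFMPEG.

```python
import io

BYTES_PER_SAMPLE = 2  # s16le means 16 bits (2 bytes) per sample

def frame_size(channels):
    """Bytes in one audio frame (one sample per channel)."""
    return channels * BYTES_PER_SAMPLE

def bytes_for_duration(seconds, sample_rate=44100, channels=2):
    """Bytes needed to hold `seconds` of raw 16-bit audio."""
    return int(round(seconds * sample_rate)) * frame_size(channels)

def iter_chunks(stream, frames_per_chunk, channels=2):
    """Yield raw chunks from `stream` until it is exhausted.

    Trailing bytes that do not form a complete frame are dropped.
    """
    size = frame_size(channels)
    while True:
        chunk = stream.read(frames_per_chunk * size)
        if not chunk:
            break
        usable = len(chunk) - len(chunk) % size
        if usable:
            yield chunk[:usable]

# Stand-in for pipe.stdout: 10 silent stereo frames plus 3 stray bytes.
fake_stdout = io.BytesIO(bytes(10 * 4 + 3))
print(bytes_for_duration(2))                             # 352800
print([len(c) for c in iter_chunks(fake_stdout, 4)])     # [16, 16, 8]
```

With a real pipe you would pass pipe.stdout instead of fake_stdout and convert each chunk to a Numpy array as shown above.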

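Writing

To write an audio file, we go the other way: FFMPEG is opened with a pipe on its standard input and told to expect raw audio data. The output codec can be any valid FFMPEG audio codec. For some codecs, providing the output bitrate is optional. Now you just have to write raw audio data into the pipe. For instance, if your sound is represented as an Nx2 Numpy array of integers, you will just write: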

    # pipe is the FFMPEG process opened for writing (with stdin=sp.PIPE)
    pipe.stdin.write(audio_array.astype("int16").tobytes())
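For completeness, here is a hedged sketch of what the whole write path can look like, from opening FFMPEG to closing the pipe. The function names and defaults are invented for this example. In particular, the libmp3lame codec and the 192k bitrate are assumptions: your FFMPEG build may not include that encoder, and any other audio codec it supports will do.

```python
import subprocess as sp

def build_write_command(output_path, sample_rate=44100, channels=2,
                        codec="libmp3lame", bitrate="192k", ffmpeg_bin="ffmpeg"):
    """Build the FFMPEG argument list that encodes raw 16-bit PCM from stdin."""
    return [
        ffmpeg_bin,
        "-y",                     # overwrite the output file if it exists
        "-f", "s16le",            # the input is raw signed 16-bit little-endian
        "-acodec", "pcm_s16le",   # the input samples are raw 16-bit PCM
        "-ar", str(sample_rate),  # input sample rate in Hz
        "-ac", str(channels),     # number of input channels
        "-i", "-",                # the input arrives on standard input
        "-vn",                    # no video stream
        "-acodec", codec,         # output audio codec (assumed to be available)
        "-b:a", bitrate,          # output audio bitrate
        output_path,
    ]

def write_raw_audio(raw_audio, output_path, **options):
    """Send raw audio bytes to FFMPEG and return its exit code."""
    pipe = sp.Popen(build_write_command(output_path, **options), stdin=sp.PIPE)
    pipe.stdin.write(raw_audio)
    pipe.stdin.close()  # tells FFMPEG there is no more input
    return pipe.wait()
```

With the Numpy array from before, a call would look like write_raw_audio(audio_array.astype("int16").tobytes(), "output.mp3"). Closing stdin matters: without it FFMPEG keeps waiting for more data and never finalizes the file.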

Going further

I tried to keep the code as simple as possible here. With a few more lines you can make useful classes to manipulate audio files, like FFMPEG_AudioReader and FFMPEG_AudioWriter that I wrote for my video editing software. In these files you will see in particular how to parse the information on the video, how to save/load pictures using FFMPEG, etc.