Realtime Audio Visualization in Python

Python’s “batteries included” nature makes it easy to interact with just about anything… except speakers and a microphone! As of this moment, there still are not standard libraries which which allow cross-platform interfacing with audio devices. There are some pretty convenient third-party modules, but I hope in the future a standard solution will be distributed with python. I appreciate the differences of Linux architectures such as ALSA and OSS, but toss in Windows and MacOS in the mix and it gets to be a huge mess. For Linux, would I even need anything fancy? I can run “cat file.wav > /dev/dsp” from a command prompt to play audio. There are some standard libraries for operating system specific sound (i.e., winsound), but I want something more versatile. The official audio wiki page on the subject lists a small collection of third-party platform-independent libraries. After excluding those which don’t support microphone access (the ultimate goal of all my poking around in this subject), I dove a little deeper into sounddevice and PyAudio. Both of these I installed with pip (i.e., pip install pyaudio)

I really like the structure and documentation of sounddevice, but I decided to keep developing with PyAudio for now. Sounddevice seemed to take more system resources than PyAudio (in my limited test conditions: Windows 10 with very fast and modern hardware, Python 3), and would audibly “glitch” music as it was being played every time it attached or detached from the microphone stream. I tried streaming, but after about an hour I couldn’t get clean live access to the microphone without glitching audio playback. Furthermore, every few times I ran this script it crashed my python kernel! I very rarely see this happening. iPython complained: “It seems the kernel died unexpectedly. Use ‘Restart kernel’ to continue using this console” and I eventually moved back to PyAudio. For a less “realtime” application, sounddevice might be a great solution. Here’s the minimal case sounddevice script I tested with (that crashed sometimes). If you have a better one to do live high-speed audio capture, let me know!

Here’s a simple demo to show how I get realtime microphone audio into numpy arrays using PyAudio. This isn’t really that special. It’s a good starting point though. Note that rather than have the user define a microphone source in the python script (I had a fancy menu system handling this for a while), I allow PyAudio to just look at the operating system’s default input device. This seems like a realistic expectation, and saves time as long as you don’t expect your user to be recording from two different devices at the same time. This script gets some audio from the microphone and shows the values in the console (ten times).

import pyaudio
import numpy as np
CHUNK = 4096 # number of data points to read at a time
RATE = 44100 # time resolution of the recording device (Hz)
p=pyaudio.PyAudio() # start the PyAudio class
stream=p.open(format=pyaudio.paInt16,channels=1,rate=RATE,input=True,
frames_per_buffer=CHUNK) #uses default input device
# create a numpy array holding a single read of audio data
for i in range(10): #to it a few times just to see
data = np.fromstring(stream.read(CHUNK),dtype=np.int16)
print(data)
# close the stream gracefully
stream.stop_stream()
stream.close()
p.terminate()

I tried to push the limit a little bit and see how much useful data I could get from this console window. It turns out that it’s pretty responsive! Here’s a slight modification of the code, made to turn the console window into an impromptu VU meter.

The results are pretty good! The advantage here is that no libraries are required except PyAudio. For people interested in doing simple math (peak detection, frequency detection, etc.) this is a perfect starting point. Here’s a quick cellphone video:

I’ve made realtime audio visualization (realtime FFT) scripts with Python before, but 80% of that code was creating a GUI. I want to see data in real time while I’m developing this code, but I really don’t want to mess with GUI programming. I then had a crazy idea. Everyone has a web browser, which is a pretty good GUI… with a Python script to analyze audio and save graphs (a lot of them, quickly) and some JavaScript running in a browser to keep refreshing those graphs, I could get an idea of what the audio stream is doing in something kind of like real time. It was intended to be a hack, but I never expected it to work so well! Check this out…

Here’s the python script to listen to the microphone and generate graphs:

Here’s the result! I couldn’t believe my eyes.It’s not elegant, but it’s kind of functional!

Why stop there? I went ahead and wrote a microphone listening and processing class which makes this stuff easier. My ultimate goal hasn’t been revealed yet, but I’m sure it’ll be clear in a few weeks. Let’s just say there’s a lot of use in me visualizing streams of continuous data. Anyway, this class is the truly terrible attempt at a word pun by merging the words “SWH”, “ear”, and “Hear”, into the official title “SWHear” which seems to be unique on Google. This class is minimal case, but can be easily modified to implement threaded recording (which won’t cause the rest of the functions to hang) as well as mathematical manipulation of data, such as FFT. With the same HTML file as used above, here’s the new python script and some video of the output:

I don’t really intend anyone to actually do this, but it’s a cool alternative to recording a small portion of audio, plotting it in a pop-up matplotlib window, and waiting for the user to close it to record a new fraction. I had a lot more text in here demonstrating real-time FFT, but I’d rather consolidate everything FFT related into a single post. For now, I’m happy pursuing microphone-related python projects with PyAudio.

UPDATE: Displaying a single frequency

Use Numpy’s FFT() and FFTFREQ() to turn the linear data into frequency. Set that target and grab the FFT value corresponding to that frequency. I haven’t tested this to be sure it’s working, but it should at least be close…

UPDATE: Display peak frequency

If your goal is to determine which frequency is producing the loudest tone, use this function. I also added a few lines to graph the output in case you want to observe how it operates. I recommend testing this script with a tone generator, or a YouTube video containing tones of a range of frequencies like this one.

About Scott

Scott Harden lives in Gainesville, Florida and works at the University of Florida as a biological research scientist studying cellular neurophysiology. Scott has lifelong passion for computer programming and electrical engineering, and in his spare time enjoys building small electrical devices and writing cross-platform open-source software. more →