Audio in the Browser: Horrors and Joys

At NPR, we create some of the finest audio storytelling in the world every day. When our radio reporters go into the field to gather stories, they often have a team of audio producers gathering interview clips and natural, found sounds of the highest quality.

On the web, audio has never enjoyed the wide browser support and investment that other forms of media, such as images and videos, have received. This means working with audio as a web developer can be frustrating and limited. Over the past few months, NPR Visuals has settled on some patterns for working with audio.

Audio on the Web: A Brief, Frustrating History

The first web browser to implement audio was—shockingly—Internet Explorer. Microsoft implemented the <bgsound> tag in Internet Explorer 4.0 as a way for a web page to autoplay an audio file when a page loaded. This is also known as the worst possible way to use audio on the web, or That MIDI Version Of “Stairway To Heaven” On Every Geocities Page.

So fraught was the tag, Netscape and other browsers of the time didn’t even attempt to implement it. But Flash quickly followed, and for a long, dark period of the web, the only way to reliably use audio (I’m told—I was like, 12) was to embed it in a Flash player.

With the advent of HTML5 came a new, native option for audio in the browser: the humble <audio> tag. It seemed so simple, so elegant. A semantic way to include an audio file, just like you might include an image. It even got around the pesky audio codec issue by allowing you to supply multiple source files for each audio tag.

But for years, browser support for HTML5 audio lagged behind, and determining which codecs you needed to provide for your audio files seemed an impossible task. Cutting MP3s, OGGs, and WAVs of every audio file you wanted to use made the editing process much more tedious.

Finally, in 2015, HTML5 audio-based solutions are stable and simple enough to rely on. You can give an <audio> tag an MP3 file and expect it to play in every modern browser, including IE9 and up.

But HTML5 audio has many limitations. HTML5 audio is like a stereo—you can put audio in it, and you can stop, skip around, pause and play those sounds, but you have no way of getting at what is really in that sound. You can’t access the data behind the sound, such as frequency or volume data.

The newest audio technology, the Web Audio API, gets around all of those limitations. But it is not quite ready for prime time, as we will get into later.

Working with audio can be frustrating without the right infrastructure set up. Below, I will focus largely on HTML5 audio and walk through how NPR develops with audio, including solutions to some of the problems we have run into while developing.

HTML5 Audio: What Is Up

I know nothing about HTML5 audio. Can you give me a quick primer?

Sure! The proposal for HTML5 audio arrived when the W3C (World Wide Web Consortium) first proposed HTML5 as a standard in 2008. The proposal called for an <audio> tag that would play various types of audio files natively in the browser.

Various browser vendors took a long time to implement this, and for years, they supported different audio codecs (MP3, OGG, WAV, etc.). But today, all modern browsers support the <audio> tag and the MP3 audio codec. Thus, a basic audio player might look like this:

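Two sketches, with placeholder file names: a single-MP3 player, and one that supplies multiple sources so the browser can pick a codec it supports.

```html
<!-- A single MP3 source, which all modern browsers can play -->
<audio src="my-story.mp3" controls></audio>

<!-- Multiple sources; the browser plays the first codec it supports -->
<audio controls>
    <source src="my-story.mp3" type="audio/mpeg">
    <source src="my-story.ogg" type="audio/ogg">
</audio>
```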

In both examples, we use the controls attribute to use the browser default UI for controlling audio. These controls look very different in every browser, so make sure you are prepared for that if you want to use the default controls.

If you don’t want to use those controls and don’t want the audio to autoplay (you don’t), you will need to implement controls in JavaScript. There are libraries for this.

But JavaScript Tho

Every JavaScript library I have tried with audio seems not to work. What am I doing wrong?

You’re not doing anything wrong! A lot of JavaScript libraries have attempted to provide a clean API for working with audio in JavaScript. Many of them have not kept up with browser support, and many of them are simply bad JavaScript libraries. We use the tried-and-true jPlayer. On just about every audio project, we try a different JavaScript library to see if we find one that works as well, is lighter, and is something we like better. We haven’t found one yet.

OK Fine Tell Me About jPlayer

Okay, so how do I use jPlayer?

jPlayer is an unfortunately heavy dependency, as it also provides support for video and a whole skinning interface that is not as useful for our purposes, but its audio playback has proved fairly bulletproof for us. jPlayer also depends on jQuery.

To get started with jPlayer, read its documentation and take a look at some of our code. A basic player works like this.

Getting an audio player working is fairly simple. First, you need an HTML element to attach the jPlayer instance to, in the DOM.

<div id="audio-player"></div>

Next, in your JavaScript, initialize the jPlayer instance when the document is ready.
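A minimal initialization sketch, based on jPlayer's documented options; the swfPath value and the #audio-player selector are placeholders for this sketch.

```javascript
$(document).ready(function() {
    $('#audio-player').jPlayer({
        // Path to the folder containing jPlayer's SWF file, for the Flash fallback
        swfPath: '/js/lib',
        // The codecs we will supply; MP3 is all modern browsers need
        supplied: 'mp3',
        // Try HTML5 first, then fall back to Flash
        solution: 'html, flash'
    });
});
```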

We pass a path to a folder containing the jPlayer SWF players for jPlayer’s Flash fallback. The browsers we support don’t need it, but it can’t hurt to be backwards-compatible if necessary. jPlayer defaults to HTML5 and will use the Flash player if the browser does not support HTML5.

jPlayer has a host of other options you can pass to the init function. Read about those in the jPlayer documentation.

Note that we haven’t passed it the file we want to play yet. We’ll do that when we activate the player on a click event. To start the audio, bind a click event to something the user will interact with. For us, this is usually a begin button.

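A sketch of that binding, assuming a hypothetical .begin button and a placeholder file path:

```javascript
$('.begin').on('click', function() {
    $('#audio-player')
        // Hand jPlayer the file to play; the path is a placeholder
        .jPlayer('setMedia', {
            mp3: 'http://example.com/audio/my-story.mp3'
        })
        // Start playback immediately
        .jPlayer('play');
});
```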

Note that this function also plays the audio file on click. That may not be the behavior you want; if so, simply remove the .jPlayer('play') from that function.

I want to play multiple audio files in my application. Do I need a jPlayer instance for every one?

It depends. You only need as many jPlayer instances as audio files you can have playing simultaneously. Usually, that is only one. So for Songs We Love, despite having more than 300 songs in the app, we only had one jPlayer instance handling all of them. For Life After Death, we could have both ambient and narrative audio files playing simultaneously, so we needed two instances of jPlayer.

When you are ready to change the file, use the setMedia function in jPlayer. For example:
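A sketch, assuming the same #audio-player instance from above and a placeholder file path:

```javascript
$('#audio-player').jPlayer('setMedia', {
    mp3: 'http://example.com/audio/next-story.mp3'
});
```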

I’m trying to autoplay audio on my page, and it won’t play on mobile! What can I do to force the audio file to play?

My advice: Don’t autoplay an audio file. There are two reasons for this, one technical reason and one UX-based reason.

The technical reason is that mobile devices require a touch event to activate an audio file. Some have tried to offer a workaround by creating an empty buffer in your audio instance and simulating a touch event to activate the audio player. Don't do this: it sometimes doesn't work, and at worst, it hard-crashes the entire device.

The UX-based reason is that you simply shouldn’t autoplay an audio file. Your user could be in a place where audio would not be appropriate, or she may want to hear the audio through speakers or headphones but has not plugged them in. At NPR, we warn users on our titlecards that audio is a part of this experience with a prompt that says, “Put on your headphones.”

What you should do is: bind the first play event on your jPlayer instance to a click event. From there, your jPlayer instance will always be active, and you can do whatever you need to do.

Localhell

I’m trying to develop with audio files, and they are not playing correctly locally but work just fine on staging/production. What’s the difference?

Never use an audio file locally if you can get around it.

While web browsers all know how to play MP3s now, they all do it in slightly different ways and expect slightly different HTTP response headers (particularly the range-request headers that let a browser seek within a file) in order to interpret an audio file correctly. We have tried implementing those headers in our local Gunicorn development server and Flask app, but we never quite got it right. The problems you will usually see have to do with the browser's inability to determine the length of the file or understand that it has to continue downloading content. Symptoms include progress bars not working and playback cutting off prematurely.

However, Amazon S3 does get these headers right. For all of our projects, we host our audio files on a private bucket for development, and then we deploy the audio files to our staging and production buckets when we are ready to test and launch.

The Magic of timeupdate

I want to have an event fire in my code at a certain point in my audio file. Should I use Popcorn.js?

You could. However, if you are using jPlayer, you don’t really need it.

One of the things HTML5 audio provides is a timeupdate event, which jPlayer listens to and exposes so you can fire a callback function whenever it fires. The timeupdate event fires several times per second (roughly every 100 to 250 milliseconds, depending on the browser), which gives you pretty granular control over the exact position of your audio. With that information, you can do a variety of things.

The obvious use case is to provide a timer of how long the audio has been playing. jPlayer makes that very simple.

First, you need a DOM element to input the timer into:

<span class="timer"></span>

Then, you bind a callback function to the timeupdate event when you initialize the player.
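A sketch of that callback. The formatTime helper below is hypothetical (jPlayer also ships a $.jPlayer.convertTime utility that formats times for you); the jPlayer binding itself is shown in comments because it only runs in a browser.

```javascript
// Hypothetical helper: format a position in seconds as m:ss
function formatTime(seconds) {
    var minutes = Math.floor(seconds / 60);
    var secs = Math.floor(seconds % 60);
    return minutes + ':' + (secs < 10 ? '0' : '') + secs;
}

// In the browser, bind it in the jPlayer init:
// $('#audio-player').jPlayer({
//     timeupdate: function(e) {
//         $('.timer').text(formatTime(e.jPlayer.status.currentTime));
//     },
//     swfPath: '/js/lib',
//     supplied: 'mp3'
// });
```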

When you play the audio file, this should update the <span> tag as the audio file plays.


However, the timeupdate event can be used for so much more, creating more complex scenarios of the kind that Popcorn.js tends to solve. Here is an example from A Brother And Sister In Love, an audio-driven narrative that also has sequenced visuals to accompany the story. Much like our user-driven sequential visual stories, A Brother And Sister In Love is built out of a spreadsheet, one row per slide. You can read more about that process here.

For this story, we added a column to our spreadsheet with the time in the audio file that this slide should exit the screen, called slide_end_time. When we build the slides in HTML from the spreadsheet, the slide_end_time is stored as a data attribute. For example:
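The generated slide markup might look something like this (the class name and content are placeholders; the data attribute name matches what the callback reads):

```html
<div class="slide" data-slide-end-time="12.5">
    <!-- slide content -->
</div>
```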

Then, we define the onTimeupdate function, which reads the position of the audio when the event fires and compares it with the current slide's stored end time. Note that currentIndex, a global variable set in a separate function when the slide changes, is the slide number we are currently on.

var onTimeupdate = function(e) {
    var position = e.jPlayer.status.currentTime;

    // Loop through all of the slides, where $slides is an array of jQuery objects, one per slide
    for (var i = 0; i < $slides.length; i++) {
        var endTime = $slides.eq(i).data('slide-end-time');

        // If the position is less than the end time of the slide of this loop
        if (position < endTime) {
            // If the slide we're on has an endTime beyond our position, do nothing
            if (i === currentIndex) {
                break;
            }
            // Once we've managed to loop past the current slide, move to that slide
            else {
                moveSlide(i);
                break;
            }
        }
    }
};

The Web Audio API

I want to visualize my audio with a waveform or spectrogram in real-time!

Hahaha, you need the Web Audio API. (See below.)

I want users to be able to interact with my audio and actually change how it sounds!

Hahaha, you need the Web Audio API. (See below.)

I want to control the volume of my audio based on events in my JavaScript, even on mobile!

Hahaha, you need the Web Audio API. (See below.) (Note: HTML5 audio cannot control volume on mobile, as mobile devices will not allow you access to system volume.)

Why are you laughing at me when I need the Web Audio API?

I just think it’s funny that we have all these great ideas about what we can do with audio, and we can’t do them yet. But there is hope: The Web Audio API promises to completely transform how we work with audio in the browser in ways that go well beyond HTML5. Unfortunately, it is still “the future” because, even though Chrome, Firefox, and Safari all support it, it is not currently supported in any version of Internet Explorer, and performance on mobile is still unpredictable.

Think of the Web Audio API less like a stereo and more like a recording studio. Rather than putting an audio file in the stereo and pressing play, you take the audio file and make it just one input on your mixing board: a signal of audio data. Then, on your mixing board, you can change the sensitivity of the signal, alter how the signal is processed, and analyze the signal. You can add other inputs, and not just from audio files. You can use a device's microphone or line-in inputs as well.
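A minimal sketch of that routing, for browsers that support the API. The element ID is a placeholder; the graph wires an existing <audio> element through a gain node and an analyser on its way to the speakers.

```javascript
var AudioContext = window.AudioContext || window.webkitAudioContext;
var context = new AudioContext();

// Make an existing <audio> element one input on the "mixing board"
var element = document.getElementById('audio-player');
var source = context.createMediaElementSource(element);

// A gain node controls the sensitivity of the signal, even on mobile
var gain = context.createGain();
gain.gain.value = 0.5; // half volume

// An analyser node exposes frequency and waveform data for visualization
var analyser = context.createAnalyser();

// Wire the graph: source -> gain -> analyser -> speakers
source.connect(gain);
gain.connect(analyser);
analyser.connect(context.destination);
```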

It is still early for the Web Audio API, but the benefits of using it are already clear to me, including:

The ability to control volume on mobile. Because mobile devices will not allow you to control the system audio, you cannot control the volume of HTML5 audio. With the Web Audio API, you control the gain, or sensitivity, of the input, rather than the volume of the device. Thus, you can turn down the gain of your audio file, making it sound quieter on any device.

The ability to visualize audio. From spectrograms to waveform displays, visualizing audio makes many intricate things about audio understandable.

The ability to have audio respond to user interaction. The possibilities here are pretty endless, but take a look at this virtual guitar pedalboard for an example.

You may have failed with the Web Audio API, but I am a better developer than you! I’ll make it work!

This is probably true.

The best library I have seen for working with Web Audio is howler.js. It is built by a game developer for game developers, so its architecture is a little funny for narrative storytelling, and it does a lot of strange things to get around mobile’s autoplay restrictions.

That being said, it works really well on desktop browsers that support Web Audio, and even most mobile browsers! However, my apps using Howler would crash the entire device every time I opened them in an iOS in-app browser or anything that wasn't iOS Safari. That means iOS Chrome, or opening a link in Facebook or Twitter. Those contexts account for at least 20 percent of our traffic to a given app, so this is an unacceptable bug.

I haven’t worked out whether the failure is due to Web Audio or Howler.

You expected me to read 2,000 words about audio in the browser? I just scrolled down for the kicker. What’s the synopsis?

Years after it was promised, HTML5 audio is a stable and working thing, for the most part. You can give an MP3 file to an <audio> tag and expect it to play in all modern browsers.

Use jPlayer if you want a JavaScript library; it is the most tried-and-true library out there. It even provides a nice Flash fallback if you need to support super-old browsers like IE8.

If you can, host your audio files somewhere (I recommend Amazon S3) even when you are developing locally. This will get around a lot of headaches regarding browser headers and range requests.

Don’t autoplay on page load. Ever.

The timeupdate event native to HTML5 audio is really powerful and, with jPlayer, can be used to fire events in your JavaScript based on the position of the audio file. See the jPlayer docs.

The Web Audio API, which basically puts a recording studio in the browser, is almost here!