Essential Audio and Video Events for HTML5

The <video> and <audio> elements provide a comprehensive range of events. While some are quite straightforward, like the self-explanatory "play" event, others can be rather more tricky to understand, especially the "progress" event.

So let’s examine some of the most important media events, looking at when and how they fire and what properties are relevant to them. We’ll also try to navigate the quirks of their behavior in current browsers (well, you didn’t think they’d all be the same, did you?).

(For reference testing, I’ll be using the latest public versions of the most common browsers — Opera 12, Chrome 28, IE10, Firefox 22, Safari 5 (desktop) and Mobile Safari 6 (iOS). So wherever a browser is referred to only by name (e.g. Opera) it means this latest version.)

Playback Events


The playback events are those which fire in response to playing or pausing the media. These events are quite straightforward.

The "play" and "pause" events fire when the media is played or paused (respectively), but there’s also an "ended" event which fires when the media reaches the end — either because ordinary playback has finished, or because the user manually “seeked” that far.

There are media methods that correspond with the first two events, unsurprisingly called play() and pause(). There are also two media properties that correspond with the last two events: the .paused property is true by default, or whenever the media is paused, while the .ended property is false by default, but then becomes true when playback reaches the end (i.e. at the same time as the "ended" event fires).

However, there is a significant anomaly here in Opera, Safari and IE10, which is that the .paused flag remains false when the media has ended (yet logically it should be true, since the media is no longer playing). A practical upshot of this is that a simple play/pause button handler, which checks .paused to decide whether to call play() or pause(), would fail in that situation (i.e. the button would do nothing at all).
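For illustration, here is a minimal sketch of such a handler. The names media (the <video> or <audio> element), button and bindPlayPauseToggle are hypothetical. Because it decides what to do purely from .paused, a click after "ended" in those browsers calls pause() instead of play():

```javascript
// Naive play/pause toggle that trusts the .paused flag.
// `media` and `button` are hypothetical references to a media element
// and a clickable control.
function bindPlayPauseToggle(media, button) {
  button.addEventListener("click", function () {
    if (media.paused) {
      media.play();
    } else {
      // In Opera, Safari and IE10 this branch also runs after "ended",
      // because .paused is still false, so the click calls pause()
      // instead of play() and appears to do nothing.
      media.pause();
    }
  });
}
```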

Firefox and Chrome already fix this internally, and in exactly the same way — by firing a "pause" event just before the "ended" event.
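The same behavior can be imitated by hand in Opera, Safari and IE10. The sketch below is a suggested workaround rather than a standard API (normalizePausedOnEnded is a hypothetical name): it calls pause() in response to "ended" whenever .paused is still false, so a toggle that checks .paused then behaves the same in every browser. One difference from the built-in fix is ordering: the "pause" event this triggers is dispatched after "ended", not just before it.

```javascript
// Workaround sketch: make .paused true once playback has ended, as
// Firefox and Chrome already do internally.
// `media` is a hypothetical reference to an <audio> or <video> element.
function normalizePausedOnEnded(media) {
  media.addEventListener("ended", function () {
    if (!media.paused) {
      // pause() sets .paused to true and queues a "pause" event,
      // which is dispatched after this "ended" handler has run.
      media.pause();
    }
  });
}
```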

Loading Events

The loading events are those which fire in respect of loading (or failing to load) media data. The prevalence of these events depends on the loading state of the media, i.e. whether the preload attribute is used and/or whether the media is already cached.

The first to fire in all cases is the "loadstart" event, which means that the browser has begun to look for data. But that’s all it means — it doesn’t mean any data has actually loaded, or that the media resource even exists.

If the preload attribute has the value "none", then the "loadstart" event is the only one that will fire before playback begins. If it has the value "metadata" or "auto", two more events will fire quite soon afterwards: "progress" and "loadedmetadata". (Without preloading these events will still fire, but not until playback begins.)

The "progress" event is rather complex, so we’ll look at that separately in the next section, but the "loadedmetadata" event is straightforward, as it simply means that the browser has loaded enough meta-data to know the media’s .duration (as a floating-point number, rather than its default value of NaN).
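A minimal sketch of reading the duration, assuming a hypothetical helper called whenDurationKnown. It checks .duration before listening, since the metadata may already have loaded (for example with preload="auto" or a cached file) by the time the script runs, in which case "loadedmetadata" has already fired:

```javascript
// Calls `callback` with the media duration (in seconds) as soon as it is
// known. `whenDurationKnown` is a hypothetical helper, not a DOM API.
function whenDurationKnown(media, callback) {
  // .duration is NaN until enough metadata has loaded.
  if (!Number.isNaN(media.duration)) {
    callback(media.duration);
    return;
  }
  media.addEventListener("loadedmetadata", function onMeta() {
    // Listen only once; the duration is now a floating-point number.
    media.removeEventListener("loadedmetadata", onMeta);
    callback(media.duration);
  });
}
```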

Of course the "loadedmetadata" event will only fire at all if the media is able to load — if it fails (for example, if the src returns a 404), then the media will instead produce an "error" event, and no further playback will be possible.
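A sketch of detecting that failure follows. It assumes the resource is set with the src attribute; when <source> child elements are used instead, the "error" event is fired at each failing <source> element rather than at the media element. The helper names are hypothetical, while the numeric codes are the standard MediaError constants:

```javascript
// Standard MediaError codes mapped to readable labels.
const MEDIA_ERROR_LABELS = {
  1: "MEDIA_ERR_ABORTED",
  2: "MEDIA_ERR_NETWORK",
  3: "MEDIA_ERR_DECODE",
  4: "MEDIA_ERR_SRC_NOT_SUPPORTED",
};

// Hypothetical helper: describe why a media element failed to load.
function describeMediaError(media) {
  if (!media.error) {
    return "no error";
  }
  return MEDIA_ERROR_LABELS[media.error.code] || "unknown error " + media.error.code;
}

// Hypothetical helper: call `onFail` with a description when loading fails.
function watchForLoadError(media, onFail) {
  media.addEventListener("error", function () {
    onFail(describeMediaError(media));
  });
}
```

Per the HTML specification, a src that cannot be fetched or played at all (such as a 404) is normally reported as MEDIA_ERR_SRC_NOT_SUPPORTED; MEDIA_ERR_NETWORK is used when a network error interrupts a resource that had already started loading.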

Here again we encounter some important browser variations. In Mobile Safari the preload settings are intentionally not implemented, so all values for that attribute behave the same as if it were "none". In IE10 by contrast, the media meta-data is always loaded by default, so a preload value of "none" behaves the same as if it were "metadata".

After "loadedmetadata" has fired, the next significant event is "canplay", which the browser will fire to indicate when enough data has loaded for it to know that playback will work (i.e. that it can play). If preload is "auto" then the "canplay" event will fire after a couple of seconds of data has loaded; if preload is "metadata" or "none" it won’t fire until playback has begun. The one exception to this rule is Chrome, which always fires "canplay" during initial preload, even if it’s only meta-data.
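Since whether "canplay" fires before playback depends on the preload setting and the browser, a script that must wait for it should also check .readyState, in case the event has already fired. A sketch with a hypothetical whenCanPlay helper; the value 3 is the standard HAVE_FUTURE_DATA constant, the ready state at (or above) which "canplay" is fired:

```javascript
// HTMLMediaElement.HAVE_FUTURE_DATA: enough data to start playing.
const HAVE_FUTURE_DATA = 3;

// Hypothetical helper: run `callback` once the browser reports that playback
// can start, whether or not "canplay" has already fired.
function whenCanPlay(media, callback) {
  if (media.readyState >= HAVE_FUTURE_DATA) {
    callback();
    return;
  }
  media.addEventListener("canplay", function onCanPlay() {
    media.removeEventListener("canplay", onCanPlay);
    callback();
  });
}
```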

There’s also a secondary event called "canplaythrough", which the browser should fire when it estimates that enough media data has loaded for playback to be uninterrupted. This is supposed to be based on an estimate of your connection speed, so it shouldn’t fire until at least a few seconds of data have been preloaded.

However, in practice the "canplaythrough" event is basically useless, because Safari doesn’t fire it at all, while Opera and Chrome fire it immediately after the "canplay" event, even when they have yet to preload so much as a quarter of a second! Only Firefox and IE10 appear to implement this event correctly.

But you don’t really need this event anyway, since you can monitor the "progress" event to determine how much data has been preloaded (and, if need be, calculate the download speed yourself).
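One way to do that is sketched below. bufferedAhead, bufferingRate and watchBuffering are hypothetical helpers, and the "speed" they report is measured in seconds of media buffered per second of real time, a stand-in for a byte-based download speed (which would also need the file size). A rate comfortably above 1 suggests that loading is keeping ahead of playback:

```javascript
// Seconds of media buffered beyond the playback position, counting only
// the time-range that contains currentTime (0 if none does).
function bufferedAhead(media) {
  const ranges = media.buffered;
  const now = media.currentTime;
  for (let i = 0; i < ranges.length; i++) {
    if (ranges.start(i) <= now && now <= ranges.end(i)) {
      return ranges.end(i) - now;
    }
  }
  return 0;
}

// Media seconds buffered per wall-clock second between two samples of the
// hypothetical form { atMs, bufferedEnd }.
function bufferingRate(prev, next) {
  const elapsedSeconds = (next.atMs - prev.atMs) / 1000;
  return elapsedSeconds > 0 ? (next.bufferedEnd - prev.bufferedEnd) / elapsedSeconds : 0;
}

// On each "progress" event, report the seconds buffered ahead and, from the
// second event onwards, the buffering rate since the previous event.
function watchBuffering(media, onUpdate) {
  let prev = null;
  media.addEventListener("progress", function () {
    const ahead = bufferedAhead(media);
    const sample = { atMs: Date.now(), bufferedEnd: media.currentTime + ahead };
    onUpdate(ahead, prev ? bufferingRate(prev, sample) : null);
    prev = sample;
  });
}
```

After the user seeks, the first rate reported compares two unrelated ranges, so it is best ignored.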

The Progress Event

The "progress" event fires continually while (and only while) data is being downloaded. So when preload is set to "none", it doesn’t fire at all until playback has begun; with preload set to "metadata" it will fire for the first few seconds, then stop until playback begins; with preload set to "auto" it will continue to fire until the entire media file has been downloaded.

But for all preload settings, once playback has begun, the browser will proceed to download the entire media file, firing continual "progress" events until there’s nothing left to load, which continues in the background even if the video is subsequently paused.

The data itself is represented by a set of time-ranges (i.e. discrete portions of time), and it’s crucial to understand how these work before we can make use of the "progress" events.

When the media first starts to load, it will create a single time-range representing the initial portion. So for example, once the first 10 seconds of data have loaded, the time-range could be represented as an array of start and end times:

[0,10]

However it’s possible (in fact very likely) for multiple time-ranges to be created. For example, if the user manually seeks to a time beyond what’s already been preloaded, the browser will abandon its current time-range, and create a new one which starts at that point (rather than having to load everything in between, as basic Flash players do).

So let’s say the user jumps forward two minutes and continues playback from there, then once another 10 seconds have preloaded, we’d have two ranges, which we could represent like this:

[
[0,10],
[120,130]
]

If the user were then to jump back again, to a time mid-way between the two ranges, then another (third) range would be created:

[
[0,10],
[60,70],
[120,130]
]

Then once the end of that range reached the starting point of the final one, the ranges would be merged together:

[
[0,10],
[60,130]
]
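The browser does this merging itself, so scripts never have to. Purely as a model of the rule (a hypothetical mergeRanges helper, not how any browser actually implements it), ranges that overlap or touch collapse into one:

```javascript
// Illustrative model only: merge [start, end] pairs that overlap or touch.
function mergeRanges(ranges) {
  // Copy and sort by start time so neighbouring ranges are adjacent.
  const sorted = ranges.map((r) => [r[0], r[1]]).sort((a, b) => a[0] - b[0]);
  const merged = [];
  for (const [start, end] of sorted) {
    const last = merged[merged.length - 1];
    if (last && start <= last[1]) {
      // This range begins inside (or exactly at the end of) the previous one.
      last[1] = Math.max(last[1], end);
    } else {
      merged.push([start, end]);
    }
  }
  return merged;
}
```

For example, mergeRanges([[0, 10], [60, 120], [120, 130]]) returns [[0, 10], [60, 130]], which is the last step in the example above.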

The arrays in those examples are just representations, to help explain the concept — they’re not how time-range data actually appears; to get the data in that format we must compile it manually.

The media has a .buffered object that represents the time-ranges. The .buffered object has a .length property to denote how many ranges there are, and a pair of methods called start() and end() for retrieving the timing of an individual range.

So to convert the buffered data into those two-dimensional arrays, we loop from 0 to .buffered.length - 1, calling start(i) and end(i) to read the start and end times of each range.
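A minimal sketch of that loop, as a hypothetical timeRangesToArray helper that works with any TimeRanges object, such as media.buffered:

```javascript
// Convert a TimeRanges object (e.g. media.buffered) into an array of
// [start, end] pairs in seconds, e.g. [[0, 10], [60, 130]].
function timeRangesToArray(timeRanges) {
  const ranges = [];
  for (let i = 0; i < timeRanges.length; i++) {
    ranges.push([timeRanges.start(i), timeRanges.end(i)]);
  }
  return ranges;
}
```

Calling it from a "progress" listener, such as media.addEventListener("progress", () => console.log(timeRangesToArray(media.buffered))), logs the current ranges each time more data arrives.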

Ultimately, we can use that data to create something more user-friendly, like a visual progress-meter. It can simply be a set of positioned <span> elements inside a containing <div> (we can’t use the <progress> element because it doesn’t support multiple ranges).
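Here is a sketch of how such a meter might be drawn. The helper names are hypothetical, and it assumes the <div> has position: relative and that CSS gives the spans a visible background. rangesToSegments converts [start, end] pairs (such as the two-dimensional arrays described above) into percentages of the duration, and renderMeter replaces the spans:

```javascript
// Hypothetical helper: turn [start, end] pairs into left/width percentages
// of the total duration, one entry per <span> in the meter.
function rangesToSegments(ranges, duration) {
  // Duration is NaN before metadata loads, and Infinity for some streams.
  if (!Number.isFinite(duration) || duration <= 0) {
    return [];
  }
  return ranges.map(([start, end]) => ({
    left: (start * 100) / duration,
    width: ((end - start) * 100) / duration,
  }));
}

// Hypothetical helper: redraw the meter inside `container` (the <div>).
function renderMeter(container, segments) {
  const doc = container.ownerDocument;
  container.textContent = ""; // remove the previous spans
  for (const segment of segments) {
    const span = doc.createElement("span");
    span.style.position = "absolute";
    span.style.top = "0";
    span.style.bottom = "0";
    span.style.left = segment.left + "%";
    span.style.width = segment.width + "%";
    container.appendChild(span);
  }
}
```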

There are a few notable browser quirks with "progress" events and buffered data. The first is a difference in the .buffered data when loading from the start — whereas most browsers create a single time-range (as described at the start of this section), Opera will create two ranges, with the first being as expected, and the second being a tiny fragment of time right at the end (roughly the last 200ms). So if the media were two minutes long and the first 10 seconds had loaded, the ranges would be something like this:

[
[0,10],
[119.8,120]
]

Another caveat is that Mobile Safari doesn’t retain the data for multiple ranges — it discards all but the active range (i.e. the range that encompasses the current playback position). This is clearly intentional behavior, designed to minimize the overall amount of memory that media elements consume. So to use the earlier example again, where the user jumps forward two minutes, the resulting buffered data would still only contain a single range:

[
[120,130]
]

Both of these quirks are worth knowing about, but they won’t usually make much difference as far as development is concerned. However, a far more significant quirk is the behavior of browsers when the entire media file has already been preloaded. In this case, most browsers will fire a single "progress" event, containing a single time-range that represents the entire duration. However, Opera and IE10 don’t provide this progress data: Opera fires a single event in which the buffer has no ranges (i.e. .buffered.length is zero), while IE10 doesn’t fire any "progress" events at all.

In the case of the visual progress-meter, this would mean that the meter stays empty, instead of being filled. But it’s simple to fix nonetheless, using an additional "loadedmetadata" event — because once that event fires in these browsers, the .buffered data does now represent the full media duration.
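In other words, run the same update from both events. A minimal sketch, in which watchBufferedData and updateMeter are hypothetical names (updateMeter might, for instance, convert media.buffered to arrays and redraw the meter):

```javascript
// Sketch: refresh on "loadedmetadata" as well as "progress", because in
// Opera and IE10 a fully cached file may produce no usable "progress" data.
function watchBufferedData(media, updateMeter) {
  function refresh() {
    updateMeter(media.buffered, media.duration);
  }
  media.addEventListener("progress", refresh);
  media.addEventListener("loadedmetadata", refresh);
}
```

In the other browsers the extra call is harmless: it simply redraws with whatever has loaded so far.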

Timing Events

The last thing we’ll look at briefly is the media "timeupdate" event, which fires continually while the media is playing. You would use this event to synchronize other things with media playback, such as creating manual captions, highlighting the active line in a transcript, or even synchronizing multiple media sources, something I looked at in an earlier article: Accessible Audio Descriptions for HTML5 Video.

The frequency at which the "timeupdate" event fires is not specified, and in practice it varies widely among different browsers. But as an overall average it amounts to 3–5 times per second, which is accurate enough for most synchronization purposes.
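As an example of that kind of synchronization, here is a sketch of highlighting the active line in a transcript. The transcript format ({ start, end } times in seconds) and the helper names are assumptions. It only calls onChange when the active line actually changes, so the page isn’t updated on every "timeupdate":

```javascript
// Hypothetical transcript lines: [{ start, end }, ...] in seconds.
// Returns the index of the line covering `time`, or -1 if none does.
function findActiveLine(lines, time) {
  for (let i = 0; i < lines.length; i++) {
    if (time >= lines[i].start && time < lines[i].end) {
      return i;
    }
  }
  return -1;
}

// Hypothetical helper: call `onChange(index)` only when the active line
// changes, e.g. to move a highlight class between transcript elements.
function syncTranscript(media, lines, onChange) {
  let current = -1;
  media.addEventListener("timeupdate", function () {
    const index = findActiveLine(lines, media.currentTime);
    if (index !== current) {
      current = index;
      onChange(index);
    }
  });
}
```

Because "timeupdate" fires only a few times per second, the highlight can trail the audio by up to a few hundred milliseconds.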

As far as I know, there are no browser bugs or quirks with this event. Makes a nice change, hey!

Afterword

This article doesn’t include every possible media event — there are other playback and seeking events, events for advanced network states, and even one which fires when the volume changes. But I’ve covered what I think are the most important — enough for most of the simple scripting you might want to do with video and audio, and enough to build a basic custom interface.

To get a feel for these media events in practice, it helps to keep a dynamic log of the playback and progress events we’ve discussed, showing timings and related property data to accompany each event.
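A sketch of such a log follows. logMediaEvents and its write callback are hypothetical names; it records the events covered in this article (leaving out the very frequent "timeupdate"), with the time since logging began and the related .paused, .ended, .duration and .buffered values:

```javascript
// Events discussed in this article (timeupdate omitted: it fires constantly).
const LOGGED_EVENTS = [
  "loadstart", "progress", "loadedmetadata", "canplay", "canplaythrough",
  "play", "pause", "ended", "error",
];

// Hypothetical helper: write one line per event, with elapsed time and
// the media properties relevant to these events.
function logMediaEvents(media, write) {
  const startedAt = Date.now();
  LOGGED_EVENTS.forEach(function (type) {
    media.addEventListener(type, function () {
      write(
        (Date.now() - startedAt) + "ms " + type +
        " paused=" + media.paused +
        " ended=" + media.ended +
        " duration=" + media.duration +
        " buffered.length=" + media.buffered.length
      );
    });
  });
}
```

For example, logMediaEvents(document.querySelector("video"), console.log) prints one line per event to the console.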

James is a freelance web developer based in the UK, specialising in JavaScript application development and building accessible websites. With more than a decade's professional experience, he is a published author, a frequent blogger and speaker, and an outspoken advocate of standards-based development.