Building Interactive HTML5 Videos

The HTML5 <video> element makes embedding videos into your site as easy as embedding images. And since all major browsers have supported <video> since 2011, it’s also the most reliable way to get your moving pictures seen by people.

A more recent addition to the HTML5 family is the <track> element. It’s a sub-element of <video>, intended to make the video timeline more accessible. Its main use case is adding closed captions. These captions are loaded from a separate text file (a WebVTT file) and rendered over the bottom of the video display. Ian Devlin has written an excellent article on the subject.
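For illustration, a minimal WebVTT captions file looks like this (the cue timing and text here are made up):

```
WEBVTT

00:00:01.000 --> 00:00:04.000
This caption is rendered over the video
from second 1 to second 4.
```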

Beyond captions, though, the <track> element can be used for any kind of interaction with the video timeline. This article explores three examples: chapter markers, preview thumbnails, and a timeline search. By the end, you will have sufficient understanding of the <track> element and its scripting API to build your own interactive video experiences.

Chapter Markers

Let’s start with an example made popular by DVDs: chapter markers. These allow viewers to quickly jump to a specific section. They’re especially useful for longer movies like Sintel:

The chapter markers in this example reside in an external VTT file and are loaded into the page through a <track> element with a kind of “chapters”. The track is set to load by default:
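The markup could look as follows (the file names here are assumptions, not the ones from the live example):

```html
<video width="640" height="360" controls>
  <source src="sintel.mp4" type="video/mp4">
  <track kind="chapters" src="chapters.vtt" default>
</video>
```

The chapters.vtt file then contains one cue per chapter, with the chapter title as the cue text.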

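The menu itself is built by looping over the cues of the loaded chapters track. A sketch, with the element wiring and the seek() helper as assumptions:

```javascript
// Build a clickable chapter menu from the cues of a loaded chapters track.
// The list's ownerDocument is used to create elements, so the function
// depends only on its arguments.
function buildChapterMenu(track, list, seek) {
  const doc = list.ownerDocument;
  for (let i = 0; i < track.cues.length; i++) {
    const cue = track.cues[i];
    const item = doc.createElement('li');
    item.textContent = cue.text;
    // 1) store the chapter's start position in a data attribute
    item.dataset.start = cue.startTime;
    // 2) jump to that position when the entry is clicked
    item.addEventListener('click', function (e) {
      seek(Number(e.target.dataset.start));
    });
    list.appendChild(item);
  }
}
```

You would call this once the track has loaded, e.g. buildChapterMenu(video.textTracks[0], document.querySelector('#chapters'), seek).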
In the code block above, we add two properties to the list entries to hook up interactivity. First, we set a data attribute that stores the start position of the chapter, and second, we add a click handler that calls an external seek function. This function jumps the video to the start position and, if the video is not (yet) playing, starts playback:
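The seek() helper can be as small as this sketch (here, video is the <video> element, assumed to be in scope):

```javascript
// Jump the video to the given position; start playback if it isn't
// playing yet. `video` is the <video> element, assumed to be in scope.
function seek(position) {
  video.currentTime = position;
  if (video.paused) {
    video.play();
  }
}
```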

That’s it! You now have a visual chapter menu for your video, powered by a VTT track. Note that the actual live Chapter Markers example contains a little more logic than described here, e.g. toggling playback of the video on click, updating the controlbar with the video position, and some CSS styling.

Preview Thumbnails

This second example shows a cool feature made popular by Hulu and Netflix: preview thumbnails. When mousing over the controlbar (or dragging on mobile), a small preview of the position you’re about to seek to is displayed:

This example is also powered by an external VTT file, loaded in a metadata track. Instead of text, the cues in this VTT file contain links to a separate JPG image. Each cue could link to its own image, but in this case we opted for a single JPG sprite, to keep latency low and management easy. The cues link to the correct section of the sprite by using Media Fragment URIs. Example:

http://example.com/assets/thumbs.jpg?xywh=0,0,160,90


Next, all the logic to fetch the right thumbnail and display it lives in a mousemove listener on the controlbar:
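In a sketch (all names here are our own), two small helpers do the heavy lifting: one finds the cue spanning a given time, the other parses the xywh region out of the cue’s media fragment URI:

```javascript
// Find the cue that is active at `time`, or null if there is none.
function cueAt(cues, time) {
  for (let i = 0; i < cues.length; i++) {
    if (time >= cues[i].startTime && time < cues[i].endTime) {
      return cues[i];
    }
  }
  return null;
}

// Parse "thumbs.jpg?xywh=0,0,160,90" (or the spec's #xywh= form) into
// the sprite URL and the region to show.
function parseFragment(text) {
  const match = text.match(/([^?#]+)[?#]xywh=(\d+),(\d+),(\d+),(\d+)/);
  if (!match) return null;
  return {
    src: match[1],
    x: Number(match[2]), y: Number(match[3]),
    w: Number(match[4]), h: Number(match[5])
  };
}

// Inside the mousemove listener you would then do something like
// (controlbar/thumbnail element names are assumptions):
//
//   const fraction = (e.pageX - controlbar.offsetLeft) / controlbar.offsetWidth;
//   const cue = cueAt(track.cues, fraction * video.duration);
//   const thumb = parseFragment(cue.text);
//   thumbnail.style.background =
//     'url(' + thumb.src + ') -' + thumb.x + 'px -' + thumb.y + 'px';
```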

All done! Again, the actual live Preview Thumbnails example contains some additional code. It includes the same logic for toggling playback and seeking, as well as logic to show/hide the thumbnail when mousing in/out of the controlbar.

Timeline Search

Our last example offers yet another way to unlock your content, this time through in-video search:

This example re-uses an existing captions VTT file, which is loaded into a captions track. Below the video and controlbar, we print a basic search form:
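The search itself then boils down to scanning the caption cues for the query and collecting the matching start times. A sketch (the function name and form wiring are our own):

```javascript
// Assumed form markup: <form id="search"><input type="search" name="query"></form>
// Return the start times of every caption cue whose text contains `query`
// (case-insensitive), so the UI can jump to the first hit.
function searchCues(cues, query) {
  const needle = query.toLowerCase();
  const hits = [];
  for (let i = 0; i < cues.length; i++) {
    if (cues[i].text.toLowerCase().indexOf(needle) !== -1) {
      hits.push(cues[i].startTime);
    }
  }
  return hits;
}
```

On form submit, you would call searchCues(track.cues, input.value) and pass the first result to seek().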

Third time’s a charm! As with the other examples, the actual live Timeline Search example contains additional code for toggling playback and seeking, as well as a snippet to update the controlbar help text.

Overall, the HTML5 <track> element provides an easy-to-use, cross-platform way to add interactivity to your videos. And while it certainly takes time to author VTT files and build experiences like these, you will see higher accessibility of, and engagement with, your videos. Good luck!

Technical Evangelist & Editor of Mozilla Hacks. Gives talks & blogs about HTML5, JavaScript & the Open Web. Robert is a strong believer in HTML5 and the Open Web and has been working with front-end web development since 1999, in Sweden and in New York City.
He also blogs regularly at http://robertnyman.com and loves to travel and meet people.

Nice! I’ve been really happy with how well the suite of web video technologies is coming together. A while back I created a simple project to create an interactive, synchronized transcript and was happy with how little work is involved on modern browsers:

Thanks for the code snippets and demos. I would like to know how accurately the cues are fired, based on the startTime specified in the VTT. We implemented an “interactive video” based on the video.timeupdate event, by positioning various DOM elements on top of the video layer (example here: http://www.bluemountain.com/ecards/st-patricks-day/wee-bit-o-magic-interactive/card-3387421). I found that video.timeupdate events don’t fire often enough to get the accurate timing we desired. I would like to go the WebVTT route once browser support for oncuechange events is better. I’d like to assume oncuechange events are fired accurately on the browsers that support them so far. Does anyone know, from experience, whether that is the case?

I found cuechange to be quite accurate before I switched to timeupdate for compatibility with Firefox. My code highlighting text was imperceptibly close to the subtitles or audio (assuming, of course, that your timecodes are that precise).

I don’t think you are right here. Clearly cuechange is close to the subtitles, but currently no browser (tested in Safari, Chrome and IE) has implemented high-precision timing with text tracks. The accuracy is about 100-140ms. (Note that timeupdate is throttled to 250ms.) Here is a fiddle which performs a test: http://jsfiddle.net/trixta/q7w5rLgL/.

Simply play the video until 20 seconds and it will alert you with the precision. I used a similar test to implement high-precision timing in my polyfill. You can test this by simply adding the following line:

You’re right; a better question would have been “what precision do you need?”. The human visual system’s latency is somewhere in the 100ms range, but audio latency is an order of magnitude lower.

For my needs, the precision from triggering on cuechange was perceptibly better than waiting for the timeupdate event, making the text display synchronization close enough that I couldn’t detect a delay, but I can imagine many scenarios where that would be different.

Hi, thanks for a good intro to VTT & some neat functionality. I’m pretty keen on the preview thumbnails, but the demo doesn’t seem to work for me in Chrome v37 on OS X 10.9. They look great in FF & Safari, but there is no effect at all when I mouseover in Chrome.
Anyone else seeing this? I have a site where I’d love to implement this functionality, but I’m a little uneasy about using it in production if it’s not fully working. I’ll try it out for myself and will follow up if I have success.