who spoke when

Anonymous (not verified) - Sun, 10/19/2014 - 17:12

Did you notice the coloured bar with stripes under the video?

What is that bar?

Each colour segment in that bar represents the time a different speaker spoke in the video. This is called "speaker diarisation": it identifies who spoke and when. It is a different problem from speech recognition; here we are not trying to identify what is spoken.
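A diarisation result can be thought of as a list of labelled time intervals. The data format and numbers below are a hypothetical sketch for illustration, not the actual output of the system described in this post:

```python
# Hypothetical diarisation output: (start_sec, end_sec, speaker_label) tuples,
# sorted by start time. Labels are anonymous: diarisation says "who", not "what".
segments = [
    (0.0, 42.5, "A"),     # e.g. introduction by the host
    (42.5, 310.0, "B"),   # main speaker
    (310.0, 330.0, "A"),  # host again
    (330.0, 360.0, "C"),  # audience question
]

def speaking_time(segments):
    """Total seconds attributed to each speaker label."""
    totals = {}
    for start, end, speaker in segments:
        totals[speaker] = totals.get(speaker, 0.0) + (end - start)
    return totals

print(speaking_time(segments))  # who dominated the conversation
```

Each tuple maps directly to one coloured stripe in the bar, with its width proportional to the segment duration.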

Why do that?

Well, in this case it is useful for identifying the different segments of a talk: I can skip the introduction, jump through the questions in a Q&A, spot a discussion between groups of speakers, and get an idea of the general flow of the talk. Basically it gives a "bird's ear view" of the media and makes it easier to navigate. Yes, you can already click and drag along the video slider to see snapshots in YouTube videos, but you still have to browse slowly through all the snapshots. Audio files don't have snapshots at all.
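The "skip the introduction" use case amounts to jumping to the next moment the speaker changes. Here is a minimal sketch of that navigation step, assuming the same hypothetical (start, end, speaker) segment format as above; none of these names come from the actual implementation:

```python
# Hypothetical diarisation output: (start_sec, end_sec, speaker_label),
# sorted by start time.
segments = [
    (0.0, 42.5, "A"),
    (42.5, 310.0, "B"),
    (310.0, 330.0, "A"),
    (330.0, 360.0, "C"),
]

def next_speaker_change(segments, t):
    """Return the start time of the first segment after time t whose
    speaker differs from whoever is talking at t, or None if there
    is no later change (e.g. t is in the last speaker's segment)."""
    current = None
    for start, end, speaker in segments:
        if start <= t < end:
            current = speaker
            break
    for start, end, speaker in segments:
        if start > t and speaker != current:
            return start
    return None
```

A player "skip intro" button could simply seek to `next_speaker_change(segments, current_position)`.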

This is great! Why is this feature not all over YouTube?

The implementation described below is not scalable as is: the computation needed to identify the different segments is extremely intensive, and can take almost as long as playing the media itself.

The chosen colour palette (Accent1), based on colorbrewer2, is optimised for categorical data, in this case different speakers: it provides maximum hue contrast between colours. It is also suitable for dichromats.
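Assigning palette colours to speakers is then a simple lookup in order of first appearance. The hex values below are the standard 8-class ColorBrewer "Accent" qualitative palette; the cycling behaviour for more than 8 speakers is my own assumption, not necessarily what the bar does:

```python
# ColorBrewer "Accent" qualitative palette (8 classes), as hex strings.
ACCENT = ["#7fc97f", "#beaed4", "#fdc086", "#ffff99",
          "#386cb0", "#f0027f", "#bf5b17", "#666666"]

def speaker_colours(speakers):
    """Assign each distinct speaker, in order of first appearance,
    a colour from the palette (cycling past 8 speakers -- an assumption)."""
    mapping = {}
    for s in speakers:
        if s not in mapping:
            mapping[s] = ACCENT[len(mapping) % len(ACCENT)]
    return mapping

print(speaker_colours(["A", "B", "A", "C"]))
```

Qualitative palettes like Accent vary hue rather than lightness, which is why they suit unordered categories such as speaker identities.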

This speaker diarisation bar was inspired by the "moodbar" that I've been using for many years to navigate music files.