Case: Swedish Prime Minister Stefan Löfven with live captions

In May 2015 WestreamU tested, for the first time, a new solution, still in beta, for live stream captioning. Our guinea pig was no less than the Swedish Prime Minister, and the captioning was done by two interpreters who had never done this before. In spite of very challenging conditions everything went pretty well, as you can see in this case description.

Initially WestreamU was asked to just make video recordings of the plenary opening as well as four seminars (two in parallel). Due to budget restrictions, HSO this time chose to only add closed captions when the videos were later published on YouTube (playlist embedded below). At previous meetings HSO has used live sign language interpretation as well as audio description on their webcasts.

When it became clear that Sweden’s Prime Minister Stefan Löfven would deliver the keynote address, we were asked to also live stream the opening session, with live captions.

Now we faced a few challenges. First, it was difficult on such short notice to find captioners. Luckily the Interpretation Center in the Örebro Region could book one experienced captioner and one who was in training. But neither of them had any experience with captioning for live web TV.

Second, our new solution for live captioning was still in an early beta stage. When ready, it will be offered as an addition to Text-On-Top, an increasingly popular application that enables spoken text, converted into written text by a captioner, to be shown on top of any presentation. For more information about this project, see the forum topic about Text-On-Top Video. The point here is that there was neither time to make this solution more user friendly, nor to perform any serious testing.

Who dares wins! Said and done. A few days before the conference I went to Örebro and spent a couple of hours with the captioners. They quickly learned how to use Text-On-Top Video and soon got the hang of the live part. Credit to them for daring not only to use untested software, but also to work with it live – while captioning a Prime Minister!

From a technical perspective, our preparations included publishing two live streams on HSO’s Solidtango Play channel: one “original” stream without captions, and one with open captions. Viewers could easily select which stream to follow.

On the day of the Congress we produced the original live stream by switching between our three cameras at the venue. We also added graphics, including logotypes and lower thirds with the names of the presenters.

In a studio in Örebro (200 km west of Stockholm) the captioners watched the original stream and added their captions to the second Play channel live stream. In a behind-the-scenes video (embedded below) you can see the production unit in Haninge, and what the captioned live stream looked like. The latter was about 30 seconds behind the original stream due to delays introduced by this particular streaming solution.

About 15 minutes after the opening session had finished, Solidtango had transcoded the live recordings and we could republish them on the HSO Play channel. The recording with the live open captions is still available there (the Prime Minister enters at about 30 minutes into the recording). Our team in Haninge then went on to record the parallel sessions. At the end of the day the session recordings were published on HSO’s YouTube channel, as well as transferred to Örebro over the internet.

One of the features of Text-On-Top Video is that it records every written word with a time stamp, thus preparing for closed captioning. Still, during the live streaming the captions are actually superimposed on the video. This way we are not dependent on any particular streaming service/CDN (Content Delivery Network).
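To illustrate how such a time-stamped word log can prepare for closed captioning, here is a minimal sketch. The actual log format of Text-On-Top Video is not documented in this case, so a simple list of (seconds, word) pairs is assumed, grouped into SubRip (SRT) cues:

```python
# Sketch: grouping an assumed (seconds, word) log into SRT caption cues.
# Cue boundaries here are a simple heuristic: break on long pauses or
# when a cue would grow past a character limit.

def fmt(t):
    """Format seconds as an SRT timestamp, e.g. 75.5 -> 00:01:15,500."""
    h, rem = divmod(int(t), 3600)
    m, s = divmod(rem, 60)
    ms = int(round((t - int(t)) * 1000))
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def words_to_srt(words, max_chars=40, max_gap=1.5):
    """Group (time, word) pairs into SRT cues."""
    cues, current, start, last = [], [], None, 0.0
    for t, w in words:
        too_long = sum(len(x) for x in current) + len(current) + len(w) > max_chars
        if current and (too_long or t - last > max_gap):
            cues.append((start, last + 0.5, " ".join(current)))
            current, start = [], None
        if start is None:
            start = t
        current.append(w)
        last = t
    if current:
        cues.append((start, last + 0.5, " ".join(current)))
    return "\n".join(f"{i}\n{fmt(a)} --> {fmt(b)}\n{text}\n"
                     for i, (a, b, text) in enumerate(cues, 1))
```

The half-second padding on each cue’s end time is an arbitrary choice for the sketch; a real tool would hold each cue until the next one appears.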

Furthermore, while Capblaster (our previous solution) could only be used with one particular video switcher (VidBlaster), Text-On-Top Video will be able to support more. The current beta also works with vMix, and can easily be extended to support any switcher with an API. Also in the pipeline are output for chroma key enabled switchers, remote captioning, and perhaps even automatic translation.

Back to the current case. For the opening session we already had its text. Within a few days the captioner in Örebro had also created text for all the session recordings. This was again done with Text-On-Top Video, but by watching the recorded videos instead of a live stream.

Compared to traditional captioning/subtitling, doing it “live” is a very quick and highly cost effective solution, at the cost of lower quality. It was interesting to see how our new captioners drastically improved the quality from what I personally would call “OK” to “pretty good”. For example, during the opening session there were quite a few gaps in the text – sections of 10–25 seconds that the captioners did not manage to write down. The last video that was captioned had no such gaps.

We then re-synced the time-stamped text recordings with the recorded videos in order to make the necessary adjustments, as the captioners are typically behind what is being said. In other words, we made sure that the one or two lines of text were in acceptable sync with the speech. We also corrected some obvious and grave errors in the text: for example, filling text gaps, spell-checking names, correcting wrong numbers, etc.
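The bulk of that re-syncing amounts to shifting the caption timestamps earlier by roughly the captioner’s lag, before fine-tuning individual cues. As a sketch (the offset value and the SRT snippet below are illustrative assumptions, not the actual values from this production):

```python
import re

# Sketch: shifting every SRT timestamp by a fixed offset (negative =
# earlier) to compensate for captioner lag, clamping at zero.

TS = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")

def shift_srt(srt_text, offset_s):
    """Return srt_text with all timestamps shifted by offset_s seconds."""
    def repl(m):
        h, mnt, s, ms = map(int, m.groups())
        total = max(0.0, h * 3600 + mnt * 60 + s + ms / 1000 + offset_s)
        h, rem = divmod(int(total), 3600)
        mnt, s = divmod(rem, 60)
        return f"{h:02d}:{mnt:02d}:{s:02d},{int(round((total % 1) * 1000)):03d}"
    return TS.sub(repl, srt_text)

# Example: pull captions 2 seconds earlier.
shifted = shift_srt("1\n00:00:05,000 --> 00:00:07,500\nHello\n", -2.0)
```

A single global offset is only a starting point, since the lag varies during a session; the per-cue corrections described above remain manual work.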

Besides valuable lessons on how to improve Text-On-Top Video, we became even more convinced that this will be a nice solution once it is ready for public launch. Planned improvements include more flexible remote captioning without any significant delays, and hopefully support for live closed captioning.