Subtitling Special: 3Play Media

Interview with Lily Bond, Director of Marketing

Tell us briefly about your company. How was it born?

3Play Media originated at MIT in 2008, where the four co-founders met at the Sloan School of Management. One of them was working with a group at MIT to caption videos and spent the next year developing a better process for captioning and transcription. Almost 10 years later, 3Play Media is still operating out of Boston, MA, and provides high-quality captioning, transcription, subtitling, and audio description services to over 2,300 customers.

What makes your company different from its competitors?

Unlike traditional captioning companies, 3Play Media combines technology and human editors to make the process more efficient and cost-effective. Every file is first put through automatic speech recognition. Next, a certified editor cleans up the transcript. Finally, a QA editor reviews the transcript and resolves any passages the editor flagged as uncertain. All of our editors are US-based. This process allows us to reach a measured accuracy rate of 99.6%, which is much higher than that of most competitors. Another way we differentiate ourselves is by focusing on the customer experience. Our goal is to make the process of captioning as easy as possible: this includes a user-friendly account system, round-trip integrations with leading video platforms, flexible upload and download options, simple interactive plugins, dedicated customer support, and several different turnaround options.

What are your company goals?

Our initiatives focus on innovation and on continuously improving our technology to make the editing job easier and better. We are committed to treating our contractors, employees, and customers well and to improving the process for everyone. Our goal has always been to make captioning, transcription, and audio description, which have traditionally been fairly cumbersome, as easy as possible. We constantly work to reduce barriers to adoption and to move toward a world where every video is accessible to all viewers.

What equipment or technology do you consider essential for your workflow? Would you highlight any recent purchases or innovations?

We utilize speech technology both for captioning/transcription and for audio description. For captioning, our process relies on automatic speech recognition (ASR). For audio description, we use synthesized speech to voice the descriptions. Even more important, however, are the software and processes we’ve developed for editors and describers, as well as for onboarding, deadline management, and quality management at scale.

75% of the transcription and subtitling process is done by a computer. Do you think that, in the future, all of this work could be done accurately by a machine?

We don’t believe that humans will ever be taken out of the equation for long-form transcription. While speech recognition is extremely good for “assistant” tools like Siri and Alexa, it often fails for long-form transcription. With long-form transcription, ASR struggles because it cannot learn the voice of the speaker(s) through repeated use (as it can with Siri). In addition, its accuracy deteriorates when there are multiple speakers, background noise, accents, or poor audio quality. Finally, ASR often makes errors that make sense acoustically but not linguistically (like “forester” vs. “four story”) and errors on small but critical words like “can” vs. “can’t.” The meaning of the sentence is reversed if that error is made, and a human is much less likely to make that mistake.
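
To make the point about single-word errors concrete, here is a minimal sketch of how a word-level accuracy figure could be computed by comparing an ASR transcript against a human-verified reference. The interview does not say how 3Play Media measures accuracy, so the metric (one minus word edit distance divided by reference length), the function names, and the sample sentences are all assumptions chosen for illustration.

```python
def word_edit_distance(reference, hypothesis):
    """Levenshtein distance between two lists of words.

    Counts the minimum number of word substitutions, insertions,
    and deletions needed to turn `hypothesis` into `reference`.
    """
    previous = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        current = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1      # reference word missing from hypothesis
            insertion = current[j - 1] + 1  # extra word in hypothesis
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def word_accuracy(reference_text, hypothesis_text):
    """Hypothetical accuracy metric: 1 - (word errors / reference words).

    Can fall below zero if the hypothesis contains many extra words.
    """
    reference = reference_text.lower().split()
    hypothesis = hypothesis_text.lower().split()
    if not reference:
        raise ValueError("reference transcript is empty")
    return 1.0 - word_edit_distance(reference, hypothesis) / len(reference)


# Illustrative sentences (not taken from any real transcript).
reference = "you can't take this medication with food"
asr_output = "you can take this medication with food"

print(word_edit_distance(reference.split(), asr_output.split()))
print(f"{word_accuracy(reference, asr_output):.1%}")
```

In this example only one of the seven reference words is wrong, so the score stays high even though the meaning of the sentence is reversed. A word-level score alone therefore does not capture how damaging a given error is.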