JBI Studios' Blog on Voice-Over, Dubbing, and Multimedia Localization

This is very exciting for e-Learning translation. Adobe Captivate has supported text-to-speech (TTS) generation since version 5, providing users with a powerful tool to increase course accessibility and engagement. However, this support has been limited to US English, meaning that localization projects have had human voice-over as their only option.

This blog post will look at the languages added, and provide a video sample of the voices themselves.

[Average read time: 3 minutes]

First, what is text-to-speech (TTS)?

Text-to-speech is a catch-all phrase for any process that converts text into a spoken output – basically, a voice-over audio file. TTS systems were developed initially to provide accessibility for the visually impaired, as a tool that could read any text. But the technology has been developing at a rapid pace, and it’s now used in everything from IVR (think of your credit card’s phone system) to Google Translate. Stephen Hawking’s voice, for example, is part of a text-to-speech system – when he types a string, his text-to-speech engine turns it into a spoken output.

While the technology has been around for decades, lately it’s become quite good – TTS voices can now read with inflection and make allowances for context, as we discussed in our blog post, “Listen to a text-to-speech (TTS) voice in our video sample.” Apple, Google and Microsoft have invested large sums in TTS voices for their smartphone platforms – Siri, for example, is an advanced form of text-to-speech.

New (and not-so-new) TTS support for foreign languages

Without further ado – the foreign languages now supported by Adobe Captivate TTS are English for the UK, French for Canada, and Korean. While this is a substantial development, it’s not completely new. Captivate 5.5 used to support both German and French. However, all foreign-language support went away with Captivate 6, due to a licensing conflict, it seems.

The new text-to-speech voices are accessed through the Slide Notes in Captivate. The text to be converted to voice-over audio is placed in the notes area – once there, the user can just press a button to create the audio, and Captivate places it directly in the slide timeline. In the following screenshot, you can see the sample we created for this post – each language text uses a different voice, or voice font, to use the term of art:

Once the audio is generated and the slide is output to video, this is what you get:

Why TTS is useful for e-Learning and localization

Text-to-speech is a cost-effective tool for accessibility, helping to make courses Section 508 and ADA compliant. Likewise, many e-Learning course authors use it during development as placeholder voice-over, to work out any script issues before recording with a human voice-over talent. Needless to say, this can save a substantial amount of money by avoiding pick-ups.

Finally, text-to-speech is a very cost-effective way to record voice-over. Moreover, it’s very quick – voice fonts don’t get tired, nor do they have to take breaks. A text-to-speech system can generate 10,000 words of content in about 5 minutes, a job that would take a really good human voice talent about 8 hours. For localization, using TTS voices can mean much lower project costs and much shorter timelines. As the voices become better, they’ll become the default for e-Learning and corporate training content.
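
To put those numbers in perspective, here is a quick back-of-the-envelope calculation in Python. It only uses the rough figures quoted above (10,000 words, about 8 hours for a human talent, about 5 minutes for TTS); the variable and function names are ours, chosen for illustration, and the results are estimates rather than benchmarks.

```python
# Rough figures quoted in this post (approximations, not measurements).
WORDS = 10_000        # size of the script, in words
HUMAN_HOURS = 8       # approximate recording time for a human voice talent
TTS_MINUTES = 5       # approximate generation time for a TTS system


def words_per_minute(words: int, minutes: float) -> float:
    """Return the average throughput in words per minute."""
    return words / minutes


human_minutes = HUMAN_HOURS * 60                       # 480 minutes
human_rate = words_per_minute(WORDS, human_minutes)    # human throughput
tts_rate = words_per_minute(WORDS, TTS_MINUTES)        # TTS throughput
speedup = human_minutes / TTS_MINUTES                  # how many times faster TTS is

print(f"Human talent: about {human_rate:.1f} words per minute")
print(f"TTS system:   about {tts_rate:.0f} words per minute")
print(f"TTS is roughly {speedup:.0f}x faster")
```

With these inputs, the TTS system comes out at roughly 96 times faster than a human session. Keep in mind that the comparison covers generation time only; it leaves out script preparation and any pronunciation fixes, which apply in both cases.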

Do you have any experience with text-to-speech, especially when it comes to e-Learning? Leave us a comment below with any tips or stories you’d like to share.