New web-based service can make audio transcripts easy

COMPUSCHMOOZE

STEVE LUBETKIN

One of the standard pieces of advice that podcasters get from consultants is that attaching a written transcript of an audio program to the web page where the podcast is posted can help the program gain search engine visibility. Google and the other search engines don’t do a good job indexing audio or video files, but they can index a text transcript, so if ranking higher in search results is important, a transcript might help.

Of course, transcribing a 20- or 30-minute audio podcast word-for-word is something most podcasters (and even non-podcasters) dread. It’s tedious, repetitive (listening to segments over and over to get the word order just right) and time-consuming. It could easily take twice as long to transcribe a podcast as it does to listen to one. And since most hobbyist podcasters don’t generate much revenue from their programs, paying a transcription service to generate the file is usually out of the question.

Automated transcription technology using voice-to-text software has, up to now, been more trouble than it was worth, getting so many words wrong that it was just easier and more reliable to do the work manually. But a new web-based service, sonix.ai, is changing that. A subscription-based service, sonix.ai accepts a range of standard audio and video file formats, which you can upload from your hard drive or from a cloud-based file sharing service like Google Drive or Dropbox.

Once you upload the file and click the “Start transcribing” button, you can leave the sonix.ai site. You get an email confirming the upload, and a second email when the transcription is completed. You click the link in the email, and it takes you to a window displaying the draft transcript.

What’s great about sonix.ai is the editing interface, which marries the rough transcript with an audio player synchronized to the text. You can use your mouse to click through the transcript, and wherever your cursor lands, the audio player picks up the recording at that spot. Keystrokes control the audio player (the tab key is the start-stop key for audio) so you don’t have to take your hands off the keyboard. Other transcription programs require expensive foot pedals or moving the mouse to replay a segment. As you go through the file you can make any kind of punctuation, spelling, or sentence changes, which are saved automatically. The service offers several different export formats, including Microsoft Word, so once you’ve finished editing a transcript, you can send it to Word for final polishing. You can also share access to a transcript with others who might need to collaborate in the editing.

Sonix tries to break the transcript paragraphs by recognizing changes in the speakers’ voices. It will label the change in voice “Speaker 1,” “Speaker 2,” and so on. You only have to customize the speaker name labels once and then you can select the names from a drop-down, which speeds the editing process.

Speaker detection is probably the service’s weakest link, as it sometimes splits a single speaker into multiple identities if the speaker pauses for too long, but recombining the text associated with a single speaker is as simple as pressing the backspace key.

The accuracy of the transcript depends on several factors, including the audio quality of the file. In my experience, better transcripts come from high-quality recordings, and files recorded over the phone need more editing. Sonix provides a color-coded “thermometer” feature that highlights different levels of confidence in a particular word or phrase, so you can move right to the doubtful spots for precise editing.

I was hesitant to use Sonix at first, mainly because the service charges a monthly fee ($15 a month or about $100 a year if you pay in advance) plus a time-related fee for each transcript. I thought the time-related fee was based on the actual recorded length of the audio file submitted, which would make the cost prohibitive except for the largest corporate clients. What I found is that the time billing is based on how long the sonix.ai system takes to transcribe the file, which is actually very short.

I’ve been able to use the service to transcribe press conferences and seminar sessions I attended. It’s a time-saving way of looking for quotes I want to use in news stories and columns like this. And if I can upload audio from the location, the transcript is usually waiting when I get back to my office.