Web Conferencing Tips, News, and Opinions

February 07, 2018

Last fall, Zoom announced that it would add the ability for customers to have their web conferences automatically transcribed. That capability now appears to be live. I have not personally tested it yet, but I am very interested. Jonathan Dame wrote about the new functionality on the TechTarget Network, and I thank him for bringing the software update to my attention.

Automated transcription has to be turned on in your Zoom account settings. It works with Cloud recording, but not with local recording to your hard drive.

Once you have recorded your online session, Zoom sends you a notification email when the recording assets have been created. You may choose to record audio in m4a format, or audio and video in mp4 format. The transcription service adds an extra vtt file, which contains the software's speech-to-text attempt, timestamped to synchronize with the meeting recording.
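For readers who have not opened a vtt file before: it is plain text in the WebVTT caption format, a header followed by cues that each pair a time range with the words spoken in it. Here is a minimal Python sketch of reading one. The sample transcript is invented for illustration, and the names Cue, to_seconds, and parse_vtt are my own, not anything Zoom provides.

```python
import re
from dataclasses import dataclass
from typing import List

# Matches a WebVTT timing line such as "00:00:01.000 --> 00:00:04.500".
# The hours field is optional in WebVTT, so it is optional here too.
TIMING = re.compile(
    r"((?:\d+:)?\d{2}:\d{2}\.\d{3})\s+-->\s+((?:\d+:)?\d{2}:\d{2}\.\d{3})"
)


@dataclass
class Cue:
    start: float  # seconds from the start of the recording
    end: float
    text: str


def to_seconds(stamp: str) -> float:
    """Convert "HH:MM:SS.mmm" or "MM:SS.mmm" to seconds."""
    seconds = 0.0
    for part in stamp.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def parse_vtt(content: str) -> List[Cue]:
    """Return the timed cues found in the text of a WebVTT file."""
    cues = []
    normalized = content.replace("\r\n", "\n").strip()
    # Cues are separated from each other by blank lines.
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.split("\n")
        for i, line in enumerate(lines):
            match = TIMING.match(line.strip())
            if match:
                # Everything after the timing line is the caption text.
                text = " ".join(part.strip() for part in lines[i + 1:])
                cues.append(
                    Cue(to_seconds(match.group(1)), to_seconds(match.group(2)), text)
                )
                break
    return cues


# Invented sample, not real Zoom output.
SAMPLE = """WEBVTT

1
00:00:01.000 --> 00:00:04.500
Welcome everyone to the weekly sync.

2
00:00:05.000 --> 00:00:09.250
Let's start with the budget review.
"""

for cue in parse_vtt(SAMPLE):
    print(cue.start, cue.end, cue.text)
```

The sketch skips optional WebVTT features such as notes, styling blocks, and cue settings, which a real transcript may or may not contain.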

You will notice that I said "speech-to-text attempt." Zoom is not pretending that automated transcription is perfect, and you shouldn't expect that. Dame's article quoted Zoom as targeting an 89% accuracy rate under "ideal" conditions. The nice thing about the service is that Zoom lets you review the recording and the transcript side by side. You can adjust and edit the text in the transcript as needed, resaving it with the recording.

Once you have your transcripts saved, you can search your library of recorded meetings for a keyword or phrase and the software will return a list of meetings with that text. Inside a recording, you can do the same kind of search to find each place the term is used within that meeting. Since all text is timestamped, it is easy to jump to the relevant section of the recording and watch or listen to the original speech at that point.
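To make the idea concrete, here is a rough Python sketch of how a timestamped transcript library can be searched. It is not Zoom's implementation, which I have not seen. The data layout (a meeting title mapped to a list of start-time and text pairs), the sample meetings, and the function names are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

# A cue is (start time in seconds, caption text). The library maps a
# meeting title to its cues. Both shapes are assumptions for this sketch.
Cue = Tuple[float, str]


def search_meeting(cues: List[Cue], phrase: str) -> List[Cue]:
    """Return the cues whose text contains the phrase, ignoring case."""
    needle = phrase.casefold()
    return [(start, text) for start, text in cues if needle in text.casefold()]


def search_library(library: Dict[str, List[Cue]], phrase: str) -> Dict[str, List[Cue]]:
    """Return only the meetings that mention the phrase, with their hits."""
    results = {}
    for title, cues in library.items():
        hits = search_meeting(cues, phrase)
        if hits:
            results[title] = hits
    return results


def format_offset(seconds: float) -> str:
    """Render seconds as H:MM:SS, the point to jump to in the recording."""
    total = int(seconds)
    return f"{total // 3600}:{total % 3600 // 60:02d}:{total % 60:02d}"


# Invented sample data.
LIBRARY = {
    "Weekly sync": [(1.0, "Welcome everyone."), (65.0, "Let's review the budget.")],
    "Design review": [(3725.0, "The Budget is already approved."), (3730.0, "Next topic.")],
    "Standup": [(2.0, "No blockers today.")],
}

for title, hits in search_library(LIBRARY, "budget").items():
    for start, text in hits:
        print(f"{title} @ {format_offset(start)}: {text}")
```

One limitation of matching cue by cue: a phrase that straddles two cues will be missed unless neighboring cues are joined before searching.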

Zoom has partnered with a third-party company called AISense to perform the speech-to-text conversion. They claim to use a machine-learning algorithm to improve accuracy over time, and hopefully the software will become more tolerant of different speaker idiosyncrasies, accents, delivery speeds, and so on as it gets more data to work with from many different Zoom recordings.

Meeting transcripts are not that hard or expensive to produce. There have been plenty of companies providing the service for many years. Several things make the Zoom offering potentially more attractive, however. The meeting host or administrator does not need to extract an audio file and send it to a third party. No human operator listens to your recording, so you don't have confidentiality concerns. Turnaround time is greatly reduced. And the automatic timestamp makes it easy to maintain synchronization. Plus, when the transcript is completed, there is no manual step needed to make the text displayable on the recording. It automatically becomes a selectable option using the CC toggle button in the recording video player.

This is a great real-world test case to see if the speech-to-text success rate will be high enough to satisfy customers, and to see whether they are willing to manually correct automated transcription errors. Since there is no incremental cost to using the service, it's certainly worth trying out the new functionality.

One application that immediately suggests itself is offering translations into other languages. Once you have your edited vtt transcription file, you could send it to a translation service and have them return a new vtt file with the same timestamps, but with all content in the desired alternate language. If this takes off, I can see the possibility for Zoom to replace the simple On/Off toggle for the captions with a language selector. Then interested parties could choose from multiple vtt files all associated with the same source recording. This would be a great way to satisfy dual-language requirements in countries such as Canada.
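The mechanics of keeping the timestamps intact are simple, because only the caption lines need to change. Here is a minimal Python sketch under a few assumptions: translate_vtt is my own name, the sample transcript is invented, and demo_translate is a toy phrase lookup standing in for whatever translation service you would actually use.

```python
from typing import Callable


def translate_vtt(content: str, translate: Callable[[str], str]) -> str:
    """Translate caption text only; leave header, ids, and timings untouched."""
    out = []
    in_cue_text = False
    for line in content.splitlines():
        if not line.strip():
            in_cue_text = False  # a blank line ends the current cue
            out.append(line)
        elif "-->" in line:
            in_cue_text = True  # lines after the timing line are caption text
            out.append(line)
        elif in_cue_text:
            out.append(translate(line))
        else:
            out.append(line)  # file header or cue identifier
    return "\n".join(out) + "\n"


# Toy stand-in for a real translation service (hypothetical).
FRENCH = {
    "Welcome everyone.": "Bienvenue à tous.",
    "Let's get started.": "Commençons.",
}


def demo_translate(text: str) -> str:
    """Look the line up in the toy phrase table; pass unknown text through."""
    return FRENCH.get(text, text)


# Invented sample, not real Zoom output.
SAMPLE = (
    "WEBVTT\n"
    "\n"
    "1\n"
    "00:00:01.000 --> 00:00:03.000\n"
    "Welcome everyone.\n"
    "\n"
    "2\n"
    "00:00:03.500 --> 00:00:05.000\n"
    "Let's get started.\n"
)

print(translate_vtt(SAMPLE, demo_translate))
```

Translating line by line keeps the example short. A real service would likely want whole sentences for context, so in practice you might batch several cues together and then map the results back onto the original timings.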

Other web conferencing companies should be watching closely, but for now, Zoom has the jump on them as far as integrating an automated approach to transcription. This is the kind of innovation we need more of in the web conferencing industry.