Why Our Crazy-Smart AI Still Sucks at Transcribing Speech

In an age when technology companies routinely introduce new forms of everyday magic, one problem that remains seemingly unsolved is long-form transcription. Sure, voice dictation for documents has been conquered by Nuance’s Dragon software. Our phones and smart home devices can understand fairly complex commands, thanks to self-teaching recurrent neural nets and other 21st-century wonders. However, the task of providing accurate transcriptions of long blocks of actual human conversation remains beyond the abilities of even today’s most advanced software.

Solved on a broad scale, it is a problem that might unlock vast archives of oral histories, make podcasts easier to consume for speed-readers (tl;dl), and be a world-changing boon for journalists everywhere, liberating precious hours of sweet life. It could make YouTube text-searchable. It would be a…