Serverless speech recognition with WebAssembly

What is this?

It’s the classic text adventure Zork and a version of the Kaldi speech recognition research project, both running locally in your web browser using WebAssembly. Speech recognition is too computationally intensive to do completely in JavaScript. WebAssembly allows the code to run at close to native speed.
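If you haven’t seen WebAssembly called from JavaScript before, here is a minimal sketch of what that boundary looks like. It is not taken from this project: the module is a few hand-assembled bytes exporting a made-up `add` function, whereas a real build of something like Kaldi would come out of a compiler toolchain and be far larger. The only APIs used are the standard `WebAssembly.Module` and `WebAssembly.Instance` constructors.

```typescript
// A tiny hand-assembled WebAssembly module (illustrative only, not part of
// the demo). It exports one function: add(a: i32, b: i32) -> i32.
const wasmBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,       // "\0asm" magic, version 1
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type 0: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00,                               // function 0 has type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export "add" = function 0
  0x0a, 0x09, 0x01, 0x07, 0x00,                         // code section, one body, no locals
  0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b,                   // local.get 0, local.get 1, i32.add, end
]);

// Compile and instantiate synchronously. That is fine for a module this
// small; it takes no imports, so the import object is empty.
const wasmModule = new WebAssembly.Module(wasmBytes);
const wasmInstance = new WebAssembly.Instance(wasmModule, {});

// Exports are untyped from TypeScript's point of view, so we assert the
// signature we know the module has.
const add = wasmInstance.exports.add as unknown as (a: number, b: number) => number;

console.log(add(2, 3)); // prints 5
```

For anything the size of a speech recognizer you would more likely fetch the `.wasm` file and use the asynchronous `WebAssembly.instantiateStreaming`, since browsers limit synchronous compilation of large modules on the main thread.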

What is it good for?

Beyond being a cool demo, WebAssembly speech recognition has a lot of advantages:

After the browser downloads the models, no further connection to a server is needed. This means offline voice apps are possible and flaky network connections won’t cause long UI response times.

No data needs to go to a third-party server, so users’ privacy is protected.

For real power users of speech recognition, Kaldi is much more flexible than any cloud API. Running Kaldi in the browser lets you customize things without having to pay cloud computation costs.

How does the speech recognition work?

That’s a question for another article. Very briefly and somewhat accurately, a neural network turns audio into phones (the individual sounds of speech), and a hidden Markov model (HMM) helps to turn the phones into words. There are lots of other ways to do speech recognition, including with a big neural network and nothing else, but using an HMM seems to be best for typical situations.
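To make the "HMM helps" part a little more concrete, here is a toy sketch of Viterbi decoding, the standard way to find the most likely state sequence in an HMM. Everything in it is an assumption for illustration: the three phones for "cat", the probabilities, and the names `ToyHmm`, `viterbi`, `greedy` and `collapseRepeats` are all made up, and Kaldi’s real decoder is far more elaborate. The point is only that the HMM’s transition rules can overrule a frame where the network’s top guess makes no sense in context.

```typescript
// Toy Viterbi decoder. All names and numbers are invented for illustration;
// this is not Kaldi's API or its decoding algorithm.
interface ToyHmm {
  phones: string[];       // one HMM state per phone, to keep things simple
  initial: number[];      // initial[i] = P(first frame is in state i)
  transition: number[][]; // transition[i][j] = P(next state j | current state i)
}

// frameScores[t][j] is the (pretend) network score for phone j at frame t.
// Returns the most likely phone label for every frame.
function viterbi(hmm: ToyHmm, frameScores: number[][]): string[] {
  const numStates = hmm.phones.length;
  const numFrames = frameScores.length;
  if (numFrames === 0) return [];

  // Work in log space; Math.log(0) is -Infinity, which marks impossible paths.
  let previous: number[] = [];
  for (let j = 0; j < numStates; j++) {
    previous.push(Math.log(hmm.initial[j]) + Math.log(frameScores[0][j]));
  }
  const backPointers: number[][] = []; // backPointers[t - 1] belongs to frame t

  for (let t = 1; t < numFrames; t++) {
    const current: number[] = [];
    const back: number[] = [];
    for (let j = 0; j < numStates; j++) {
      let bestScore = -Infinity;
      let bestPrev = 0;
      for (let i = 0; i < numStates; i++) {
        const score = previous[i] + Math.log(hmm.transition[i][j]);
        if (score > bestScore) { bestScore = score; bestPrev = i; }
      }
      current.push(bestScore + Math.log(frameScores[t][j]));
      back.push(bestPrev);
    }
    backPointers.push(back);
    previous = current;
  }

  // Start from the best final state and follow the back pointers.
  let state = 0;
  for (let j = 1; j < numStates; j++) {
    if (previous[j] > previous[state]) state = j;
  }
  const path: string[] = [];
  for (let t = numFrames - 1; t >= 0; t--) {
    path.unshift(hmm.phones[state]);
    if (t > 0) state = backPointers[t - 1][state];
  }
  return path;
}

// Baseline without an HMM: take the top-scoring phone in each frame.
function greedy(phones: string[], frameScores: number[][]): string[] {
  return frameScores.map((scores) => {
    let best = 0;
    for (let j = 1; j < scores.length; j++) if (scores[j] > scores[best]) best = j;
    return phones[best];
  });
}

// Merge runs of the same label, e.g. k k ae t t -> k ae t.
function collapseRepeats(labels: string[]): string[] {
  const out: string[] = [];
  for (const label of labels) {
    if (out.length === 0 || out[out.length - 1] !== label) out.push(label);
  }
  return out;
}

// "cat" as a left-to-right HMM: each phone may repeat or move to the next one.
const catHmm: ToyHmm = {
  phones: ["k", "ae", "t"],
  initial: [1, 0, 0],
  transition: [[0.5, 0.5, 0], [0, 0.5, 0.5], [0, 0, 1]],
};
// Five frames of made-up scores; frame 3 wrongly favours "k".
const frames = [
  [0.8, 0.1, 0.1], [0.6, 0.3, 0.1], [0.1, 0.8, 0.1], [0.4, 0.3, 0.3], [0.1, 0.1, 0.8],
];

console.log(collapseRepeats(greedy(catHmm.phones, frames)).join(" ")); // k ae k t
console.log(collapseRepeats(viterbi(catHmm, frames)).join(" "));       // k ae t
```

Picking the top phone per frame produces "k ae k t", which isn’t a word in this toy model, while the Viterbi path sticks to the only ordering the HMM allows and recovers "k ae t".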

What’s next?

What’s next is a library (kaldi.js) that makes it easy for anyone to add a voice UI to their web app, as well as better performance and better accuracy. Stay tuned, take a look at the code, and let me know what you’d like to build.