When I last looked into this a couple of years ago, even simple one-word speech recognition was rather complex and slow.

At the moment I use Google Speech Recognition, which needs no local processing power and is very accurate and fast, so I can run everything on a very low-end VPS.

However, with a minimum billing of 15 seconds per request, numbers and short words like “yes” and “no” soon eat up the 60-minute free allowance.
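To put numbers on that, here is a minimal sketch of the arithmetic. It assumes each request is rounded up to a whole multiple of the 15-second minimum; the 15 seconds and the 60-minute allowance are the figures above, and the function names are just for illustration.

```python
import math

# Figures from the post: 15-second minimum per request, 60-minute free allowance.
# Assumption: longer requests are rounded up to the next multiple of 15 s.
BILLING_INCREMENT_S = 15
FREE_ALLOWANCE_S = 60 * 60


def billed_seconds(audio_seconds: float) -> int:
    """Round an utterance up to the next billing increment (minimum one increment)."""
    increments = max(1, math.ceil(audio_seconds / BILLING_INCREMENT_S))
    return increments * BILLING_INCREMENT_S


def free_requests(audio_seconds: float) -> int:
    """How many utterances of this length fit in the free allowance."""
    return FREE_ALLOWANCE_S // billed_seconds(audio_seconds)


if __name__ == "__main__":
    # A one-second "yes" is billed as 15 s...
    print(billed_seconds(1.0))  # 15
    # ...so the free hour covers only 240 such answers (4 minutes of real audio).
    print(free_requests(1.0))  # 240
```

In other words, the free hour covers only about 4 minutes of actual one-word answers.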

Have things changed much in the last couple of years? I see a couple of new “standalone” projects, even from the likes of Facebook and Mozilla, but they seem to require a degree in C++ and, apparently, about 24 hours to build a voice model on a high-end box with the latest graphics cards (for the number crunching). Also, unless I’m reading it wrong, these and similar standalone offerings take about 4 seconds to recognise each second of speech on a low-end machine.


All I need is fast, offline recognition of the numbers 1 to 20 plus “yes”, “no”, “menu” and “help”.
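For a sense of how small that vocabulary is, here is a hypothetical sketch. It lists the 24 phrases and maps a recogniser’s plain-text output onto a normalised command. The recogniser itself is not shown: the assumption is any offline engine that returns text, ideally one that can be restricted to a fixed phrase list. `VOCABULARY` and `match_command` are illustrative names only.

```python
from typing import Optional

# The closed vocabulary from the post: numbers 1-20 plus four keywords.
NUMBER_WORDS = [
    "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
    "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
    "seventeen", "eighteen", "nineteen", "twenty",
]
KEYWORDS = ["yes", "no", "menu", "help"]

# 24 phrases in total, small enough to pass as a phrase list/grammar
# to an engine that supports one (hypothetical usage).
VOCABULARY = NUMBER_WORDS + KEYWORDS


def match_command(transcript: str) -> Optional[str]:
    """Return "1".."20", "yes", "no", "menu" or "help", or None if out of vocabulary."""
    # Lower-case and drop surrounding spaces and punctuation the engine may add.
    text = transcript.strip().strip(".,!?").strip().lower()
    # Some engines return digits instead of words.
    if text.isdecimal() and 1 <= int(text) <= 20:
        return str(int(text))
    if text in NUMBER_WORDS:
        return str(NUMBER_WORDS.index(text) + 1)
    if text in KEYWORDS:
        return text
    return None
```

Anything outside the list comes back as `None`, so the IVR can simply re-prompt the caller.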

For voicemail transcription I’m happy to stick with Google’s paid service, as it’s remarkably accurate with phone-quality speech (from what I can tell, it beats Microsoft and Amazon Transcribe hands down).

I looked at UniMRCP, but it seems rather complex and the licensing doesn’t suit me: 99% of the time I have one channel (one caller), but very, very rarely it can jump to 10. I don’t want to have to buy a 10-channel license for that one hour a month!