March 3, 2007

As a software developer with a physical disability that makes using a keyboard practically impossible for me, one of the most important capabilities of speech recognition that I always look for is keyboard emulation. And by keyboard emulation, I’m not talking about entering a bunch of common words and phrases like I’m doing while writing this article. This is called dictation. Rather, I’m referring strictly to the ability to key short (or not-so-short) sequences of characters and/or key combinations like myVariableName or myFile.doc. Words like these aren’t easily understood by the built-in speech recognition dictation engine because they are not in any dictionaries I know of (nor should they be), so another speech recognition mechanism is needed. This is called typing.

Vista’s speech recognition tutorial and the What can I say Windows help documents suggest one good way to type single keyboard keys: Press X. For example, you can say Press a to type the letter a, and you can say Press b to type the letter b. To improve accuracy, you can even say something like Press a as in apple to key the character a in case Windows Speech Recognition is having problems with your short single-letter utterances.

This method works perfectly well and is indeed the best way to key a single character. However, using this command over and over to type multi-character sequences is quite tedious and inefficient. It is slow because Press behaves like any other command: you must pause immediately before and after saying each Press command in order for it to process correctly. Imagine spelling myVariableName with Press m (pause) Press y (pause) Press Capital v (pause) Press a (pause) Press r (pause)… You get the picture. Luckily, there is another way.

What should you say? To enter a special typing mode, you can say Start Typing, and to leave this mode, you can say Stop Typing. While in this special mode, you cannot dictate words and you cannot use most of the command-and-control features available in the standard mode. It’s geared for typing: no more, no less.

What’s great about it is that you can key long sequences of characters with minimal pausing, which is a huge performance boost if you do this frequently like I do. For example, you can say Start Typing (pause) m y (pause) Shift v a r i a b l e (pause) Shift n a m e (pause) Stop Typing (pause) to type myVariableName. Sure, it doesn’t beat ten agile fingers pounding on a keyboard, but some of us (and some devices) don’t have that luxury.
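The pausing pattern in that example can be sketched in a few lines of Python. This is only an illustration of how I group the letters when I speak them, not anything Windows Speech Recognition provides; the function name spoken_script is made up for this sketch, and it assumes the text contains only letters and that a pause goes before each Shift, as in the example above.

```python
def spoken_script(text):
    """Build a Start Typing ... Stop Typing utterance for an identifier.

    Assumption (from the example in this article): speak lowercase letters
    in a run, and pause before each capital, which is spoken as "Shift x".
    """
    groups = []    # each group is spoken without pausing
    current = []
    for ch in text:
        if ch.isupper():
            # A capital starts a new group, introduced by Shift.
            if current:
                groups.append(" ".join(current))
            current = ["Shift", ch.lower()]
        else:
            current.append(ch)
    if current:
        groups.append(" ".join(current))
    return " (pause) ".join(["Start Typing"] + groups + ["Stop Typing"])


print(spoken_script("myVariableName"))
```

Run on myVariableName, this yields three spoken groups between Start Typing and Stop Typing, compared with fourteen separate Press commands for the same word.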

To improve your typing accuracy, I strongly recommend that you learn the NATO phonetic alphabet (alpha, bravo, charlie, and so on). Windows Speech Recognition properly interprets these code words into their corresponding characters when you’re typing. I use the phonetic alphabet all the time when typing because it allows me to achieve near-perfect typing accuracy. So to type myFile.doc, I would recommend saying Start Typing (pause) mike yankee (pause) Shift foxtrot india lima echo dot delta oscar charlie (pause) Stop Typing (pause). It looks like a mouthful, but it’s really not all that difficult once you get used to it.
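If you want to check a phonetic sequence before you say it, the mapping from code words to characters is easy to simulate. The Python sketch below is my own illustration and makes no claim about how Windows Speech Recognition works internally; the names NATO and decode are hypothetical, and it assumes that Shift capitalizes only the next letter and that dot means a period, as in the myFile.doc example.

```python
# Code words for the 26 letters, with a few common alternate spellings.
NATO = {
    "alpha": "a", "alfa": "a", "bravo": "b", "charlie": "c", "delta": "d",
    "echo": "e", "foxtrot": "f", "golf": "g", "hotel": "h", "india": "i",
    "juliet": "j", "juliett": "j", "kilo": "k", "lima": "l", "mike": "m",
    "november": "n", "oscar": "o", "papa": "p", "quebec": "q", "romeo": "r",
    "sierra": "s", "tango": "t", "uniform": "u", "victor": "v",
    "whiskey": "w", "xray": "x", "x-ray": "x", "yankee": "y", "zulu": "z",
}


def decode(utterance):
    """Turn a spoken phonetic sequence into the characters it would type.

    Assumptions for this sketch: "Shift" affects only the next letter,
    and "dot" stands for a period. Unknown words raise KeyError.
    """
    typed = []
    shift = False
    for word in utterance.lower().split():
        if word == "shift":
            shift = True
            continue
        ch = "." if word == "dot" else NATO[word]
        typed.append(ch.upper() if shift else ch)
        shift = False
    return "".join(typed)


print(decode("mike yankee Shift foxtrot india lima echo dot delta oscar charlie"))
```

Feeding it the sequence from the paragraph above gives back myFile.doc, which is a quick way to confirm you have the code words in the right order.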

Not to confuse the issue, but the NATO phonetic alphabet also makes the Press command much more useful, because with it a single Press command can enter a short multi-character sequence. To type http, you can say Press hotel tango tango papa.

As always, the best way to learn how to type effectively using Windows Speech Recognition is to practice, so I’ll leave you with a list of the characters you’ll use most often when typing and their phonetic alphabet equivalents.