“Computer, where are my nail clippers?”

Thoughts on voice recognition for computers

Why Voice? Voice doesn’t make sense in many settings, particularly the office cube or even the home office. The key to a company’s success in this area will be understanding where voice is best used and, more importantly, where people are most comfortable using it.

After using Vista for several months now, I would like to share my findings on where its much-improved voice recognition will be useful. Microsoft have already hinted that the next OS after Vista, codenamed ‘Vienna’, will build significantly on the voice recognition in Vista. The general theme with all of these scenarios is that voice only works, and will only be used, when the user is happy for other people to hear the commands and information being spoken. If you are even slightly concerned that what you are saying shouldn’t be overheard, you will revert to a keyboard or a pen. Also, if you are around other people doing the same thing, talking to the computer will look less weird. Many of these scenarios are therefore already social by nature.

1: Home Automation

I have dabbled with automating lights and heating (using X10 and Insteon). There are a whole bunch of companies making easy-to-install hardware and, more importantly, plug-in software for Vista Media Center that controls devices through your big screen. For the record, this makes a lot of sense: the TV is usually in the living room, where people spend a lot of time, so the living room is effectively a ‘hub’ room for the house and an ideal place from which to control it.

Uses: who doesn’t want to be able to walk into a room and say ‘theatre’, and have the lights dim, the TV and DVD player turn on, and the large screen display the Domino’s website alongside the DVD menu, ready to order?
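
As a rough illustration of the ‘theatre’ idea, here is a minimal Python sketch of mapping a recognised word to a set of device actions. Everything in it is hypothetical: the device names, the SCENES table and the run_scene function are made up for illustration, and it only describes the actions rather than talking to real X10 or Insteon hardware.

```python
# Hypothetical scene table: spoken word -> list of (device, action) pairs.
# The device names and actions are placeholders, not a real X10/Insteon API.
SCENES = {
    "theatre": [
        ("living_room_lights", "dim to 20%"),
        ("tv", "power on"),
        ("dvd_player", "power on"),
    ],
    "goodnight": [
        ("living_room_lights", "off"),
        ("tv", "power off"),
    ],
}


def run_scene(spoken_text):
    """Return a description of each action for a recognised scene name.

    Unknown words return an empty list, so nothing happens by accident.
    """
    key = spoken_text.strip().lower()
    actions = SCENES.get(key, [])
    return [f"{device}: {action}" for device, action in actions]


if __name__ == "__main__":
    for line in run_scene("Theatre"):
        print(line)
```

Keeping the scenes in a plain table means a remote or a wall switch could trigger exactly the same actions, so voice is just one more way in.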

2: Media Control

This is quite simple: play, pause, volume control. Again, you’ve got the guys around, and the volume level is hardly confidential.

3: Web Services

Eh? I hear you shout. Well, I will not explain what a web service is in the programming sense, but I think everyone is more or less aware that you can go to a variety of websites to get weather information for your area. So imagine waking up and asking “Google, what is the weather?” or “Yahoo, what is the Intel stock price?” or “New York Times, what’s the top technology news?” and having the computer read back the information gathered from a web service or RSS feed. Maybe, if the voice system is clever enough (and not weighed down by MPAA-imposed DRM architecture), you could simply ask “Outlook, show me my latest email [on my primary screen]”.
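
To make the “service name, then question” pattern a little more concrete, here is a small Python sketch of how a recognised sentence might be split and routed. It assumes the speech engine has already turned the audio into text. The handler functions and the HANDLERS table are invented for illustration and return placeholder strings; a real handler would call a web service or read an RSS feed.

```python
def parse_command(spoken_text):
    """Split '<service>, <request>' into (service, request).

    The service name is lowercased. If there is no comma, the service
    is None and the whole text is treated as the request.
    """
    service, sep, request = spoken_text.partition(",")
    if not sep:
        return None, spoken_text.strip()
    return service.strip().lower(), request.strip()


# Hypothetical handlers: placeholders standing in for real web-service calls.
def weather_handler(request):
    return "placeholder weather report"


def news_handler(request):
    return "placeholder headlines"


HANDLERS = {
    "google": weather_handler,
    "new york times": news_handler,
}


def answer(spoken_text):
    """Route the request to the named service, or say we don't know it."""
    service, request = parse_command(spoken_text)
    handler = HANDLERS.get(service)
    if handler is None:
        return "Sorry, I don't know that service."
    return handler(request)


if __name__ == "__main__":
    print(answer("Google, what is the weather"))
```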

Security

One thing to note: voice in all of these situations should be complementary to a remote, keyboard or traditional switch, never the only option available. That is partly for security and safety reasons, but fail-safes for functionality also remain important as the technology advances. What if it goes wrong, or doesn’t understand your voice? You don’t want to lose the ability to turn off the kettle when the computer crashes, nor do you want it to carry on boiling and blow up.

There are already guidelines in the US and the EU on what computerized systems are allowed to control, along with standards for reliability and safety. Rest assured that if Windows (or third-party software) wants to provide these services, it will have to pass strict tests, and hardware manufacturers will have to provide mechanical fail-safes. But this doesn’t stop us devs/hobbyists blowing ourselves up!

As for the semi-recent ‘exploit’ where Vista’s voice recognition can apparently be used to format the hard disc, this is utter crap and FUD, mainly because anything requiring that level of access prompts the user for Administrator approval (UAC), and that prompt cannot be operated by voice.

What to expect

Simple voice actions are already built into Windows Vista, like ‘open windows media player’ and ‘Start, search, search input, mydiarydotdoc’. Microsoft will likely make this a lot more user-friendly, and not bound to a screen, in future versions. Freeing the voice experience from the screen will be important for a couple of reasons: firstly, it will be a huge hit with people who have accessibility needs, and secondly, an even bigger hit with consumers who are clamouring for the Star Trek computer experience.

As with Bluetooth headsets, the voice-activated computer is a struggle between the technology being available and society accepting and adopting it where it makes sense. Further down the road, although ‘dictation’ is currently a little alien to most people used to keyboard entry, I believe people will slowly switch to these features as they become easier to use. The new generation of computer-savvy kids, who have grown up through huge technology changes, will speed up adoption of these new technologies because they have less resistance to change.