Wednesday, November 27, 2013

Yet another followup on voice recognition software limitations

This was intended to be a short note on voice recognition (VR) software, but it turned out to be a fairly complex issue. One of the most talked-about developments in this area involves voice recognition at a gadget level, such as Apple's Siri. From my own perspective, having used dedicated VR software off and on for years, I think Siri, the voice-activated app for iPhones, is mostly a bother. It almost never works for me, but it does highlight a major VR issue: the software trains you; you don't train the software. You have to realize that the software is set to work in a particular way and has very defined limitations. Until you conform to that narrow set of limitations, using the VR software can be extremely frustrating and annoying.

VR is not really related to all of the voice notification systems that are out there, such as the voice that announces the floors on elevators or the recorded messages that accompany so many products. But if you think about it, many larger companies use VR as a substitute for hiring a live person to handle routine telephone calls. Occasionally, we will have trouble with our network connection. We have Cox Communications as our network supplier, and over the last couple of years Cox has gone almost entirely to VR for its first level of technical support. Typically, the recorded message will ask a question and give you two or three options for a spoken reply. In actual practice, I have found that their system resolves nearly all of the connection issues we have from time to time, and I have come to prefer dealing with the computer voice over talking to a live support person. The reason is a little complex, but it involves consistency. Also, the machine doesn't get distracted by another customer or another issue.

At one level, VR is nothing more or less than word equivalency. I say "computer" and the software recognizes that string of sounds in English and produces an equivalent string of ASCII characters in whatever program I am using. In practice, to avoid an overload of mistakes, the software compares word patterns. Most of the errors in dictating to the programs come from a lack of context. If you speak normally and include longer phrases, the programs can compare the sounds to longer strings and therefore obtain a higher level of accuracy. When I say that the programs train you, what I mean is that as you watch what the program is doing, you adapt to the phrase length that gives the best accuracy. Essentially, by trial and error, you develop a method of dictation that yields the most accurate transcription.
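
To make the point about context concrete, here is a toy sketch in Python. It is not how any actual VR product works; the sound-alike lists, the word counts, and the function names are all invented for illustration. It only shows why a lone word is a guess while a longer phrase lets the neighboring words settle which sound-alike was meant.

```python
from itertools import product

# Invented example data: tokens in capitals stand for an ambiguous sound,
# mapped to the words that sound alike.
SOUND_ALIKES = {
    "RITE": ["right", "write"],
    "TOO": ["too", "to", "two"],
}

# Invented counts of how common each word is on its own.
WORD_COUNTS = {"right": 5, "write": 3, "too": 2, "to": 9, "two": 4}

# Invented counts of how often two words appear side by side.
PAIR_COUNTS = {
    ("please", "write"): 8,
    ("write", "to"): 7,
    ("to", "me"): 9,
    ("turn", "right"): 8,
}


def candidates(token):
    """Return the possible words for a token (itself if unambiguous)."""
    return SOUND_ALIKES.get(token, [token])


def score(words):
    """Score a word sequence; neighboring pairs count far more than single words."""
    pair_score = sum(PAIR_COUNTS.get(pair, 0) for pair in zip(words, words[1:]))
    word_score = sum(WORD_COUNTS.get(word, 0) for word in words)
    return 10 * pair_score + word_score


def transcribe(tokens):
    """Try every combination of sound-alikes and keep the best-scoring one."""
    options = [candidates(token) for token in tokens]
    return list(max(product(*options), key=score))


# A single word has no context, so the most common sound-alike wins.
print(transcribe(["RITE"]))
# A longer phrase lets the surrounding words pick the intended spelling.
print(transcribe(["please", "RITE", "TOO", "me"]))
```

With these made-up counts, the isolated word comes out as "right", while the full phrase comes out as "please write to me". Trying every combination is only practical for a short phrase like this one; it is used here to keep the sketch easy to follow.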

Looking at VR software from an entirely different perspective, it is apparent that using VR regularly requires a rather sophisticated level of computer knowledge at both the software and operating system level. If you have difficulty with the basic functions of the computer, such as opening and closing programs, entering passwords, saving files, and finding saved files, then VR will be overwhelmingly difficult. If you analyze what VR is doing, it is really a very high-level method of programming. You are giving instructions to the computer by voice rather than by keyboard. Using VR also implies that you have complete, or almost complete, mastery of any program you are using to record your text. For example, dictating into Microsoft Word presupposes that you already understand all of the basic Word commands because, in a very real sense, you are adding the simultaneous use of an additional program, the VR software, to whatever other program you are trying to use. Since you are already using an operating system (Windows, Apple OS X, Linux, etc.) and then another program such as Word or Blogger, adding a third level in the form of a VR program implies a giant leap in complexity. For example, think about using an iMac computer to run Parallels Desktop, then adding a word processing program, and then adding VR. If you think about it, you have multiple layers of application-specific commands to deal with, all at the same time. This implies a level of knowledge and ability that some people may find utterly impossible to attain.

We are still a long way from carrying on casual conversations with our computers, but now we do have a very workable tool in VR software.