Mobile. Audio. Clever stuff.

Tag Archives: speech recognition

I am writing this review, or should I say dictating, in Nuance’s Dragon NaturallySpeaking 12, the latest version of what is in my experience the most accurate speech recognition system out there. Accuracy has got to the point where the great majority of words are recognised perfectly. There are a few intractable problems though. How is a dictation system meant to distinguish between nuances and Nuance’s, for example? The answer is generally that it cannot, but in mitigation Dragon has an excellent correction box. You speak a command to select the intransigent word, and either select the correct spelling from a list or in the worst case spell it out. After a bit of practice you can progress quickly and easily.

First, a few quick facts about the system. Your first task after running setup is to set levels and check the quality of your microphone. Nuance supplies a microphone in the box, which is worth it because the average user is unlikely to have a suitable microphone of good enough quality. That said, I was unhappy with the quality of the microphone supplied this time around and will return to this issue later. There is a handy fold-out reference card supplied, a nice touch.

Once set up, Dragon walks you through a quick training exercise during which it sets up a profile with some knowledge about your particular voice. I remember spending ages training early voice recognition systems and it was a tedious procedure. This is no longer the case and Dragon can be set up effectively in just a few minutes.

Dragon runs by default with a menu bar across the top of the screen and a contextual sidebar which lists common commands for the particular application you are using. The sidebar also gives a quick reference to global commands such as those to wake or sleep the microphone, move the mouse, or even post to Twitter or Facebook. Once you have learned all the commands, you can close the sidebar to get your screen space back.

Dragon works best in supported applications, which include the obvious ones like Word and OpenOffice. In other applications you can use a dictation box which lets you dictate into a Dragon window and then transfer your text in either plain or Rich Text Format. Microsoft Office support depends on an add-in. Unfortunately I am currently running the Office 2013 preview and the add-in causes Word to crash. No doubt this will be fixed when the final version of Office is released. As an alternative I used OpenOffice which worked fine. I was also able to use Word 2013 with the dictation box.

While the accuracy is impressive, I did find that recognition slows down on occasion for no obvious reason, which is annoying and interrupts your work.

Dragon is not limited to text input. You can run your entire Windows session with speech, using it to switch between windows and to move and click the mouse. I found that Dragon works well in dialogs, using the Tab command to switch between fields, and Click … to click buttons and checkboxes.

If you have the Premium edition, you can also use Dragon to transcribe recordings and to read back editable text. Do not get your hopes up too much. If you create a recording of your own voice using a high quality recorder, you can get good results. I tried transcribing a telephone call though, and got gibberish.

So what is new in Dragon 12? It has to be said that version 11.5 was already very good. Accuracy is perhaps slightly improved, but not as much as 11.5 improved over 11. You do get the Dictation Box. You also get browser extensions for the Web-based Gmail and Hotmail provided you use a supported browser: IE9, Firefox 12 or higher, or Google Chrome 16 or higher. I tested this with Gmail in Chrome and it does make a big difference to usability. Go to a Google Doc though, and it is back to the Dictation Box.

Also new in version 12 is the ability to disable voice commands that you do not use to boost performance. The full list of new features is available on the Nuance website.

Now about that microphone. The headset that came in my box is called the HS-GEN-C, and includes an adaptor so it can be used with the combined earbud/microphone inputs now common, especially on tablets and laptops. However I had difficulty getting this to work well. It failed Dragon’s built-in microphone test at first, though with some effort and speaking more loudly than usual I managed to get it reported as “acceptable”. This could be because of a poor microphone preamp on the PC, though I got the same results with another machine. I did not want to test the software with doubtful microphone input, so I used the Plantronics Bluetooth headset that came with Dragon 11.5 instead. This passed the microphone check first time.

I also tried Dragon NaturallySpeaking with Windows 8. The news is mixed. On the plus side, Dragon worked fine in the Windows desktop and with applications like Google Chrome and OpenOffice Writer. When I switched to the Modern UI (formerly known as Metro) though, I could not get Dragon to work at all. This does not surprise me since the Windows Runtime environment is different from the desktop. I do not see how the Dragon sidebar will ever work, for example, since all apps run full-screen. Nor is the Dragon bar available in the Modern UI. Microsoft does claim an accessibility story for Windows 8, and I am asking Nuance what if anything is planned for Dragon NaturallySpeaking in this respect.

Do not try to use Dragon with Microsoft’s Office 2013 preview; wait for the final version and proper support.

Conclusion

Dragon NaturallySpeaking combines a high standard of accuracy with strong correction tools. If you are wondering whether speech recognition is a viable and productive technique for text input, have no doubt that it is.

There is still scope for improvement. If I can make sense of my recorded telephone call, then in principle voice recognition should be able to do so as well. It will get there.

Is Dragon now more productive than keyboard and mouse, if you have the choice? It may be in some scenarios, but probably not for expert typists. If you are in the habit of frequently switching applications, for example to research an article you are typing, Dragon can get in the way.

Is Dragon 12 worth the upgrade? From 11.5, that is doubtful unless one of the new features matters a lot to you, for example because you use Gmail frequently. From older versions, it probably is.

I am puzzled why Nuance supplies what in my experience was a poor headset for the purpose, though you may be luckier (and the box says “actual model may vary”). I preferred the Plantronics headsets that used to be bundled, but guess that the cost was higher. If you do serious amounts of dictation, do not skimp on the headset as it soon pays for itself.