Tuesday, 23 November 2010

As many of you probably already know, we pay a lot of attention to making speech recognition actually usable by integrating it with existing applications. We do this by simulating conventional interaction patterns (mainly mouse and keyboard input) through our command infrastructure.

simon 0.4, however, will also allow application developers to tap into the speech recognition much more effectively: it provides plugins that call DBus and JSON functions.

If the application to be controlled exposes either of those interfaces, you can use these new command plugins to write simon scenarios that call methods in the application through the IPC layer. This way you can execute code directly with voice commands, which makes the system much more robust and powerful than, for example, using global shortcuts for the same purpose.
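The exact wire format simon's JSON plugin expects isn't spelled out here, so as a rough illustration only: a method call to an application could be encoded as a JSON-RPC 2.0 style payload like the following (the method name and parameters are made-up placeholders, not part of any real scenario).

```python
import json

def build_method_call(method, params, request_id=1):
    """Build a JSON-RPC 2.0 style method-call payload as a string."""
    return json.dumps({
        "jsonrpc": "2.0",
        "method": method,
        "params": params,
        "id": request_id,
    })

# Hypothetical example: ask an editor to open a file by voice command.
payload = build_method_call("document.open", ["/home/user/notes.txt"])
```

The payload string would then be sent to the target application over whatever transport its JSON interface uses.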

Moreover, simon 0.4 provides a DBus interface of its own, so third-party applications can execute simon commands directly as well.
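From the calling side, such a DBus invocation boils down to a standard session-bus method call. The service, path, interface and method names below are hypothetical placeholders (introspect simon's actual interface, e.g. with qdbus, for the real ones); this helper just composes a dbus-send command line without executing it.

```python
def dbus_send_argv(dest, path, interface, method, *args):
    """Compose a dbus-send session-bus method call as an argv list.

    The call is not executed; the list could be passed to subprocess.run.
    """
    argv = ["dbus-send", "--session", "--type=method_call",
            f"--dest={dest}", path, f"{interface}.{method}"]
    argv += [f"string:{a}" for a in args]
    return argv

# Hypothetical names -- check simon's DBus introspection for the real ones.
cmd = dbus_send_argv("org.simon-listens.simon", "/CommandExecutor",
                     "local.CommandExecutor", "triggerCommand", "Open Editor")
```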

Sunday, 7 November 2010

Finally I've found the time for a long overdue blog update :). I already promised this back in September when I blogged about the dialog system: I want to write a bit about simon's text-to-speech infrastructure.

Because the next version of simon will be able to interact with the user through dialogs, we wanted to enable simon to actually "talk" to the user by means of text-to-speech.

Of course we didn't reinvent the wheel but rather looked at the available open source solutions. We needed something cross-platform that works at least with English, German and Italian.

Naturally, Jovie (formerly KTTSD, KDE's text-to-speech system) would be the obvious choice, but it is not yet cross-platform as it uses Speech Dispatcher, which only works on Linux. It also wasn't very stable when I tried it and had quite a few rough edges and missing features.

Furthermore, the best (open) German voices I could find were HTS voices developed with and for the OpenMARY framework. They should theoretically also work with Festival, so they could be used with Jovie as well if someone wrote a Festival configuration set for them. OpenMARY is cross-platform and provides very high quality synthesis, but it is a big, heavy Java dependency that needs a lot of resources and is quite slow - even on current hardware, synthesizing a paragraph of text takes around 10 seconds on a nettop.

So we decided to do what we always do and leave the final choice to the end user:

simon's TTS framework now allows you to use Jovie (the default), a generic webservice (like OpenMARY) or to record sound snippets yourself.
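For the webservice backend, the request is essentially a parameterized HTTP call. As a sketch, assuming a MARY-style HTTP interface (the parameter names below follow OpenMARY's documented HTTP API; the host, port and exact parameters may differ for your server version), the synthesis URL could be built like this:

```python
from urllib.parse import urlencode

def mary_request_url(text, locale="de",
                     base="http://localhost:59125/process"):
    """Build a synthesis request URL in the style of OpenMARY's HTTP
    interface. Parameter names assumed from MARY's docs; adjust them
    and the base URL for the server you actually run."""
    params = {
        "INPUT_TEXT": text,
        "INPUT_TYPE": "TEXT",
        "OUTPUT_TYPE": "AUDIO",
        "AUDIO": "WAVE",
        "LOCALE": locale,
    }
    return base + "?" + urlencode(params)

url = mary_request_url("Guten Tag")
```

Fetching that URL would return a WAVE stream that the TTS backend can play back directly.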

The last option is especially helpful if you are dealing with languages for which no good open voices exist yet, or if your users have trouble understanding the synthesized ones.

Simply create a new TTS set for your speaker (the person recording the sound snippets) and record the needed texts with him or her. When recording texts, simon will show you a list of recently synthesized texts so you can record whole dialogs quite quickly. Instead of asking Jovie or OpenMARY to synthesize the text, simon will then play back these recordings.

These TTS sets can be exported and imported so you can share your sound snippets with others - for example alongside the scenario containing the dialog that uses them.

Multiple TTS backends can be used simultaneously, which means that you can use pre-recorded sound snippets primarily but fall back to a TTS system for dialog paths you have not (yet) recorded.
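That fallback chain is easy to picture in code. This is only a conceptual sketch of the idea, not simon's actual implementation: look up a pre-recorded snippet for the exact text first, and only if none exists try the configured synthesis backends in order.

```python
def synthesize(text, recordings, tts_backends):
    """Play a pre-recorded snippet if one exists for this exact text;
    otherwise fall back through the configured TTS backends in order."""
    if text in recordings:
        return ("recording", recordings[text])
    for backend in tts_backends:
        audio = backend(text)
        if audio is not None:      # backend succeeded
            return ("synthesized", audio)
    raise RuntimeError("no backend could synthesize: " + text)

recordings = {"Welcome!": "welcome.wav"}
backends = [lambda t: None,            # e.g. webservice unreachable
            lambda t: f"jovie:{t}"]    # e.g. Jovie succeeds

synthesize("Welcome!", recordings, backends)   # plays the recording
synthesize("Goodbye!", recordings, backends)   # falls back to Jovie
```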