
Live Speech and Dialog in a Virtual Machine.

The “Interaction in Virtual Worlds” VM lets you “play” with an existing (English) speech recognizer that supports live decoding, and experience an open-source speech dialog system in a virtual world. The VM contains everything you need, except for a “viewer” for the OpenSim-based virtual world spawned by the VM.

Availability

Virtual Machine in OVA format.

Support

Supported as part of the "Speech Recognition Virtual Kitchen", see the FORUM.

Interaction in Virtual Worlds README

This README corresponds to Mario2-IVW.ova, current as of 20140911.

There are a couple of experiments that can be performed with this virtual machine. The first is real-time speech recognition (“live decoding”) using the Kaldi online decoder. The second is a full “interaction in virtual worlds” speech dialog system, which you can fully control.


I. Installation

1. Install Oracle VirtualBox (http://virtualbox.org/), along with the matching Extension Pack. The VM works best with version 4.3.16.

2. Set up a host-only network in the VirtualBox graphical user interface, via “File” → “Settings” (or “Preferences”) → “Network”. Click the “Host-only Networks” tab; if no networks are listed yet, click the network-card icon with the green plus sign on the right. The resulting new default network should appear with the name ‘vboxnet0’.

3. Import the Mario2-IVW.ova file into VirtualBox and run it.

4. Make sure that your microphone can be used from inside the virtual machine. First check that it works outside the VM (levels present, but not distorting), then check it within the VM (via the VM’s “System Settings” → “Sound”). The total level is a combination of these.

5. While running Singularity, create a new grid. Click the “Grid Manager” button, then the “Create” button. In the “Grid Name” field, enter whatever name you want, and in the “Login URI” field enter: http://192.168.56.101:9000. Click “OK” to close this window.

6. The username is “World Master”, and the password is “avatar”. You will use these, together with the grid name you just chose, to log into the virtual world. But not until starting the server within the VM, described next.

The password for the virtual machine is ‘?1zza4All’, should you need it.

II. Running the system

To start the “Interaction in Virtual Worlds” system, double-click the desktop icon with the headphones that says “START”. This starts four processes:

The OpenSim world server

The Kaldi Online Decoder speech recognizer

The Stanford CoreNLP Parser

The Kaldi/Parser client

You can read what the commands are in startBackend.sh if you want to run them separately or display their terminals. These processes take a long time to start up. You will know that all of them have started when the terminal window “Kaldi Online Decoder” displays several lines of “ALSA lib pcm” warning messages.
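Since startup takes a while, it can be handy to poll for readiness programmatically rather than watch the terminals. Here is a minimal sketch in Python (which the VM already uses for its wrapper scripts) that waits until a TCP port, for example the wrapper’s port 9999, accepts connections. The function name and default timeout are our own illustration, not part of the VM:

```python
import socket
import time

# Sketch: block until a TCP service (e.g. the Kaldi wrapper on port 9999)
# starts accepting connections, or give up after `timeout` seconds.
def wait_for_port(port, host="127.0.0.1", timeout=60.0):
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True   # something is listening
        except OSError:
            time.sleep(0.5)   # not up yet; try again
    return False
```

A call such as `wait_for_port(9999)` would return once the wrapper is reachable.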

Now you can log in to Singularity with the World Master avatar, as in Step 6 above.

Once the world loads, open MonoDevelop (third icon down the taskbar on the left of the VM screen), click on the IVW solution, and run it with the play button. This starts the SampleBot project within the IVW solution and logs “Friend Bot”, the computer-controlled character that you can talk to, into the virtual world. It also connects to the Kaldi parser wrapper and receives text from the speech recognizer. The IVW solution contains another project, “DogBot”, which has lots of sample code that can be used to extend the system.

Warning: if the parser times out (when both the parser and the wrapper are running, both will throw an error), restart first the parser, then the wrapper (see below).

III. Customizing the system

Here are the locations of all of the code:

The START icon runs startBackend.sh, which is located on the Desktop. There you can find the commands to run parts of the system individually.

Kaldi Online Decoder: /kaldi-trunk/egs/voxforge/online_demo
We use this code, but plug in models trained on TED talks (tedlium). This is the speech recognition system.

OpenSim: ~/Desktop/opensim-0.8/
This sets up the virtual world. If you cannot log in to Singularity, it is probably because you need to wait for this.

Stanford Parser: ~/Desktop/parse/corenlp.py
This is the server for the parser.

Kaldi Wrapper: ~/Desktop/parse/wrapKaldiLive.py
This code takes the output of the Kaldi Online Decoder, feeds it into the Stanford Parser, and makes the results available to the bot code as a TCP socket service on localhost port 9999. Make sure that the Kaldi and Stanford servers are running before you start this code. If this script throws a Timeout error, restart the Stanford Parser and then restart this client.
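To see what the bot-facing side of such a service might look like, here is a minimal sketch of a TCP server in the spirit of wrapKaldiLive.py. This is our own illustration, not the actual script: every client that connects simply receives the most recent recognized line.

```python
import socketserver
import threading

# Illustrative stand-in for the wrapper's output side. In the real
# script this variable would be updated by the Kaldi/parser pipeline.
latest_result = b"HELLO WORLD\n"

class ResultHandler(socketserver.BaseRequestHandler):
    def handle(self):
        # send the latest recognized/parsed line to the connecting client
        self.request.sendall(latest_result)

def serve_results(port=9999):
    # port=0 lets the OS pick a free port, which is useful for testing
    server = socketserver.TCPServer(("127.0.0.1", port), ResultHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Running `serve_results()` would make the latest result readable by anything that connects to localhost:9999, which matches how the bot code consumes it.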

Communicating with the Bot: ~/Desktop/Bot Development/SampleBot
Accessible in MonoDevelop (IVW solution, SampleBot project), this code logs the bot into the virtual world and polls the Kaldi wrapper. Make sure that the Kaldi online decoder and wrapper are running before you start this. NOTE: if this code is suspended in the debugger for long enough, the bot will disappear from the virtual world (the virtual world protocol requires ‘keep-alive’ messages behind the scenes).
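On the bot side, polling that socket is a one-shot connect-and-read. The actual bot is the C# SampleBot project; the following Python equivalent (function name ours) is only a sketch of the idea:

```python
import socket

# Sketch of polling the Kaldi wrapper on localhost:9999: open a
# connection, read whatever the wrapper sends, return None on failure.
def poll_wrapper(host="127.0.0.1", port=9999, timeout=2.0):
    try:
        with socket.create_connection((host, port), timeout=timeout) as s:
            return s.recv(4096).decode("utf-8", "replace")
    except OSError:
        return None
```

Returning None on any socket error keeps the polling loop alive even while the wrapper is still starting up.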

IV. Bot Actions

At the moment, the bot is very limited in its capabilities. It can tell you where certain objects are located. It looks at its surroundings and sees if the object’s name can be found, and if it can, it tells you how many meters away it is from the bot. This is limited to a search radius of 20 meters; you can experiment with this.
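The bot’s lookup logic described above can be sketched as follows (illustrative Python; the real implementation is the C# SampleBot project, and all names here are ours):

```python
import math

SEARCH_RADIUS = 20.0  # meters; the default search radius mentioned above

def find_object(bot_pos, objects, name, radius=SEARCH_RADIUS):
    """Return the distance in meters to the first object whose name
    matches, or None if nothing matching is within `radius`."""
    for obj_name, pos in objects.items():
        if name.lower() in obj_name.lower():
            d = math.dist(bot_pos, pos)  # straight-line distance
            if d <= radius:
                return d
    return None
```

Raising `SEARCH_RADIUS` is the simplest experiment: the bot would then report objects farther away, at the cost of scanning more of its surroundings.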

There are a LOT more things you could add to extend this system, some of which exist in the DogBot project (in Mono). Feel free to play and drop us a line. More details about some of the included technologies can be found below.

Advanced Topics

Kaldi Online Decoder Model Files

Kaldi language/acoustic model graphs produced by training examples (“egs”, such as egs/tedlium) consist of several files:

HCLG.fst, matrix, model, phones.txt, tree, words.txt

This list of files makes up a ‘model’ in the Kaldi online decode example. Models are located in named folders under

/kaldi-trunk/egs/voxforge/online_demo/online_data/models/

They come from the output of running other Kaldi experiments, such as

/kaldi-trunk/egs/tedlium

Here is a mapping of model files and their origins for the final stage of the ‘tedlium’ experiment. On the left are the names in the online-demo models folder; on the right is where each file originates.

The above training example uses the name “tri3_mmi_b0.1” for its final stage (training tends to build upon previous stages), and each stage gets a new folder under exp/. You can usually get the name of the final stage by looking at the end of run.sh.
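If you want to locate that stage programmatically, one quick heuristic (our own sketch, not part of the VM) is to grab the last exp/ directory mentioned in run.sh:

```python
import re

# Heuristic sketch: the last exp/<stage> mentioned in run.sh usually
# names the final training stage (e.g. exp/tri3_mmi_b0.1).
def final_stage(run_sh_text):
    stages = re.findall(r"exp/([\w.]+)", run_sh_text)
    return stages[-1] if stages else None
```

This is only a heuristic: if a run.sh mentions directories out of order, you would still need to read the script by hand.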
