Speech Recognizer

Most speech recognizers are built on two important parts: the acoustic model and the language model. The acoustic model essentially captures how individual sounds, known as phonemes, are pronounced. The language model describes what the grammar looks like; for example, it gives the probability of word X being followed by word Y. So we have to make sure both aspects are covered for the recognition of ROILA.
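To make the language-model idea concrete, here is a minimal sketch of how a bigram probability P(Y | X) can be estimated from a list of sentences by maximum likelihood, as count(X Y) divided by the number of times X is followed by any word. It assumes plain word counts with no smoothing, backoff, or sentence-boundary markers, so it is only an illustration and not how Sphinx or its Language Modelling Tool builds real models. BigramModel and probability are hypothetical names, and the English example sentences below are placeholders, not ROILA commands.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: maximum-likelihood bigram estimates without smoothing or backoff.
class BigramModel {
    private final Map<String, Integer> historyCounts = new HashMap<>();
    private final Map<String, Integer> pairCounts = new HashMap<>();

    BigramModel(List<String> sentences) {
        for (String sentence : sentences) {
            String[] words = sentence.trim().toUpperCase().split("\\s+");
            // Each adjacent pair (X, Y) adds one to count(X Y) and one to count(X as a history)
            for (int i = 0; i + 1 < words.length; i++) {
                historyCounts.merge(words[i], 1, Integer::sum);
                pairCounts.merge(words[i] + " " + words[i + 1], 1, Integer::sum);
            }
        }
    }

    // P(next | previous) = count(previous next) / count(previous followed by any word)
    double probability(String previous, String next) {
        String x = previous.toUpperCase();
        int history = historyCounts.getOrDefault(x, 0);
        if (history == 0) {
            return 0.0; // unseen history; real language models back off to lower orders here
        }
        return (double) pairCounts.getOrDefault(x + " " + next.toUpperCase(), 0) / history;
    }
}
```

For instance, with the placeholder sentences "go forward", "go back" and "turn left", probability("go", "forward") is 0.5 because GO is followed by a word twice and by FORWARD once, while probability("turn", "left") is 1.0.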

The speech recognizer that we are using is Sphinx-4, an open-source speech recognition platform from Carnegie Mellon University (CMU). We reviewed a number of open-source, customizable speech recognition tools and determined Sphinx to be the most accurate and user-friendly.

Installing Sphinx-4 and using Eclipse to program with Sphinx-4

Extensive installation instructions are available for Sphinx-4. While downloading Sphinx-4 from SourceForge, you may be tempted to install the latest version, beta 4. However, doing so and later running Sphinx from within Java throws a class-not-found exception for WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.jar, the jar that contains the acoustic model. To get everything working smoothly, please install beta version 2 or 3 instead.

You might also want to use the Eclipse IDE to program with Sphinx-4. A very helpful tutorial explains how to integrate Sphinx-4 with Eclipse, and the same resource also has some notes on configuring Sphinx. Please remember to follow the instructions in that online tutorial, which indicate which .jar files you must add to Eclipse for Sphinx-4 to work. You must include at least the following jar files in your project properties:

js.jar

jsapi.jar

sphinx4.jar

WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.jar

These jar files are found in the /lib folder of your Sphinx installation.
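If recognition later fails with a class-not-found error, one quick check is whether these jars actually ended up on the runtime classpath. The sketch below compares the entries of the java.class.path system property against the four jar names listed above, by file name only. SphinxClasspathCheck and its methods are hypothetical names, and the check assumes the jars keep their original file names (a renamed or repackaged jar would be reported as missing).

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: reports which required Sphinx-4 jars are missing from a classpath.
class SphinxClasspathCheck {
    // The four jar files listed in the tutorial text
    static final List<String> REQUIRED_JARS = Arrays.asList(
            "js.jar",
            "jsapi.jar",
            "sphinx4.jar",
            "WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.jar");

    // Returns required jar names that do not appear (by file name) in the classpath string.
    static List<String> missingJars(String classpath, String separator) {
        List<String> present = new ArrayList<>();
        for (String entry : classpath.split(Pattern.quote(separator))) {
            if (!entry.isEmpty()) {
                present.add(new File(entry).getName()); // keep only the file name part
            }
        }
        List<String> missing = new ArrayList<>();
        for (String jar : REQUIRED_JARS) {
            if (!present.contains(jar)) {
                missing.add(jar);
            }
        }
        return missing;
    }

    // Checks the classpath of the currently running JVM (e.g. as launched by Eclipse).
    static List<String> missingFromCurrentClasspath() {
        return missingJars(System.getProperty("java.class.path"), File.pathSeparator);
    }
}
```

Printing SphinxClasspathCheck.missingFromCurrentClasspath() at the start of your main method gives an early, readable warning instead of a stack trace from deep inside Sphinx.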

However, before any recognition can take place within Eclipse, Sphinx-4 needs to be configured and modified so that it can recognize ROILA. Let's see how this can be done.

Identifying context and a list of plausible sentences

The first step is to identify the context of use. What do you want to talk about in ROILA? Let's take our example of instructing a LEGO Mindstorms NXT robot to navigate through its environment. A simple scenario would involve commands such as:

For the commands above, we also provide sample audio clips as a guide to how they should be pronounced. These audio clips were successfully recognized when passed to the Sphinx-4 recognizer.

Creating the Language Model

The next step is to write the plausible ROILA sentences into a text file. Once you have written them out, you can already construct a language model using the Language Modelling Tool provided by Sphinx. Upload your sample sentences text file to the tool, and it will generate a language model when you press the compile knowledge base button. At this point, download only the language model file, which has a .lm extension (a sample .lm file of the aforementioned scenario).
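Before wiring the downloaded .lm file into Sphinx, it can be worth checking that it contains what you expect. The sketch below assumes the file uses the standard ARPA text format for n-gram models, whose \data\ section lists the number of n-grams of each order in lines such as ngram 1=5, and simply reads those counts. ArpaHeader and readNgramCounts are hypothetical names, and the sketch does not validate the n-gram sections that follow the header.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: reads the n-gram counts from the header of an ARPA-format .lm file.
class ArpaHeader {
    // Returns n-gram order -> count, taken from the "ngram N=COUNT" lines after \data\
    static Map<Integer, Integer> readNgramCounts(Reader source) throws IOException {
        Map<Integer, Integer> counts = new LinkedHashMap<>();
        BufferedReader reader = new BufferedReader(source);
        boolean inData = false;
        String line;
        while ((line = reader.readLine()) != null) {
            line = line.trim();
            if (line.equals("\\data\\")) {
                inData = true; // header comments before \data\ are ignored
            } else if (inData && line.startsWith("ngram ")) {
                // e.g. "ngram 2=6": order before '=', count after it
                String[] parts = line.substring("ngram ".length()).split("=");
                counts.put(Integer.parseInt(parts[0].trim()), Integer.parseInt(parts[1].trim()));
            } else if (inData && line.startsWith("\\")) {
                break; // a section such as \1-grams: marks the end of the header
            }
        }
        return counts;
    }
}
```

Passing a java.io.FileReader for your downloaded file returns a map such as {1=..., 2=..., 3=...}; an empty map suggests the file is not in the expected format.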

Creating the Pronunciation Dictionary

Do not download or use the pronunciation dictionary generated by the Language Modelling Tool; instead, we will specify our own pronunciation dictionary, which defines how we want each word to be pronounced. Here is our sample. Sphinx requires every word in the dictionary to be broken down into ARPABET symbols. Most linguists are familiar with the International Phonetic Alphabet (IPA) standard; ARPABET is essentially an ASCII code for a subset of the IPA (the sounds of American English), so an IPA-to-ARPABET conversion table is quite helpful.

For example, the word FOSIT would be written in ARPABET as F AA S IH T.

Note that specifying our own dictionary gives us the freedom to define our own pronunciations.
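Because Sphinx needs an entry for every word it may hear, a missing word or a mistyped phone symbol in a hand-written dictionary is an easy mistake to make. The sketch below assumes dictionary lines of the form word, whitespace, then ARPABET symbols (as in FOSIT  F AA S IH T) and checks that every word in your sentence list has an entry and that every symbol belongs to the 39-phoneme ARPABET set used by the CMU Pronouncing Dictionary, without stress markers. RoilaDictionary and its methods are hypothetical names, and whether your acoustic model uses exactly this phone set is an assumption you should verify against the model's own phone list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper for checking a hand-written pronunciation dictionary.
class RoilaDictionary {
    // CMU Pronouncing Dictionary phone set (39 symbols, stress markers omitted)
    static final Set<String> ARPABET = new HashSet<>(Arrays.asList(
            "AA", "AE", "AH", "AO", "AW", "AY", "B", "CH", "D", "DH", "EH", "ER", "EY",
            "F", "G", "HH", "IH", "IY", "JH", "K", "L", "M", "N", "NG", "OW", "OY",
            "P", "R", "S", "SH", "T", "TH", "UH", "UW", "V", "W", "Y", "Z", "ZH"));

    // Parses "WORD  PH1 PH2 ..." lines; alternates such as WORD(2) are kept as separate keys.
    static Map<String, List<String>> parse(List<String> lines) {
        Map<String, List<String>> dictionary = new TreeMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            String[] fields = trimmed.split("\\s+");
            List<String> phones = new ArrayList<>(Arrays.asList(fields).subList(1, fields.length));
            dictionary.put(fields[0].toUpperCase(), phones);
        }
        return dictionary;
    }

    // Words used in the sentence list that have no dictionary entry, in order of first use.
    static Set<String> missingWords(List<String> sentences, Map<String, List<String>> dictionary) {
        Set<String> missing = new LinkedHashSet<>();
        for (String sentence : sentences) {
            for (String word : sentence.trim().toUpperCase().split("\\s+")) {
                if (!word.isEmpty() && !dictionary.containsKey(word)) {
                    missing.add(word);
                }
            }
        }
        return missing;
    }

    // Phone symbols that are not in the ARPABET set above (e.g. typos), sorted.
    static Set<String> unknownSymbols(Map<String, List<String>> dictionary) {
        Set<String> unknown = new TreeSet<>();
        for (List<String> phones : dictionary.values()) {
            for (String phone : phones) {
                if (!ARPABET.contains(phone)) {
                    unknown.add(phone);
                }
            }
        }
        return unknown;
    }
}
```

Running both checks against your sentence file and dictionary before starting Sphinx makes such mistakes show up as a short list of offending words or symbols.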

Setting up the XML Configuration file

You now have two files ready (the language model and the pronunciation dictionary), and their paths can be inserted into the XML configuration file that Sphinx uses during recognition. For the aforementioned scenario, you are welcome to use our configuration file. You will have to change the following two paths in the XML file:

Other than that, the file can remain as it is, so it serves as a standard XML configuration for ROILA recognition. These three files (the language model, the dictionary, and the XML configuration) are now the crux of the recognition. Note that no recognition has taken place yet; to accomplish that, we move on to writing Java code.

Executing speech recognition within Java

Create a normal Java project in Eclipse and use the following sample Java code as the heart of your source file. Remember to follow the settings mentioned earlier on this page for adding the relevant .jar files. You should be able to get some recognition going with this. Note that the program runs forever, with the microphone constantly on and listening for input. The result of the recognition is returned as a string, which you can use as you wish.

String resultText = result.getBestResultNoFiller();

Note that no parsing is done by Sphinx or by our sample program. You could parse the result returned by Sphinx yourself and determine whether it matches any of the plausible sentences.
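One simple way to do that parsing is to normalize the recognized string and pick the plausible sentence that needs the fewest word insertions, deletions, or substitutions (word-level Levenshtein distance) to match it, rejecting the result if even the best candidate is too far away. This is a sketch of that idea, not something Sphinx provides; SentenceMatcher, match, and the maxDistance threshold are hypothetical, and the English example sentences stand in for your ROILA commands.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical helper: maps a recognition result onto the closest plausible sentence.
class SentenceMatcher {
    // Uppercases and collapses whitespace so "go  forward " compares equal to "GO FORWARD".
    static String normalize(String text) {
        return text.trim().replaceAll("\\s+", " ").toUpperCase();
    }

    // Word-level Levenshtein distance between two normalized sentences.
    static int wordDistance(String a, String b) {
        String[] x = a.isEmpty() ? new String[0] : a.split(" ");
        String[] y = b.isEmpty() ? new String[0] : b.split(" ");
        int[][] d = new int[x.length + 1][y.length + 1];
        for (int i = 0; i <= x.length; i++) d[i][0] = i; // delete all words of x
        for (int j = 0; j <= y.length; j++) d[0][j] = j; // insert all words of y
        for (int i = 1; i <= x.length; i++) {
            for (int j = 1; j <= y.length; j++) {
                int substitution = d[i - 1][j - 1] + (x[i - 1].equals(y[j - 1]) ? 0 : 1);
                d[i][j] = Math.min(substitution, Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1));
            }
        }
        return d[x.length][y.length];
    }

    // Returns the closest plausible sentence within maxDistance word edits, if any.
    static Optional<String> match(String result, List<String> plausible, int maxDistance) {
        String heard = normalize(result);
        String best = null;
        int bestDistance = Integer.MAX_VALUE;
        for (String candidate : plausible) {
            int distance = wordDistance(heard, normalize(candidate));
            if (distance < bestDistance) { // ties keep the earlier candidate
                best = candidate;
                bestDistance = distance;
            }
        }
        return bestDistance <= maxDistance ? Optional.ofNullable(best) : Optional.empty();
    }
}
```

With the plausible sentences "go forward", "go back" and "turn left", a result of GO FORWARD LEFT is one deletion away from "go forward" and is accepted with maxDistance 1, while STOP is two edits away from every candidate and is rejected. A threshold of 0 turns this into plain exact matching.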

To communicate with an NXT Mindstorms robot, we recommend using Bluetooth, with the recognition result sent over the Bluetooth connection as bytes. For more details about communicating with your NXT robot, we have provided a brief tutorial.
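Whatever the transport, the receiving program needs to know where one command ends and the next begins. A minimal sketch, assuming the Bluetooth connection is exposed to your code as an ordinary java.io.OutputStream/InputStream pair, is to frame each result with Java's DataOutputStream.writeUTF, which writes a two-byte length followed by the characters in modified UTF-8, and to read it back with DataInputStream.readUTF. CommandFraming is a hypothetical name, and the program on the robot must decode exactly the same framing; if it cannot, choose a framing it does understand.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper: length-prefixed framing for recognition results sent over a byte stream.
class CommandFraming {
    // Writes one result as a 2-byte length followed by its modified UTF-8 bytes.
    static void sendCommand(OutputStream link, String recognized) throws IOException {
        DataOutputStream out = new DataOutputStream(link);
        out.writeUTF(recognized.trim());
        out.flush(); // push the bytes to the connection immediately
    }

    // Reads one result written by sendCommand; blocks until the whole frame has arrived.
    static String receiveCommand(InputStream link) throws IOException {
        return new DataInputStream(link).readUTF();
    }
}
```

On the PC side you would call sendCommand with the string returned by getBestResultNoFiller() each time a result arrives.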

Did you install the correct version of Sphinx-4? I always faced problems when I installed anything other than beta version 2 or 3 of Sphinx-4. Alternatively, something may be wrong in your XML configuration file. Are you using the sample XML file we provide? I do not have experience with Sphinx-3.

I tried the above instructions and everything worked fine. But I want to record an audio file and then pass this audio file to the above program to recognize the speech. How can I do that? Please help me out.

Thank you for inventing a robot language that uses the simplest phonemes. I was in the process of creating my own version; the goal was aerospace communications via LaserCom, with a universal language for interplanetary/intergalactic communications in the far future that was as simple as possible. The B.I.O.P.S. stands for Biomechanically Integrated Organically Programmable Species. I would like to explain further, but the concepts involved are TS/SI, for military purposes only. The ROILA use would be for A.R.M.A.D.I.L.L.O. (Arrayed Robotic & Manned Aerospace Designated Intelligent Launch & Landing Organism). I am about to do major research into your ideas, and if I find something helpful, I will be glad to share.

ROILA is a spoken language for robots. It is constructed to make it easy for humans to learn, but also easy for the robots to understand. ROILA is optimized for the robots’ automatic speech recognition and understanding.