The first thing you need to do is build a language model or a grammar.

The grammar can be something simple in a format called JSGF (Java Speech Grammar Format), and this is the easier way to get a speech recognizer up and running. Alternatively, you can use a language model, which can be built by following the instructions on the Sphinx site. You can create it starting from a file of sentences like this:

<s> I WANT A NEXTCUBE ZERO FOUR ZERO </s>
<s> I WANT THE NEXTCUBE ZERO FOUR ZERO </s>
<s> I NEED A NEXTCUBE ZERO FOUR ZERO </s>
<s> I NEED THE NEXTCUBE ZERO FOUR ZERO </s>
<s> I AM LOOKING FOR A NEXTCUBE ZERO FOUR ZERO </s>
<s> I AM LOOKING FOR THE NEXTCUBE ZERO FOUR ZERO </s>
<s> I AM SEEKING A NEXTCUBE ZERO FOUR ZERO </s>
<s> I AM SEEKING THE NEXTCUBE ZERO FOUR ZERO </s>
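Since the corpus is just every combination of a few verb phrases and two articles around the same product name, you can generate it programmatically instead of typing it by hand. A minimal sketch (the file name `corpus.txt` is my own choice; the resulting file is what you would feed to the Sphinx language-model tools):

```python
from itertools import product

# Building blocks of the training sentences above
verbs = ["I WANT", "I NEED", "I AM LOOKING FOR", "I AM SEEKING"]
articles = ["A", "THE"]
item = "NEXTCUBE ZERO FOUR ZERO"

# Every verb/article combination, wrapped in sentence-boundary markers
sentences = [f"<s> {v} {a} {item} </s>" for v, a in product(verbs, articles)]

# One sentence per line, ready for the language-model tools
with open("corpus.txt", "w") as f:
    f.write("\n".join(sentences) + "\n")
```

Four verb phrases times two articles gives the eight sentences shown above.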

A sample JSGF file would be as follows (modified from the sample on the Sphinx website). Note that I've made all the words capitals because the CMU phonetic dictionary lists its entries in caps (make sure that any language model is all caps as well, except for the sentence-boundary markers):
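The sample grammar itself appears to be missing here, so what follows is my reconstruction rather than the original: a minimal JSGF grammar covering the same sentences as the language-model corpus above. The grammar name `nextcube` and the rule names are my own choices.

```jsgf
#JSGF V1.0;

grammar nextcube;

public <request> = <preamble> <article> NEXTCUBE ZERO FOUR ZERO;

<preamble> = I WANT | I NEED | I AM LOOKING FOR | I AM SEEKING;
<article>  = A | THE;
```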

On the first day of a three-day workshop, I built a line-follower robot that successfully navigated what the instructor promised was a very difficult course (he said it would be impossible to navigate using a simple on-off algorithm).

The trick I used to complete the course was to run the DC motors at half voltage and to adjust the sensor angles so that both always fed the 'brain' an excellent set of signals.
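For reference, the "simple on-off algorithm" the instructor was talking about boils down to a few lines: each sensor gives a binary reading, and the controller just turns toward whichever side still sees the line. A sketch of that logic (the function and return-value names are mine, not from any robot kit's API):

```python
def steer(left_sees_line, right_sees_line):
    """Bang-bang (on-off) control for a two-sensor line follower."""
    if left_sees_line and not right_sees_line:
        return "turn_left"    # line is drifting left: steer back toward it
    if right_sees_line and not left_sees_line:
        return "turn_right"   # line is drifting right: steer back toward it
    return "straight"         # both or neither sensor sees the line
```

Running the motors at half voltage doesn't change this logic at all; it just makes each coarse correction smaller, and well-aimed sensors make the binary readings reliable.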

I came up with the idea owing to my experience with text analytics. The most critical task in text analysis is feature engineering. With a good set of features, you can get excellent results even if the machine learning algorithm is very simple. Unfortunately, very little work goes into feature engineering and feature combination methods for NLP.
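To make the feature-engineering point concrete with the speech example from earlier: a hypothetical "is this a product request?" classifier can be nothing more than a threshold on one well-chosen feature. Everything below (the names, the feature, the rule) is illustrative, not from any real system:

```python
DIGIT_WORDS = {"ZERO", "ONE", "TWO", "THREE", "FOUR",
               "FIVE", "SIX", "SEVEN", "EIGHT", "NINE"}

def longest_digit_run(tokens):
    """Engineered feature: length of the longest run of spelled-out digits,
    e.g. 'ZERO FOUR ZERO' contributes a run of 3."""
    run = best = 0
    for t in tokens:
        run = run + 1 if t in DIGIT_WORDS else 0
        best = max(best, run)
    return best

def is_product_request(utterance):
    """The 'classifier' is just a threshold on that single feature."""
    return longest_digit_run(utterance.split()) >= 2
```

With a feature that good, the learning algorithm has almost nothing left to do; with raw tokens alone, even a sophisticated one has to rediscover the pattern from scratch.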

So, I guess my weekend dabbling in robotics taught me an important lesson: no matter how good your machine learning algorithms (the brains of the system) are, they can't do anything without eyes.