NOTE: The shell script described below can be used on any Linux system that meets a few software requirements, because the speech recognition is not performed on the local machine.

Because the Freerunner is not powerful enough for voice recognition, the Google Voice API can be used to convert a recorded audio file into a text file. Be aware that the audio file is transmitted to Google and the recognition is not performed on your Freerunner (FR). This means you need Internet access on your Freerunner to submit the audio file.

NOTE: The following script runs on your Freerunner, but it is not standalone voice recognition software, so you might not want to use this tool for private audio files.

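Google Voice API

To use the Google Voice API and the script, you need SoX, wget and sed installed on your Freerunner. Install the packages from the repositories of your Freerunner distribution.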

Script Usage

The script googlevoice.sh uses the audio file message.wav in the directory of the script. All files are stored in the same directory, so you need write permission for that directory.

SoX converts message.wav into message.flac.

wget submits the file message.flac to the Google Voice API and writes the returned message to message.ret. The language variable in the script is set to German by lang=de-de. If you want to submit a recorded file in US English, use lang=en-us instead.

sed extracts the recognized text from message.ret using regular expressions and writes it to message.txt.

The temporary files message.flac and message.ret are deleted after the process.
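The extraction step can be illustrated in isolation. The following is a minimal sketch, not the original script code: it assumes that the response is a single JSON-like line containing an "utterance" field followed by a "confidence" field, and the sample response written to message.ret is made up for illustration. The function name extract_utterance is hypothetical.

```shell
#!/bin/sh
# Sketch of the sed extraction step.
# ASSUMPTION: the response is one line that contains "utterance":"<text>".
# If several utterance fields appear on the line, the last one is printed.

extract_utterance() {
    # $1: response file; prints the text of the utterance field
    sed -n 's/.*"utterance":"\([^"]*\)".*/\1/p' "$1"
}

# Hypothetical sample response, for illustration only.
printf '%s\n' '{"status":0,"hypotheses":[{"utterance":"guten morgen","confidence":0.87}]}' > message.ret

extract_utterance message.ret > message.txt
cat message.txt   # prints: guten morgen
```

The pattern stops at the first double quote after the field name, so a recognized text that itself contains escaped quotes would be cut short. For the short dictation samples discussed here that is probably acceptable.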

NOTE: The returned text for German audio files needs capitalization of nouns, because all words are returned in lower case. An ispell or aspell correction of message.txt might improve the recognized text.
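As a lightweight alternative to a full spell checker run, the capitalization could be approximated with a word list. This is only a sketch under assumptions: nouns.txt is a hypothetical file with one lower-case noun per line that you would have to provide yourself, and the function name capitalize_nouns is made up for this example.

```shell
#!/bin/sh
# Sketch: capitalize every word of a text that appears in a noun list.
# ASSUMPTION: nouns.txt is a hypothetical word list, one lower-case noun per line.

capitalize_nouns() {
    # $1: noun list, $2: text file; prints the corrected text
    awk '
        NR == FNR { nouns[$0] = 1; next }   # first file: load the noun list
        {
            for (i = 1; i <= NF; i++)       # second file: check every word
                if ($i in nouns)
                    $i = toupper(substr($i, 1, 1)) substr($i, 2)
            print
        }
    ' "$1" "$2"
}

# Small made-up example.
printf '%s\n' haus hund > nouns.txt
printf '%s\n' 'der hund ist im haus' > message.txt
capitalize_nouns nouns.txt message.txt   # prints: der Hund ist im Haus
```

Words are matched exactly, so inflected forms have to be listed separately, and runs of whitespace in the text are collapsed to single spaces.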

Basic Script Code

The script googlevoice.sh can be tested on any Linux machine with SoX, sed and wget installed. Modify the script according to your needs and the location of your audio files.

The parameter lang=de-de indicates that the Google Voice API expects a German-language audio file. Replace lang=de-de with lang=en-us to submit an audio file in US English.
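The four steps described under Script Usage could be put together roughly as follows. This is a sketch and not the original googlevoice.sh. Several things are assumptions: API_URL is a placeholder for the speech endpoint, which this page does not give and which you must set yourself; the 16 kHz sample rate and the Content-Type header are illustrative choices; and the response is assumed to contain an "utterance" field. The names LANG_CODE and recognize_message are made up for this example.

```shell
#!/bin/sh
# Sketch of the convert / submit / extract / clean-up sequence.
# ASSUMPTION: API_URL is a placeholder, it is not defined anywhere in this page.

LANG_CODE="${LANG_CODE:-de-de}"   # use en-us for US English
API_URL="${API_URL:-}"            # placeholder: set this to the endpoint you use

recognize_message() {
    [ -n "$API_URL" ] || { echo "API_URL is not set" >&2; return 1; }
    [ -f message.wav ] || { echo "message.wav not found" >&2; return 1; }

    # 1. SoX converts the WAV file to FLAC (resampled to 16 kHz, an assumption)
    sox message.wav message.flac rate 16k || return 1

    # 2. wget posts the FLAC file and stores the reply in message.ret
    wget -q -O message.ret \
        --post-file=message.flac \
        --header="Content-Type: audio/x-flac; rate=16000" \
        "$API_URL?lang=$LANG_CODE" || return 1

    # 3. sed extracts the recognized text (assumes an "utterance" field)
    sed -n 's/.*"utterance":"\([^"]*\)".*/\1/p' message.ret > message.txt

    # 4. remove the temporary files
    rm -f message.flac message.ret
}
```

After setting API_URL, calling recognize_message in the directory that contains message.wav should leave the recognized text in message.txt. Setting LANG_CODE=en-us in the environment switches the language without editing the script.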

Script with Language Setting and Command Line Parameter

The script googlevoicepar.sh takes a command line parameter and can be used if you want to process multiple input files in a batch. You call this script with the basename of the audio file, e.g. message0, message1, ...
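A batch run over several basenames could be wrapped in a small loop. The sketch below assumes that googlevoicepar.sh sits in the current directory and takes the basename as its first argument; the variable RECOGNIZER and the function batch_recognize are hypothetical names introduced for this example.

```shell
#!/bin/sh
# Sketch: run the recognizer once per basename.
# ASSUMPTION: googlevoicepar.sh accepts the basename as its first argument.

RECOGNIZER="${RECOGNIZER:-./googlevoicepar.sh}"

batch_recognize() {
    failed=0
    for base in "$@"; do
        if [ ! -f "$base.wav" ]; then
            echo "skipping $base: $base.wav not found" >&2
            failed=$((failed + 1))
            continue
        fi
        # Count failures but keep going with the remaining files.
        "$RECOGNIZER" "$base" || failed=$((failed + 1))
    done
    [ "$failed" -eq 0 ]   # succeed only if every file was processed
}
```

For example, batch_recognize message0 message1 message2 would process message0.wav, message1.wav and message2.wav in turn and return a non-zero status if any of them was missing or failed.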

As mentioned above, speech recognition is not performed on your Freerunner. The audio file is submitted to a remote server, and the remote server returns a response file with information on accuracy and the recognized text. To compare the performance with standalone speech recognition software, you have to test that on a Linux box (e.g. Ubuntu).

As a test environment for large vocabulary speech recognition, Julius was used with the Julius-VoxForge acoustic model for English from the repository on Ubuntu 10.10.

Start Simon from the command line, test a few words with your microphone and record them to an audio file.

simon

Compare the results for the same audio file from the Google Voice API and Simon.
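One rough way to make this comparison is to count the words the two transcripts share. The sketch below is only an illustration: the file names google.txt and simon.txt and the function names are hypothetical, and the result is a simple word overlap that ignores word order and repetitions, not a proper word error rate.

```shell
#!/bin/sh
# Sketch: compare two transcripts by their sets of words.
# ASSUMPTION: each transcript is a plain text file; file names are hypothetical.

words() {
    # one lower-case word per line, sorted, without duplicates
    tr -s '[:space:]' '\n' < "$1" | tr '[:upper:]' '[:lower:]' | sed '/^$/d' | sort -u
}

compare_transcripts() {
    a=$(mktemp) && b=$(mktemp) || return 1
    words "$1" > "$a"
    words "$2" > "$b"
    common=$(comm -12 "$a" "$b" | wc -l)    # words found in both transcripts
    only_a=$(comm -23 "$a" "$b" | wc -l)    # words only in the first
    only_b=$(comm -13 "$a" "$b" | wc -l)    # words only in the second
    rm -f "$a" "$b"
    echo "common=$((common)) only_first=$((only_a)) only_second=$((only_b))"
}

# Made-up transcripts, for illustration only.
printf '%s\n' 'Hello world this is a test' > google.txt
printf '%s\n' 'hello word this is test' > simon.txt
compare_transcripts google.txt simon.txt   # prints: common=4 only_first=2 only_second=1
```

Words are split on whitespace only, so punctuation attached to a word counts as part of it.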

NOTE: These tests might help to create a remote speech recognition solution based on Julius and Simon for use on a Freerunner. Simon can run as a server. The tests can be used to connect your Freerunner to Simon.

Future Development

NOTE: Other speech recognition apps on Android (FlexT9) or on the iPhone (Dragon Natural Speaking for iPhone) also perform the transcription of an audio sample on a server and return the transcript to the client mobile phone. This is necessary because of the hardware limitations of a mobile phone. This will improve in the future.

By analogy with the commercial apps, for the development of open source standalone software on Linux it might be good to have an open source web interface or an Android app to collect audio samples for improving the user-independent speech recognition profiles (HMM) for large vocabularies and different languages.

Users will get the speech recognition result on the Freerunner or any other Linux box.

They can correct the speech recognition result (transcript) and submit the correction back to the server.

Through this audio file submission and text file return of a server-based speech recognition, open source speech recognition on Linux can be improved.

The most advanced candidate seems to be Simon Listens with a server-based backend (tested on Ubuntu).

Google Streaming Remote Engine: look over the Chrome source. As the long introductory comment in the source code from the link shows, the JavaScript API work includes a separate streaming feature that uses a full-duplex HTTP connection. This allows a long-running connection for chunking audio data on the submit side. You no longer have to segment audio clips longer than 15 seconds as mentioned earlier. While there is more data, you just use the HTTP connection to write another chunk on the upload channel. The download stream returns the JSON containing hypotheses and utterance elements.
