Smarthome, speech recognition, voice control, Raspberry Pi and more

Step by step: Raspberry Pi offline voice recognition with SOPARE

After a round of optimization, refactoring, bug fixing and testing, it is time for a new blog post. Because the fundamentals have changed, and due to public requests, this post is a step-by-step tutorial. First of all, the good news: SOPARE 1.5 is out and was successfully developed, installed and tested on Raspbian Wheezy, Jessie and Stretch. In addition, people have reported that SOPARE works on the Orange Pi and on some Ubuntu versions. Just in case you have no idea what SOPARE is, here is a quick introduction:

SOPARE stands for SOund PAttern REcognition and is a Python project developed on and for the Raspberry Pi. The goal is to provide offline, real-time audio processing for words that must be trained upfront.

Because SOPARE learns sounds from training sessions, it is able to identify the same sound later on, even under different circumstances. This means that you can train words in any language, or just sounds like doorbells, knocks and whatever else you want. Of course, there are limitations. However, SOPARE provides a simple plug-in architecture for further processing. Here are some real-life operational areas: SOPARE runs 24/7 and controls smart home things like lights (on/off) and a magic mirror (wake up, change views, …), and another installation controls a robotic arm via voice commands. The source code and even more information are available on GitHub.

Want to see SOPARE in action? Here is a 32-second video that shows the potential:

Now let us start with the hardware requirements. You need a computer. Yep, seriously. As SOPARE was developed for and on a Raspberry Pi, we go with this one, even though SOPARE runs on other hardware as well. Make sure that the hardware comes with a multi-core processor, which means a Raspberry Pi 2 or 3. Please note: the Pi Zero was not tested and could be too weak, even if the „0“ comes with 2 cores. SOPARE does not run on older hardware like the Raspberry Pi B or B+ due to the lack of a multi-core processor. Of course, you also need a power supply and a micro SD card if you go with the Raspberry Pi.

Then you need a microphone, for example a USB mic. The microphone is extremely important and should fit your own requirements. For example: if you want speech recognition across a larger distance (more than 1 meter), you may find out that the cheap 5 Euro USB mic does not do the trick. But if you plan to speak directly into the microphone, the same mic could do the job just fine. I’m using different microphones for different environments and requirements.

That’s it for the hardware. Now let’s talk about software. SOPARE should run on every Raspbian version out there; the latest version is Stretch. All of my Raspberry Pis run the „lite“ version without a desktop UI, but this is up to you and you can choose whatever you prefer. There is plenty of good information available on how to download, install and configure Raspbian. I don’t cover this topic as it would get out of hand.

Now you should have a computer, a mic and the operating system installed and configured. On Raspbian you already have most of the software needed for the further installation. Only some required libraries must be installed manually with the following commands:

I recommend creating a development directory in your home directory, but this is really optional. If you follow my recommendation, execute the following commands:

cd
mkdir dev
cd dev

You are now ready to install SOPARE from GitHub:

git clone https://github.com/bishoph/sopare.git

Voilà. To be fully ready and to follow the complete instructions, we need two more directories:

cd sopare
mkdir tokens
mkdir samples

You have successfully installed SOPARE. Congratulations. Now we can fire up some tests to find out if all requirements are met and if the microphone is configured and used correctly. Start SOPARE with the -u option and then the audio test with the following commands:

python sopare.py -u
python test/test_audio.py

Let us assume that everything went well and you got no errors. In that case you see something like this:

Great! You can now edit the configuration and change the file according to the recommendations:

nano config/default.ini

As soon as you have saved the config, you are ready for a first training round. Let’s train the word „test“. This is as easy as eating cake:

./sopare.py -v -t test

Start saying the word „test“ shortly after the line

INFO:sopare.recorder:start endless recording

appears on the screen. You should see lots of lines rushing over your monitor. This is good, as SOPARE logs some debug information. If the lines start rushing before you have said anything, SOPARE started the training because something else exceeded the THRESHOLD. In that case I recommend deleting the trained file(s) and starting the training again, maybe with a higher THRESHOLD.
Here are the commands to delete the files and the dictionary and start again:

rm dict/*.raw
./sopare.py -d "*"

You can repeat the training round a few times. Normally three rounds are enough to get first results.
After the training, SOPARE must create an internal dictionary from the training data:

./sopare.py -c
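The training steps above can also be scripted. Here is a minimal sketch using Python's subprocess module. It assumes it runs from the sopare directory and that each `./sopare.py -v -t` call exits on its own after the word has been recorded. It uses only the flags shown in this post, and the helper names are hypothetical.

```python
import subprocess

def training_commands(word, rounds=3):
    """Build the command lines for `rounds` training rounds plus the compile step."""
    commands = [["./sopare.py", "-v", "-t", word] for _ in range(rounds)]
    # After the training, SOPARE must build its internal dictionary
    commands.append(["./sopare.py", "-c"])
    return commands

def run_training(word, rounds=3):
    """Run each command in order; check=True stops on the first failing command."""
    for cmd in training_commands(word, rounds):
        # Each training round waits until you say the word
        subprocess.run(cmd, check=True)
```

Calling `run_training("test")` from the sopare directory would repeat the three training rounds and then compile the dictionary.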

Finally, we have reached the end of the step-by-step tutorial. You may want to check if your trained words are recognized, right? Here we go. Start SOPARE in endless loop mode and say „test“:

./sopare.py -l

Depending on your mic, your environment, the number of training rounds and lots of other things, you should see that SOPARE recognizes the word „test“, which appears on the screen in square brackets:

['test']

Amazing. You can now fine-tune, train more or different words, or write your custom plugin. See the other available content for more information. Leave me a comment and tell me about your experience and your achievements. The video tutorial for this post:


Thank you for the great information! Unfortunately, I am not sure how to put it all together.

I was able to install Sopare, train it, etc. I even saw the comment on how to apply it to gpio, but I am trying to control multiple functions with commands. What I don’t understand is, how do you use the readable output to trigger a section of code? I don’t understand what rawbuf or the 3 arguments are.

When I look at the robot arm code, for example, I don’t see how I could apply this to GPIO / motor controllers so that the computer takes the voice command it hears through a microphone (i.e., ‚test‘ or whatever the word is) and uses it in the code. How, and in which line, does the code „take in“ the voice input and use it to trigger something? I was expecting something like #this is where the voice command comes from and #this is where the voice command goes.

Hi. If you want to control something via GPIO, then you have to write a plugin with some GPIO code. Your plugin will be called whenever SOPARE recognizes a word. Inside the plugin you check the conditions for the trained words and execute the GPIO code. There is a very simple GPIO code example available in the comments of the „Sopare basic usage. Voice controlling a magic mirror“ blog post. Hope this helps 🙂

Yes, I was happy to see that post, but the problem is not knowing how to control via GPIO; the problem is that I don’t know how to link the voice command to that code. I see that I should add the „readable_results“ piece, but nothing is being piped into that variable as far as I can tell. In other words, when I say „test“ or „forward“ or whatever, this doesn’t seem to link to any line of code that activates the GPIO as needed.

Here is my code: (first I define the GPIO functions, then I added the sopare.py code with modifications to use those functions. I commented out all the things that didn’t apply, at least as I saw it.)


PS: Everything is properly indented; pasting here seemed to strip the formatting. Also, the „readable_results“ code for the GPIO control is at the end, but again, it doesn’t seem to do anything. The voice commands all work, though; they just don’t trigger anything.

SOPARE calls your plugin when it lives in its own subdirectory of the plugin directory and that subdirectory contains a file named „__init__.py“:

1) cd sopare/plugins
2) Create a custom directory in the plugin directory (e.g.: mkdir my_gpio_custom_plugin)
3) Create a python file with the name „__init__.py“ (e.g. by copying the print plugin)
4) Write your custom code and the conditions for your trained identifiers in the run function
5) Start SOPARE
6) SOPARE now automatically calls the function „def run(readable_results, data, rawbuf):“ whenever an identifier is recognized, and the recognized values are in the array „readable_results“
7) Your custom code and your conditions are executed which means you control your robot
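For illustration, here is a minimal sketch of such a plugin file. Only the `run(readable_results, data, rawbuf)` signature comes from the steps above. The directory name, the `forward`/`stop` words and the `ACTIONS` table are hypothetical, and the placeholder functions only print instead of driving GPIO pins.

```python
# Hypothetical file: sopare/plugins/my_gpio_custom_plugin/__init__.py

def forward():
    # Placeholder action: replace with your GPIO/motor code
    print("moving forward")
    return "forward"

def stop():
    # Placeholder action: replace with your GPIO/motor code
    print("stopping")
    return "stop"

# Hypothetical mapping: keys must match the identifiers you trained
ACTIONS = {
    "forward": forward,
    "stop": stop,
}

def run(readable_results, data, rawbuf):
    """Called by SOPARE; readable_results holds the recognized identifiers."""
    executed = []
    for word in readable_results:
        action = ACTIONS.get(word)
        if action is not None:  # ignore words without a mapped action
            executed.append(action())
    return executed
```

A dictionary of word-to-function mappings keeps the conditions in one place, so adding a new trained word means adding one entry instead of another if-branch.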

THANK YOU! I was nervously going through your instructions. I have been trying to do voice control for 2 weeks (I’m new to robotics and coding) and every method I tried has been failing. I knew when I came across your tutorial it was well written and well thought out.

After your notes above, the robot did it! You probably can’t imagine how happy I am about this. I literally just threw my arms up and cheered when the robot moved! 🙂

The recommendation for your environment is a sample rate of „44100“ but you are using a sample rate of „48000“. Please change the sample rate in the config file according to the recommendation and the warning should vanish.
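To check which of the two rates your mic accepts, a small probe can ask PortAudio directly. This is a sketch. It assumes pyaudio is installed and the mic records mono 16-bit audio; the mono/16-bit setting is an assumption, and SOPARE’s own test/test_audio.py remains the reference check. `PyAudio.is_format_supported` returns True or raises ValueError. The helper names are hypothetical.

```python
def supported_rates(is_supported, candidates):
    """Return the candidate sample rates that the checker accepts."""
    return [rate for rate in candidates if is_supported(rate)]

def pyaudio_rate_checker():
    """Build a checker for the default input device (requires pyaudio)."""
    import pyaudio
    pa = pyaudio.PyAudio()
    device = pa.get_default_input_device_info()["index"]

    def check(rate):
        try:
            # Assumption: mono, 16-bit input as the format to test
            return pa.is_format_supported(rate, input_device=device,
                                          input_channels=1,
                                          input_format=pyaudio.paInt16)
        except ValueError:  # raised when the format is not supported
            return False
    return check
```

Usage on the Pi: `print(supported_rates(pyaudio_rate_checker(), [44100, 48000]))`.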

Yes, indeed. I have tested a lot of microphones, and the ones that work flawlessly are the more expensive ones. But I am also using a 5€ mic that works as well, though the distance is limited to around 0.5 m.

Now to your problem: first of all, you only get warnings and not an error. This is important. You should be able to run SOPARE and still get some results; maybe not the best results, but decent ones. What you can do is test different sample rates. Test the following sample rates one by one and see if the warnings disappear:

Simply copy the „print“ directory, which can be found in the „plugins“ directory, to create your own custom plugin. The plugins are called automatically when a sound is recognized, and you get the „readable_results“ as an array in the „run“ method. Here is the code for my robotic arm example:

The issue is that there seems to be no sound device/mic: „ALSA lib confmisc.c:767:(parse_card) cannot find card ‚0‘“
I can only guess, but the steps are something like:

* connect a mic
* if it is connected and the same error occurs, edit alsa.conf to make use of the mic
* it could be another issue related to your environment. Search the internet for „alsa cannot find card 0“ and see if you find a solution
* …
* clean your alsa.conf file to avoid wild messages
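Before editing alsa.conf, it can help to see which input devices PortAudio, and therefore pyaudio, actually finds. Here is a hedged sketch with hypothetical helper names. An empty result would be consistent with the „cannot find card ‚0‘“ message above.

```python
def input_devices(device_infos):
    """Keep only devices that can record (maxInputChannels > 0)."""
    return [(info["index"], info["name"]) for info in device_infos
            if info.get("maxInputChannels", 0) > 0]

def pyaudio_device_infos():
    """Query every device PortAudio can see (requires pyaudio)."""
    import pyaudio
    pa = pyaudio.PyAudio()
    try:
        return [pa.get_device_info_by_index(i)
                for i in range(pa.get_device_count())]
    finally:
        pa.terminate()  # release PortAudio again
```

Usage: `print(input_devices(pyaudio_device_infos()))` lists index and name of every device with at least one input channel.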

Hi. There is no SOPARE error. The output comes from the ALSA system, or rather from pyaudio. As I don’t know anything about your environment, I can only recommend cleaning your alsa.conf file to get rid of those warnings. I found a thread with the same message, and the recommendation there is also to clean your ALSA config:

As SOPARE is written in Python and uses pyaudio as a bridge to the hardware, I can only guess that you compiled something manually or installed some incompatible (lib) versions. The error can be nearly anywhere in the system, which means this goes beyond anything we could solve here with normal efforts 😉

My recommendation is to set up a fresh and lean Raspbian system without X and install only the most necessary dependencies and try again.

Sorry, but a segfault is something where I can’t help as it is too deep in the system ;(

Anyway, the threshold seems to be very high, so SOPARE likely never gets anything useful to analyse. The threshold defines the peak volume above which the mic input is passed to the analysis. In all of my working environments this value is below 1000, just to give you an indication.
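To make the idea of a peak threshold concrete, here is a minimal Python 3 sketch. It assumes 16-bit signed samples, as PyAudio delivers them with `paInt16`, and a simple „peak above threshold“ rule. It illustrates the concept only and is not SOPARE’s actual recorder code; the function names are hypothetical.

```python
import array

def peak(chunk_bytes):
    """Peak absolute amplitude of a chunk of signed 16-bit samples."""
    samples = array.array("h")
    samples.frombytes(chunk_bytes)
    return max((abs(s) for s in samples), default=0)

def passes_threshold(chunk_bytes, threshold):
    """True if the chunk is loud enough to be handed to the analysis."""
    return peak(chunk_bytes) > threshold
```

With a threshold far above the peaks your mic produces while you speak, `passes_threshold` never returns True, which matches the „never gets anything useful to analyse“ situation.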

ALSA lib pcm_dmix.c:1022:(snd_pcm_dmix_open) unable to open slave
Cannot connect to server socket err = No such file or directory
Cannot connect to server request channel
jack server is not running or cannot be started
testing different SAMPLE_RATEs … this may take a while!

You can start SOPARE as a service. There are plenty of examples of how to add a Python script as a service. I can add my own start/stop script to GitHub and cover this topic in combination with the plugins in the next video. Stay tuned: I should have time in the next few weeks to do the next video tutorial.

The issue „No Default Input Device Available“ means that the device is busy, not connected or unavailable. It could be that an instance of SOPARE is (still) running in the background. You can find out by executing the command:

Installed perfectly on a Raspi 3B running Raspbian Stretch. However, the „WARNING:sopare.recorder:stream read error [Errno -9988] Stream closed“ message is excessive. I simply do not use the verbose parameter. The trick to using this is in the learning. But I give it 5 stars.

I want to mention that I get the error „-9988“ with one microphone (it doesn’t appear with other microphones), so this issue is definitely hardware dependent. Even though it is annoying, the message is necessary because the issue slightly affects the precision. But as you correctly said, it can be suppressed.

Hi, it seems you messed up the plugin directory. The error highlights that a (sub)directory does not contain the necessary __init__.py file, as the module „__init__“ (you should avoid this as a directory name!) is not available. Learn more: https://docs.python.org/3/tutorial/modules.html#packages

And you should clean up your alsa.conf as most of the output comes from ALSA!
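A quick way to spot this kind of problem is to list the plugin subdirectories that lack an `__init__.py`. This is a sketch with a hypothetical helper; pass it the path to your `sopare/plugins` directory. `__pycache__` is skipped. A directory literally named `__init__` is also reported, as long as it has no `__init__.py` of its own.

```python
import os

def plugins_missing_init(plugin_dir):
    """Return names of plugin subdirectories that lack an __init__.py file."""
    missing = []
    for name in sorted(os.listdir(plugin_dir)):
        path = os.path.join(plugin_dir, name)
        if name == "__pycache__" or not os.path.isdir(path):
            continue  # bytecode caches and plain files are not plugins
        if not os.path.isfile(os.path.join(path, "__init__.py")):
            missing.append(name)
    return missing
```

Usage from the sopare directory: `print(plugins_missing_init("plugins"))`. An empty list means every plugin directory is a proper package.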


# Use given number as results to assemble result
# 0 for all predictions
MAX_WORD_START_RESULTS = 2
MAX_TOP_RESULTS = 5

# Enable or disable strict length check for words
STRICT_LENGTH_CHECK = true
# Value to soften the strict length check a bit to still
# get quite precise results but to be less strict
STRICT_LENGTH_UNDERMINING = 2
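These are plain INI-style options. The following sketch uses Python 3’s configparser to show how such values parse; note that `getboolean` also accepts a lowercase `true`. The `[compare]` section header is an assumption, since the excerpt does not show which section these keys belong to. This is not SOPARE’s own config loader.

```python
import configparser

# Assumption: the keys live in a section named [compare]; check your default.ini
CONFIG_TEXT = """
[compare]
MAX_WORD_START_RESULTS = 2
MAX_TOP_RESULTS = 5
STRICT_LENGTH_CHECK = true
STRICT_LENGTH_UNDERMINING = 2
"""

def read_compare_settings(text):
    """Parse the comparison options into typed Python values."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["compare"]
    return {
        "max_word_start_results": section.getint("MAX_WORD_START_RESULTS"),
        "max_top_results": section.getint("MAX_TOP_RESULTS"),
        # getboolean accepts true/false, yes/no, on/off, 1/0 in any case
        "strict_length_check": section.getboolean("STRICT_LENGTH_CHECK"),
        "strict_length_undermining": section.getint("STRICT_LENGTH_UNDERMINING"),
    }
```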

Hi. Speech independent in terms of sound recognition: yes. There is no real limit but too many commands will lead to lag/delay as the current word identification mechanism needs an optimization/overhaul.

Can we use Sopare to make short sound recognition?
I am looking for a system that allows me to differentiate the sounds produced by a drumstick striking different surfaces. About a dozen different sounds.
The typical duration of these sounds is about a tenth of a second.
Any advice will be welcome.

Hi. I’m not sure and have never tried it, but differentiating extremely short sounds is difficult. Try the visualization feature of SOPARE and check if the sound waves and the frequencies differ. This also gives you an indication of length and frequency ranges. Furthermore, I can imagine that you need a good microphone and a nearly noise-free environment, and you need to make sure that the training and the sounds are made under the same conditions. Finally: with the right config it could work, but you have to test it out. I can’t provide any guarantee 😉

Hello, I am nearly finished with my project. I only need to detect my hand clap and don’t care about other sounds, but I noticed that SOPARE recognizes some sound patterns as „a hand clap“ that are surely not a clap, for example the sound of a creaking door, keys hitting my desk, etc. I also tried to look at the minimum left and right distance using the verbose parameter in SOPARE, but they are practically the same (the difference is only 0.1 or less), and I found no way to differentiate these sound patterns from the actual hand clap. I have also tried modifying the SIMILARITY value combination, but still no luck.

Is there any other way to overcome this? BTW, I can only use CHUNK=1000 and CHUNKS=4000, otherwise after some time the mic stops working (an input overflow error appears).

A hand clap is a short sound without lots of special frequencies involved, so my guess is that the hand clap is indeed very similar to other short sounds that only move air. My advice is to review the differences by using the plot function of SOPARE and try to adjust as far as possible. My personal guess is that a CHUNK size of 1024 should play well with a high sample rate. If you are using a lower sample rate, then a higher CHUNK size should be better due to the information it includes for the analysis.
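To reason about the CHUNK/sample-rate trade-off, it helps to compute what one chunk covers. Its duration is CHUNK / SAMPLE_RATE, and an FFT over one chunk has bins SAMPLE_RATE / CHUNK Hz apart. These are general DSP relations, not SOPARE-specific code; the function names are hypothetical.

```python
def chunk_duration_ms(chunk, sample_rate):
    """Time span covered by one chunk of samples, in milliseconds."""
    return 1000.0 * chunk / sample_rate

def fft_bin_width_hz(chunk, sample_rate):
    """Frequency spacing between FFT bins for a chunk of this size."""
    return float(sample_rate) / chunk
```

For example, a CHUNK of 1024 at 44100 Hz spans about 23 ms with bins about 43 Hz apart. At 16000 Hz the same CHUNK spans 64 ms with 15.625 Hz bins, so the two settings capture a short clap quite differently.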

Thank you for this wonderful project! I am having issues with recognizing two words. I can train each word individually, but when attempting to use your example to only execute a plugin when two words are spoken I am not getting the correct response. How can I adjust the configuration to accept more than one word at a time?

First of all, congratulations on your work with Sopare. I have been doing some research, and I have found your work the most convenient around.

I’m planning to import Pygame to play sounds, and I’m also using the ReSpeaker Pi HAT (http://wiki.seeedstudio.com/ReSpeaker_2_Mics_Pi_HAT) for sound and microphone management. I have set up the ReSpeaker, and ALSA recognizes both its microphone and its sound output on my Raspberry Pi 3. Do you think my ecosystem is compatible with Sopare?

I have no experience with the ReSpeaker Pi Hat but as it depends on ALSA it should work. Using Pygame with SOPARE could be tricky depending on the modules and functions you want to use. But you can get it to work for sure 😉

I talked about the overall process roughly here and here. As you can configure certain types, the answer depends a lot on the individual configuration. But from a high-level perspective, the features derive from the frequency spectrum and from the domain data itself. You can also mix in features like zero crossings. The classification (I would not call it that) is more or less a distance calculation over all used and configured features.
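Here is a generic illustration of „a distance calculation over features“: a zero-crossing count as one example feature, plus a nearest-neighbour lookup by Euclidean distance. This is a simplified sketch, not SOPARE’s actual feature set, weighting or comparison. The names and the toy dictionary format (word mapped to a feature vector) are hypothetical.

```python
import math

def zero_crossings(samples):
    """Count sign changes between consecutive samples."""
    return sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))

def distance(a, b):
    """Euclidean distance between two equally long feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def best_match(features, dictionary):
    """Return (word, distance) for the closest trained feature vector."""
    return min(((word, distance(features, ref)) for word, ref in dictionary.items()),
               key=lambda pair: pair[1])
```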

Got it working, but after ending the endless loop with Ctrl-Z I get the „+ Stopped“ message and cannot start Sopare again. Instead I get the message „IOError: No default input device available“. Probably the mic is not released or something.

Any idea what I’m doing wrong here? Do I need to stop Sopare some other way?

Also, if you want Sopare to run continuously, how do you make sure it starts at boot?

One other question: does Sopare do a lot of writes to the SD card? The reason I ask is that I have Domoticz running on my Pi, and the Pi in turn uses an SD card. A known issue with SD cards is that lots of writes have a negative effect on the lifespan of the card.

SOPARE writes when you train and compile the dict or have the -w option enabled, and of course if you redirect the logging to a file. Other than that, SOPARE does not write at all. I have had several instances (not only SOPARE) running 24/7 for years while using tmpfs for some heavy-write directories (like /var/log or /tmp and what have you) and never had any SD card corruption.

My tip: just don’t buy the cheapest SD cards on the market, and make sure to have enough space on the card for the OS to mark bad blocks and potentially for wear leveling.

Hi,
After failing with a few voice engines, Sopare was the one I got working successfully. Thank you for user-friendly voice recognition software. I was able to successfully control GPIOs with it. Now I want to know if it can be combined with Google Assistant on the Raspberry Pi to control GPIO pins.

In terms of Google Assistant and SOPARE: as both SOPARE and the Google Assistant try to read from the mic, you can try to generate a WAV file and pass it to the Google Assistant. I’m not sure if this is possible, but I read about this requirement/feature request somewhere. Other than that, there are possibilities, but not without rewriting large chunks of code. Hope that helps.
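If you want to try the WAV route, Python’s standard wave module can store captured audio. This is a sketch assuming raw 16-bit mono PCM, as recorded with pyaudio’s `paInt16`. Whether and how the Google Assistant SDK accepts such a file is not covered here, and the helper name is hypothetical.

```python
import wave

def write_wav(path, pcm_bytes, sample_rate, channels=1, sample_width=2):
    """Store raw PCM (by default 16-bit mono) as a WAV file."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)  # bytes per sample: 2 for 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(pcm_bytes)
```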

Hi,
I want to input sound to SoPaRe via Bluetooth instead of a USB microphone. How do you suggest I go about that?
So far I have installed and trained SoPaRe for a few words and am achieving decent accuracy.
I will need to do the Bluetooth part for my next prototype. Could you please give me a few guidelines on that?

If you mean a Bluetooth mic, then this could work if you get it working with the ALSA stack (PulseAudio or the Bluetooth Audio ALSA backend). If you mean your own Bluetooth stack with all the device handling, pairing and communication, then I can’t give you suggestions or advice, as this is beyond the scope and way too complicated.

Without further information help is a bit tricky. I can only guess that your plugin folder structure does not include the necessary __init__.py file or that an import does not point to the correct file…

I am trying to get Sopare running, but I am stuck at finding my threshold.
If I run test_audio.py, it always says THRESHOLD = 0. If I set this in the config file, it instantly starts listening because it’s 0. If I set it to 1, it doesn’t hear anything.

I have a USB webcam attached to the Raspberry Pi; arecord -l returns this:

We plan to create a voice-operated wheelchair. If I use Sopare-based voice recognition, will I be able to react only to the voices of trained users, or will it react to every user who speaks the command (speaker-based or just command-based)?

As the Zero is a multi-core device, it could work already. I can’t say for sure how well, as this depends on dict size, threshold and some other things. Give it a try. If the load is too high and the recognition is delayed, then I’m sorry, but I can’t do much right now. In the future we might see performance optimizations and better Pi Zero support…;)

Hello! I recently used sopare to control an Arduino-powered robot. However, sometimes the program stops and does not start again. I believe this is due to one of the cores having too much to do and stopping. Is it possible to spread the work across two cores? Is there anything else you recommend?

Also, I doubt it was my plugin, as I have the same issue without it, and I know it is one core out of the 4 because it always shows around 25% load when it stops.

Without further information I can only guess. Let us try to shed some light on this. The following factors can be responsible for a high load:

– amount of trained words
– sample rate
– used threshold
– similarity and other comparison-related values

If you encounter high load you can try to remove some trained words from the dict, lower the sample rate or use a higher threshold.
Other than that, increase the precision and filter out false positives before such words are compared expensively.

Hi, I have since tried to reduce the dict to 16 entries for 4 words: forwards, stop, left, right. This does not help. I have also lowered the sample rate to 5 and still have the issue. Finally, I adjusted the threshold so that no words other than the trained ones are heard, and the issue is the same.

More information about the crash, where nothing happens and I have to restart sopare: normally the CPU usage is 3% when resting and 18% at maximum, but when it crashes, it reaches 25 to 26% and stays there until I restart.

I re-downloaded sopare and I have the issue regardless of settings and plugins.

Hi. My robotic arm is controlled by 13 words without any issue. Also a sample rate of 5 makes no sense at all. You need to figure out a sample rate that fits best to your hardware environment. The threshold adjustment seems to be fine.

However, if it really crashes, then you get an error message we can work on. File a bug report, tell us how to reproduce it, and attach everything that matters, like OS environment, used versions, settings, config, output, error messages, stack traces and what have you, to make sure we can fix the issue.

Dear Bishop,
first of all, thank you for Sopare. Great work and a very nice technical explanation of your system.

I would like to use Sopare in my simple project (activating some outputs via voice) on an RPi 3. I modified the sample rate and the sound threshold as suggested by the test program, and I prepared the dictionary (6 words, each repeated 3 times). When ./sopare.py -l starts, I am in the same situation described by BlackeG. The software randomly stops recognizing the voice; it seems to freeze after some words.

Top indicates that there are plenty of resources left. A 4-core CPU can work through a load of 4 and still run smoothly. I would check if the THRESHOLD is too low. There is a good chance that your environment has some background noise, even in frequencies you don’t hear. This means SOPARE constantly checks against a noisy environment. In fact, the only time the analysis starts is when the timeouts hit, and this can be seen as a kind of hiccup.

Thanks for the reply.
OK, I modified the threshold value to 2000. The system has become more ‚dull‘, but the behavior does not change.
I added LOGLEVEL = WARNING to the config file, and this is the result when the freezing happens:
WARNING:sopare.recorder:stream read error [Errno -9981] Input overflowed
WARNING:sopare.recorder:stream read error [Errno -9988] Stream closed
WARNING:sopare.recorder:stream read error [Errno -9988] Stream closed
WARNING:sopare.recorder:stream read error [Errno -9988] Stream closed
…
and so on until I press Ctrl+C

If the stream is closed, SOPARE doesn’t get further audio data from pyAudio and stops/hangs/waits until more data is available.

There is plenty of advice available for the -9981 error, mostly to use a different sample rate or a different chunk size. Just do some searching. As this issue is pyAudio related (and depends on your hardware/environment/config) I can’t do much ;(

I’ve never tried it and really don’t know, as the development platform and the target platform are the Raspberry Pi in combination with a Linux OS like Raspbian. If you are able to solve the dependencies, it could work…