Let Alexa Control Your Life; Guide to Voice-Enable Everything

Let’s face it, automation doesn’t feel quite as futuristic unless you can just say what you want out loud and have the machines flawlessly obey. That is totally possible now — and on the cheap. Well, cheap as far as money goes. It can be an expensive learning curve to get it all working. This will help. [Lindo St. Angel] has put together a guide to navigate voice control of hardware using Amazon’s Alexa SDK.

We previously reported that Amazon’s AI had escaped its hardware prison in the form of the Alexa Skills Kit. Yes, calling it the Alexa SDK above is wrong; it’s actually the ASK, but nobody knows what that acronym is, while most recognize the gist of an SDK. It gives you the hooks and the documentation necessary to leverage the functionality in your own applications. The core functionality of Alexa is voice recognition. Even so, it’s still a tall hill to climb.

[Lindo] has broken down the problem into a very manageable example. The Alexa Voice Service (part of ASK) is used for voice recognition and control. Amazon’s Lambda service connects the ASK to your piece of hardware; in this case he’s using a Raspberry Pi as the server. The final step is to connect your hardware to the Pi. [Lindo] is interfacing a keypad-based home automation system with the Pi, but the sky’s the limit at this point.
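To make the Lambda hop concrete, here is a minimal Python sketch of a skill handler. It is not [Lindo]’s code: the intent names (ArmIntent, DisarmIntent), the command strings, and the send_to_pi helper are all hypothetical, and the actual transport to the Pi is left as a stub. Only the overall request and response JSON shape follows the Alexa Skills Kit format.

```python
# Hypothetical mapping from skill intents to commands the Pi understands.
INTENT_COMMANDS = {"ArmIntent": "arm", "DisarmIntent": "disarm"}


def build_speech_response(text, end_session=True):
    """Wrap plain text in the JSON envelope an Alexa skill returns."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end_session,
        },
    }


def send_to_pi(command):
    """Stub for the Lambda-to-Pi hop (assumption: replace with a real,
    authenticated call to whatever server runs on your Pi)."""
    return {"ok": True, "command": command}


def lambda_handler(event, context, send=send_to_pi):
    """Entry point Lambda invokes with the skill request as `event`."""
    request = event.get("request", {})

    # Anything that is not an intent (e.g. a bare launch) gets a prompt.
    # A real skill would treat SessionEndedRequest separately.
    if request.get("type") != "IntentRequest":
        return build_speech_response("What would you like to do?", end_session=False)

    intent_name = request.get("intent", {}).get("name")
    command = INTENT_COMMANDS.get(intent_name)
    if command is None:
        return build_speech_response("Sorry, I don't know that command.")

    result = send(command)
    if result.get("ok"):
        return build_speech_response(f"Okay, sent {command} to the panel.")
    return build_speech_response("The panel did not respond.")
```

Passing the transport in as the send argument keeps the handler testable on a desktop without a Pi attached; Lambda itself only ever supplies event and context.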

With all the authentication and connectivity laid bare, this is a lot more approachable. The question is no longer can you connect everything to voice control. The question becomes should you give control of everything over to one single online service?


Has anyone come across anything standalone (that doesn’t require an Internet connection) that is reasonably reliable at translating speech into toggling pins, yet can still cope with a fairly large number of different commands and parameters?

I think the reason for cloud-based voice recognition is the sheer amount of data that your voice is cross-matched with (i.e. the whole English database, plus other languages and accents). If you have a fairly strong computer, you can probably implement your own voice recognition, or find a way to download the database.

Unfortunately the datasets are closely guarded. These days machine learning is easy, so most of the competitive advantage comes from having large, high-quality datasets. Strangely enough, nobody wants to share. Perhaps someone can find a large collection of transcribed text, or a large collection of people reading books?

CMU pocketsphinx can run on a raspi; in fact it’s the offline speech recognition backend for Android. I wrote a python script ages ago to control music playback and it was pretty good (on my desktop, I didn’t have a pi then). The trick is that it’s context based, so it needs to know what words in its vocabulary go together in order to reduce errors. I had to make a bash script to give it every possible combination of “[wake word] play [song] by [artist]”
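For anyone wanting to try the same trick, here is a rough Python equivalent of that phrase-generation step. It is a sketch rather than the commenter’s script: the function name, the wake word, and the tiny music library are made up for illustration, and it only pairs each song with its own artist rather than every song with every artist. The resulting lines would be written to a corpus file for whatever language-model tooling you use with the recognizer.

```python
from itertools import product


def build_corpus(wake_words, library):
    """Return every "[wake word] play [song] by [artist]" phrase.

    `library` maps an artist name to a list of that artist's songs
    (a hypothetical structure chosen for this example).
    """
    # Flatten the library into (song, artist) pairs that really exist.
    tracks = [(song, artist) for artist, songs in library.items() for song in songs]

    # Cross every wake word with every track and normalise to lower case,
    # so the recognizer's vocabulary stays consistent.
    return [
        f"{wake} play {song} by {artist}".lower()
        for wake, (song, artist) in product(wake_words, tracks)
    ]


if __name__ == "__main__":
    example_library = {"Queen": ["Bohemian Rhapsody", "Under Pressure"]}
    for phrase in build_corpus(["computer"], example_library):
        print(phrase)
```

The corpus grows as wake words times tracks, so a big music collection produces a big file, but the recognizer only ever has to choose between phrases it has actually seen.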

it’s encouraging to see the progress of this project go from:
“use only amazon servers and only on amazon devices to only buy amazon stuff” to
“use only amazon servers and only on amazon devices to do anything” to
“use only amazon servers on any devices to do anything”

almost like a believable plan an AI would cook up to convince humanity to become interested in it and install it on all devices…

The cheapest voice controller option is a cheap Android phone, one that can be rooted and has WiFi and Bluetooth so it can talk to all your IoT modules. All that for $50; nothing else comes close in terms of value for money. Look around and you may even get one that fits on your wrist. How can even the smartest hack beat that? It is a classic example of “economies of scale.”