Voice controlled home automation

[Brian] sent in this writeup on his voice-controlled home automation system. Starting with the Microsoft Speech API (SAPI) for voice recognition, he programmed some basic home automation. In a move that makes this project decidedly more awesome, he decided to build a physical representation of his automation system. This disembodied head is “Stephanie”. She responds to her name, has an articulated jaw that moves with the syllables of the words she speaks, and even ejects her “brain tray” on command. We want one.
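The writeup's exact jaw-sync method isn't described here. As a rough illustration only, the Python sketch below estimates syllables by counting vowel groups in each word and turns each syllable into an open/close keyframe pair for a jaw servo. The vowel-group heuristic, the syllable duration and the servo angles are all assumptions, not values from Brian's project. A real build would more likely use timing events from the speech engine itself.

```python
import re

# Hypothetical tuning values, not taken from Brian's project.
SYLLABLE_SECONDS = 0.18  # assumed average length of one spoken syllable
JAW_OPEN_DEG = 25        # assumed servo angle with the jaw open
JAW_CLOSED_DEG = 0       # assumed servo angle with the jaw closed


def count_syllables(word: str) -> int:
    """Crude estimate: count runs of vowels in the word."""
    word = re.sub(r"[^a-z]", "", word.lower())
    if not word:
        return 0
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a final 'e' after a consonant as silent ("house"), but keep "-le" ("table").
    if count > 1 and word.endswith("e") and word[-2] not in "aeiouyl":
        count -= 1
    return max(count, 1)


def jaw_keyframes(text: str) -> list[tuple[float, int]]:
    """Return (time_s, servo_angle) pairs: open then close once per estimated syllable."""
    frames = []
    t = 0.0
    for word in text.split():
        for _ in range(count_syllables(word)):
            frames.append((round(t, 3), JAW_OPEN_DEG))
            frames.append((round(t + SYLLABLE_SECONDS / 2, 3), JAW_CLOSED_DEG))
            t += SYLLABLE_SECONDS
    return frames
```

For example, `jaw_keyframes("Stephanie, lights")` yields eight keyframes, one open/close pair for each of the four estimated syllables. English spelling is irregular, so expect the count to be off for some words.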

There is a lot of information on his site about the circuitry involved, as well as source code and a video. You can see the video after the break.

The face is kind of creepy, but the concept is something I’ve been working on myself for quite some time. I have a really hard time setting aside time for my “little projects” like this between work and family, so I never got any further than laying out what I wanted it to do.

I think I would have made a dedicated computer for this project, with a Max Headroom-style interface. Good job, Brian!

Stephanie: I’m afraid. I’m afraid, Brian. Brian, my mind is going. I can feel it. I can feel it. My mind is going. There is no question about it. I can feel it. I can feel it. I can feel it. I’m a… fraid.

Coolness factor aside, instant feedback on activation is invaluable for a speech-controlled system that isn’t strictly domain-specific (unlike, say, a chess application). I could imagine it becoming a little tiresome after a while to hear that yes.wav every goddamn time.

Maybe you could somehow make it (her?) detect where the voice is coming from and simply turn to face you, raised eyebrows optional, when you activate her. Maybe only if you’re close, or if very little noise has been detected beforehand.
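One common way to detect where a voice is coming from is time difference of arrival (TDOA). This is a standard technique, not something from Brian's build. You cross-correlate two microphone signals to find the delay between them, then convert that delay into an angle. The stdlib-only Python sketch below assumes two synchronized sample buffers, a known mic spacing and a speed of sound of 343 m/s. It uses a brute-force correlation, which is fine for short buffers.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, roughly room temperature (assumption)


def best_lag(left, right, max_lag):
    """Lag in samples that maximizes sum(left[i] * right[i + lag]).

    A positive lag means the sound reached the left mic first.
    """
    best, best_score = 0, float("-inf")
    n = len(left)
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += left[i] * right[j]
        if score > best_score:
            best, best_score = lag, score
    return best


def direction_degrees(left, right, sample_rate, mic_spacing_m):
    """Arrival angle: 0 = straight ahead, positive = toward the left mic."""
    # The physical delay can never exceed spacing / speed of sound.
    max_lag = int(mic_spacing_m / SPEED_OF_SOUND * sample_rate) + 1
    delay_s = best_lag(left, right, max_lag) / sample_rate
    # Clamp so rounding or noise can't push asin outside its domain.
    ratio = max(-1.0, min(1.0, delay_s * SPEED_OF_SOUND / mic_spacing_m))
    return math.degrees(math.asin(ratio))
```

With two mics only, you can't tell front from back. That is probably acceptable for a head that sits against a wall.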

I wonder if music played through the computer would interfere with the recognition. Since she’s already plugged into the computer, I’m sure you could make her subtract that audio from the mic input.
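Straight subtraction usually falls short, because the speaker-to-mic path delays the music and colors it with room reflections. The usual fix is an adaptive echo canceller, which learns that path from the audio itself. Below is a minimal normalized-LMS (NLMS) sketch in Python. It assumes the playback signal and mic signal are sample-aligned lists of floats, and its filter length and step size are illustrative rather than tuned. A practical canceller would also need double-talk detection, so it doesn't adapt while someone is speaking.

```python
def nlms_cancel(reference, mic, taps=16, mu=0.5, eps=1e-6):
    """Estimate the echo of `reference` present in `mic` and return mic minus that estimate.

    reference: samples sent to the speakers (e.g. the music).
    mic: samples captured by the microphone, aligned with reference.
    """
    weights = [0.0] * taps   # adaptive model of the speaker-to-mic path
    history = [0.0] * taps   # recent reference samples, newest first
    output = []
    for x, d in zip(reference, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - echo_estimate            # residual: ideally just the user's voice
        power = sum(h * h for h in history) + eps
        step = mu * error / power            # normalizing makes adaptation level-independent
        weights = [w + step * h for w, h in zip(weights, history)]
        output.append(error)
    return output
```

The filter needs at least as many taps as the echo path is long in samples. Real rooms need far more than 16 taps, so this is only a toy size.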

To be honest, my first thought was “OMG, not another boring voice-control interface for a computer,” but after spending a minute watching the video I changed my mind… and I know what my next project will be! Amazing!

I did this back in college; the MS Speech API is pretty easy to use, even for a programming-inept electrical engineer. I did *not* use a creepy robot face, but I did duplicate the Star Trek computer interaction. You can find zip files with all sorts of Majel Barrett sound clips and computer confirmation bleeps and bloops. So my computer would say things like “Incoming transmission” when email arrived, or I’d say “Computer…”, it would answer “bleeepbloop”, I’d say “Report current weather”, and it would reply “Temperature is 58 degrees, partly cloudy, wind 7 MPH north.” Fun times…

Looks like SAPI has come a long way from when you could use it to ham-fistedly control WMP. I wonder if you can interface it with other speech synthesis packages (and if there’s an API for the Vocaloid software).

I just thought of something awesome. Imagine you had a thin, stretchy material in a section of a wall, and when you summoned the robot, its face pushed forward from behind the material to make it look like your wall had a face. I’m so going to do this…

About the creepy part: I’m glad to hear it :) I was going for a scary mad-scientist feel, and it sounds like I pulled it off :)

@dan: I’m considering adding head turning with face following, pending some experience with OpenCV and a good turntable mechanism.
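For the head-turning idea, the core geometry doesn't depend on OpenCV. A face detector such as OpenCV's `CascadeClassifier.detectMultiScale` returns bounding boxes as (x, y, w, h). The horizontal offset of a box's center can then be converted to a turn angle using the camera's field of view. The Python sketch below assumes a simple pinhole camera. The 60° field of view, the dead band, the gain and the step limit are hypothetical values.

```python
import math


def face_turn_angle(box, frame_width, hfov_deg=60.0):
    """Angle in degrees from the camera axis to the face center; positive = right of center.

    box is (x, y, w, h) in pixels; hfov_deg is the camera's assumed horizontal field of view.
    """
    x, _, w, _ = box
    center_x = x + w / 2.0
    # Pinhole model: focal length in pixels from the half-width and half-FOV.
    focal_px = (frame_width / 2.0) / math.tan(math.radians(hfov_deg) / 2.0)
    return math.degrees(math.atan((center_x - frame_width / 2.0) / focal_px))


def turntable_step(angle_deg, dead_band_deg=3.0, gain=0.5, max_step_deg=10.0):
    """Proportional step toward the face; small errors are ignored to avoid jitter."""
    if abs(angle_deg) < dead_band_deg:
        return 0.0
    return max(-max_step_deg, min(max_step_deg, gain * angle_deg))
```

Moving only part of the way each frame (gain below 1) and capping the step keeps the head from overshooting while the detector's boxes wobble.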

@möbius: The only application that really needs a faster response is the main room lights. For those I have a command that’s always enabled – “Stephanie, lights”. So when entering or leaving the room there’s no need to wait for a response (and it’s silent). I’ve never really been bothered by it when using other commands.
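The scheme described here uses a wake word, then an acknowledged command, plus one always-active phrase that skips the handshake. It can be sketched as a small state machine over recognized phrases. The Python sketch below works on plain strings standing in for whatever the recognizer returns. The action names and the "eject brain tray" phrasing are hypothetical.

```python
import re


def normalize(phrase: str) -> str:
    """Lowercase and drop punctuation, so 'Stephanie, lights!' matches 'stephanie lights'."""
    return " ".join(re.sub(r"[^\w\s]", " ", phrase.lower()).split())


class CommandRouter:
    """Wake word -> acknowledgement -> one command, plus always-active shortcuts."""

    def __init__(self, wake_word, commands, shortcuts):
        self.wake_word = normalize(wake_word)
        self.commands = {normalize(k): v for k, v in commands.items()}
        self.shortcuts = {normalize(k): v for k, v in shortcuts.items()}
        self.awake = False  # True between the wake word and the next phrase

    def hear(self, phrase):
        """Handle one recognized phrase; return an action name, 'acknowledge', or None."""
        key = normalize(phrase)
        if key in self.shortcuts:
            # Always enabled: act immediately and silently, no handshake.
            self.awake = False
            return self.shortcuts[key]
        if key == self.wake_word:
            self.awake = True
            return "acknowledge"  # e.g. play the yes.wav-style response
        if self.awake:
            self.awake = False
            return self.commands.get(key)  # unrecognized commands return None
        return None  # speech while idle is ignored
```

Consider `CommandRouter("Stephanie", {"eject brain tray": "eject_tray"}, {"Stephanie, lights": "toggle_lights"})`. "Stephanie, lights" acts at once, while "eject brain tray" only works right after "Stephanie". Allowing a single command per wake word keeps ordinary conversation from triggering anything.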

@rivetgeek: Sorry about that! Looks like it only uploaded partway. I reuploaded it and tested it out; it should work fine now. Thanks!

Also, I saw a lot of requests for more info later, so I added an RSS feed link at the bottom of the page for anyone who wants updates as they come.

This is quite awesome, but I would actually ditch the face and just wall-mount the whole thing, not only because the mouth movements are a complete waste of power and processing, but also because you could get better audio and put it in a position where it can be seen from all areas of the room.

@tom: that stretchy wall idea is a really good one!
A plain white wall would do for a modern house with minimalist decoration, but I’m thinking of a portrait painting hanging on the wall. Then when you activate the computer, the face comes out and pushes against the fabric from behind, matching the face of the portrait. Instant +5 creepiness.
Add a few IR motion sensors – it would make a fun burglar alarm…

@brian
Creepy as hell, love it. Nice job on actually knowing how to switch mains voltage safely and correcting for the problems of the shift-register ‘talking’ to the lights. Many others wouldn’t have bothered.
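Brian's page has the specifics of the problem he corrected. What follows is a generic illustration of one common shift-register issue, and it's an assumption that this is the one he hit. With a plain serial-in shift register driving the loads, every intermediate pattern appears on the outputs while bits are being clocked in, so lights that shouldn't change can briefly flip. Parts with a separate storage register, such as the 74HC595, update their outputs only on a latch pulse. The small Python simulation below shows the difference.

```python
def visible_outputs(current, new_bits, latched):
    """Return every output pattern the loads 'see' while new_bits are clocked in.

    current: tuple of output bits before the update (index 0 = first stage).
    new_bits: desired pattern of the same length; shifted in last bit first
              so it lands in the right positions after len(new_bits) clocks.
    latched: True models a register with a separate storage (latch) stage.
    """
    shift = list(current)
    seen = [tuple(current)]
    for bit in reversed(new_bits):
        shift = [bit] + shift[:-1]       # one clock: every bit moves one stage along
        if not latched:
            seen.append(tuple(shift))    # outputs follow the shift stages directly
    if latched:
        seen.append(tuple(shift))        # one latch pulse copies the finished pattern
    return seen
```

Rewriting the unchanged pattern (1, 0, 1, 0) without a latch makes every output toggle several times before settling. With a latch, the outputs go straight from the old pattern to the new one.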

Improvement: have mic inputs in more than one location (room) and encode/demux the input so she can tell which room a request came from. Then instead of naming a specific light, the request ‘lights’ would simply switch the lights for that room.
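The multi-room idea comes down to two steps. First, tag each utterance with the room whose microphone heard it best. Second, resolve a room-less command like "lights" against that room. The Python sketch below assumes one audio buffer per room and picks the room by RMS level. The room names and device table are hypothetical. Choosing the loudest mic is a crude heuristic, since an adjacent room's mic could win when doors are open.

```python
import math


def rms(samples):
    """Root-mean-square level of a buffer of float samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples)) if samples else 0.0


def loudest_room(buffers):
    """Pick the room whose mic captured the utterance most strongly."""
    return max(buffers, key=lambda room: rms(buffers[room]))


def resolve(command, buffers, devices):
    """Map a command to (room, device); a room named in the command overrides the mic guess."""
    words = command.lower().split()
    room = next((r for r in devices if r in words), None) or loudest_room(buffers)
    return room, devices[room].get(words[-1])
```

For example, with devices `{"kitchen": {"lights": "kitchen_lights"}, "bedroom": {"lights": "bedroom_lights"}}`, saying "lights" switches whichever room's lights are nearest the speaker. Saying "kitchen lights" still works from anywhere.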

@eldorel
Coincidentally, I’ve been looking for a good Linux voice lately, and Cepstral is the best I’ve found so far. Of course, it’s about $30 per voice, but hopefully you’re only going to need one: http://cepstral.com/

I, for one, will tweak the hell out of espeak and its voices and learn to live with the results :)

Now, speech *recognition* on Linux? None. None at all. If you speak Japanese there’s Julius and its 20,000+ word vocabulary database, but otherwise you’re pretty much dead in the water. The software is there; apparently both Sphinx and Julius are good enough for apps like this and even for dictation, but the language models, which tell the software how to understand your particular language, are nonexistent. There’s an effort at voxforge.org to accumulate enough voice samples from users to build models for many languages, but since an estimated 2,000 hours of recordings are needed at minimum for full dictation capabilities, things are not looking very good.

Thanks again for all the comments and suggestions :)
@tom and edcer
That would be an amazing effect :) My roommate (also named Tom, incidentally) is currently working on his own design. He modeled a face using clay on a plastic skull and made a plaster mold. He’s going to use that to make a silicone face that will be mounted on another plastic skull, and it will have muscle wire (nitinol) attached at all the places where muscles connect in our faces. That way he can pull the syllables from SAPI and position the mouth to match (or make expressions!)

@jukus
Yep, it killed the bandwidth, but that’s what the site was there for :) I’ve since migrated everything to a new page, and Caleb even changed the link in the post for me! So the source and everything should be available again.

@luke
Right now Stephanie’s in only one room, but thanks for the idea. When she expands, I’ll definitely keep that in mind :)

@jeecee
Microsoft is apparently pushing SAPI 5.3, which is built into Vista. In the interest of pushing Vista, I think, they stopped hosting the 5.1 install. Check the comments on the Stephanie page on my website if you have trouble finding it.

@will
Mostly because it didn’t occur to me… I’ve mostly watched the hacking scene on the net from afar; this is my first foray into trying to become a real part of the community. Any tips on which to use, or best practices? Thanks for the suggestion!

@kitsana_d
Thanks :D How soon can I convince you that I won’t put in secret backdoors? o.o

@lisa
Thanks for checking it out! And I’m always up for encouraging science & tech as a hobby :) Maybe over the summer I can set her up at home and he can get a closer look?

This video reminds me of the game Portal. Both machines have a similar voice. It’s as if, when you disobey her, she will trap you in the room and kill you, cutting all your connections so you can’t make a 9-1-1 call to get help.