Expressive Interfaces: Voice

For my voice interface, I decided to take a more functional route. Filling out forms by typing or writing by hand can be time-consuming or feel cumbersome, so I wanted to make an experience where a user can fill out a form quickly by speaking the answers. There’s a demo video above and the full dialog flow is below.

This week, I set up a Google AIY Voice Kit with a Raspberry Pi. While building the box and putting everything together was pretty easy, I ran into several issues when trying to run code. Keep reading to see the process, some code, and a video of it working.

I created a speech-to-text (STT) interaction using the p5 speech library. I wanted to create something like a storybook or matching game that kids play, except that it works for anything. I ended up combining p5 speech recognition with the Flickr API: when you talk into the computer's microphone, it listens and then searches Flickr for an image of what you said. I think this could be quite fun for parents and kids, since it puts a picture to words and helps visualize something verbal.
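
To make the speech-to-Flickr step a little more concrete, here is a rough sketch of how a recognized phrase could be turned into a Flickr search request and then into an image URL. This is not my exact project code. The helper names (`buildSearchUrl`, `photoToImageUrl`), the `FlickrPhoto` shape, and the choice to request a single result are assumptions for illustration; the REST endpoint, the `flickr.photos.search` method, and the static image URL pattern are the ones described in Flickr's public API documentation.

```typescript
// Minimal sketch: spoken phrase -> Flickr search URL -> image URL.
// Helper names and the FlickrPhoto shape are hypothetical.

interface FlickrPhoto {
  id: string;
  server: string;
  secret: string;
}

const FLICKR_REST = "https://api.flickr.com/services/rest/";

// Build a flickr.photos.search request for whatever the user said.
// apiKey is your own Flickr API key (not included here).
function buildSearchUrl(spokenText: string, apiKey: string): string {
  const params: Record<string, string> = {
    method: "flickr.photos.search",
    api_key: apiKey,
    text: spokenText.trim().toLowerCase(), // normalize the transcript
    per_page: "1",                         // assumption: one image per phrase
    format: "json",
    nojsoncallback: "1",                   // plain JSON instead of JSONP
  };
  const query = Object.keys(params)
    .map((k) => `${encodeURIComponent(k)}=${encodeURIComponent(params[k])}`)
    .join("&");
  return `${FLICKR_REST}?${query}`;
}

// Turn one photo record from the search response into a displayable image URL.
function photoToImageUrl(photo: FlickrPhoto): string {
  return `https://live.staticflickr.com/${photo.server}/${photo.id}_${photo.secret}.jpg`;
}
```

In the browser, the speech recognizer's result callback would pass its transcript string to `buildSearchUrl`, fetch that URL, and hand the first photo in the response to `photoToImageUrl` to get something that can be shown on the page.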