The Mac has a really great text-to-speech (TTS) engine built right in, but at first glance it’s only available at Apple’s whim in specific contexts, e.g. via a menu command in TextEdit, or system-wide through the accessibility settings. Seems grim, but we’re in luck: Apple, in their infinite generosity, have given us a command line program called “say”, which lets us invoke the TTS engine through the terminal. It’s super simple to use: just type the command and then the text you want, e.g. say cosmic manifold.

So that’s great, now what if we wanted to make a Processing sketch talk to us? In Java, as in most languages, there are ways to send commands to the terminal programmatically. By calling Runtime.getRuntime().exec("some command"); we can run any command line program we want from within Processing. So to invoke the TTS engine from a Processing sketch, we can just build the say ... command line instruction as a string and pass it to exec(), which launches say, which in turn handles the TTS conversion.
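Here is a minimal sketch of that idea in plain Java. The class and method names (SayDemo, buildCommand, speak) are made up for illustration and are not taken from the attached class. It uses the array form of exec() so the text travels as a single argument and needs no shell quoting, and it only calls say when the OS name looks like a Mac.

```java
import java.io.IOException;

class SayDemo {
    // Builds the argument array for the say command. The text is passed as
    // one argument, so spaces and quotes need no shell escaping.
    static String[] buildCommand(String text) {
        return new String[] { "say", text };
    }

    // Launches say and returns immediately; speech continues in the background.
    static Process speak(String text) throws IOException {
        return Runtime.getRuntime().exec(buildCommand(text));
    }

    public static void main(String[] args) throws IOException {
        // say only exists on Mac OS, so skip the call everywhere else.
        if (System.getProperty("os.name").toLowerCase().contains("mac")) {
            speak("cosmic manifold");
        }
    }
}
```

Since speak() can throw an IOException, a call from setup() or draw() in a sketch would need to sit inside a try/catch block.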

I’ve put together a small Processing class that makes it easy to add speech to your Processing sketches. It only works on Mac OS, won’t work in a web applet, and has only been tested in Mac OS 10.6. (I think the list of voices has changed since 10.5.)

Note that since the class is quite simple and really just wraps up a few functions, I’ve set it up for static access, which means that you should never need to instantiate the class by calling something like TextToSpeech tts = new TextToSpeech() (and in fact that would be a Bad Idea). Instead, you can access the methods any time without any prior instantiation using static style syntax, e.g. TextToSpeech.say("cosmic manifold");.
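To show what static access looks like in practice, here is a rough sketch of such a class. It is not the attached class: the name TextToSpeechSketch, the command() helper and the optional voice argument are assumptions made for this example. The -v flag is the voice option documented in say’s man page, and which voice names are valid depends on your OS version.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

final class TextToSpeechSketch {
    // Private constructor: the class is only a namespace for static methods,
    // so nobody can write new TextToSpeechSketch().
    private TextToSpeechSketch() {}

    // Builds the say argument list; voice may be null to use the system default.
    static List<String> command(String text, String voice) {
        List<String> args = new ArrayList<>();
        args.add("say");
        if (voice != null) {
            args.add("-v");   // -v selects a voice by name
            args.add(voice);
        }
        args.add(text);
        return args;
    }

    // Speaks with the default voice.
    static void say(String text) {
        say(text, null);
    }

    // Speaks with the named voice and returns without waiting for the speech.
    static void say(String text, String voice) {
        try {
            new ProcessBuilder(command(text, voice)).start();
        } catch (IOException e) {
            // say is missing (not Mac OS) or could not be launched.
            System.err.println("Could not run say: " + e.getMessage());
        }
    }
}
```

With this in place a sketch can call TextToSpeechSketch.say("cosmic manifold"); from anywhere, with no instance to create or pass around.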

Attachments

Hey, works great for me! One question: Is there a function to stop the speech? When I quit my application, my Mac is still speaking. :) And another question: How can I check whether the speaking has finished? I have an array with several Strings. I would like to automatically switch to the next element and "say" it after it has finished saying the current sentence.

Thanks!

January 21 2011 at 4 AM

Eric Mika:

Kamil, I’m glad it worked for you.

The control issues are tricky, because you’re dealing with what amounts to an external program (Apple’s TTS engine) to which we have relatively crude access through the say command.

Everything you can do with the command is documented; enter man say in a terminal window to see Apple’s docs. I don’t see any commands related to timing information. (That’s not to say that they don’t exist! For example, the speed parameter I use in the Processing example isn’t listed in say’s man pages.)

Given that, I can think of four reasonable approaches to your problem:

Just concatenate your array into a single string before sending it to say.

Use a native Java speech synthesizer (like FreeTTS), which will give you more control. Quality is mediocre compared to Apple’s engine.

Get clever with the say command. The docs say that it can write sound files to disk instead of speaking them. I’m not sure how fast this happens, but it’s possible that you could “render” your speech to disk and then re-open it in Processing with the Minim library. (This would give you duration info, and the option to start / stop playback arbitrarily.)

Use an industrial-strength API to Apple’s TTS engine. They call it the Speech Synthesis Manager. Getting access to this in a sane way from Processing is probably going to be more grief than writing the app you want in C++ or Objective-C. However, these docs could be a great way to find hidden functionality in the say command. (Let me know if you find any!)
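The first and third approaches are small enough to sketch in Java. Everything named here (SayOptions, joinSentences, renderCommand, render) is hypothetical. The -o flag is the documented option that makes say write an audio file instead of speaking. I haven’t measured how long rendering takes, and loading the file in Minim is left out.

```java
import java.io.File;
import java.io.IOException;

class SayOptions {
    // Approach 1: join the sentences so a single say call reads them in order.
    // The ". " separator is meant to give a short pause between items.
    static String joinSentences(String[] sentences) {
        return String.join(". ", sentences);
    }

    // Approach 3: -o tells say to write audio to a file instead of speaking.
    static String[] renderCommand(String text, File out) {
        return new String[] { "say", "-o", out.getPath(), text };
    }

    // Blocks until rendering finishes; true means say exited with status 0.
    static boolean render(String text, File out)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder(renderCommand(text, out)).start();
        return p.waitFor() == 0;
    }
}
```

Once render() returns true, the file should be ready to open with whatever audio library you prefer.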

Update: Here’s a bonus 5th idea. When you execute say in Java, you get a unique Process object back. This object has a destroy() method which effectively silences say. Since each execution of say runs in its own process, you could keep track of them in your Processing app and kill them at your discretion. This would require a slight reworking of the code above, from a static approach to a more traditional instance-based approach that would return a Process object for every invocation of say. This is probably a better bet than any of the other options above if the only extra thing you need the code to do is stop speaking on command.
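A rough sketch of that instance-based idea follows. The Speaker class and its method names are invented for this example, and the constructor that takes a program name exists only so the class can be tried out on a machine without say. It also uses Process.waitFor(), which blocks until the process exits, as one possible way to know when an utterance has finished.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class Speaker {
    private final String program;   // "say" unless overridden
    private final List<Process> running = new ArrayList<>();

    Speaker() {
        this("say");
    }

    // Hypothetical hook: lets another program stand in for say.
    Speaker(String program) {
        this.program = program;
    }

    // Starts speaking and returns the Process so the caller can control it.
    Process say(String text) {
        try {
            Process p = new ProcessBuilder(program, text).start();
            running.add(p);
            return p;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Blocks until the utterance finishes; returns false if interrupted.
    boolean sayAndWait(String text) {
        try {
            say(text).waitFor();
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Silences everything started through this Speaker.
    void stopAll() {
        for (Process p : running) {
            p.destroy();
        }
        running.clear();
    }

    // Number of processes started since the last stopAll().
    int trackedCount() {
        return running.size();
    }
}
```

Looping over an array and calling sayAndWait() on each element would speak the sentences one after another, and calling stopAll() when the sketch quits should cut off whatever is still talking. Keep in mind that sayAndWait() blocks its caller, so inside dra() it would freeze the animation unless it runs on a separate thread.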

Thanks a lot for your answer! I am not an absolute beginner, but I have no idea how to keep track of those Process objects. So for now I will try out FreeTTS, which seems to be easier to handle (for me). :) However, I will also try to learn more about your other ideas, especially Nr. 5.

I am quite sure I'll annoy you again with a lot of questions, problems... soon. :)