Speak and Your iPhone Will Understand

As this is being written in the spring, rumors are circulating that voice control technology will be deeply integrated into iOS 5. Whether in iOS 5 or later, it seems inevitable that much of our interaction with our devices will eventually take place just by speaking.

Of course, there are already many apps that use speech recognition, and ever since the 3GS, the iPhone has had a basic voice control feature that lets you make calls or request particular songs just by speaking into your phone (see the sidebar "Using built-in Voice Control on Your iPhone"). But the forthcoming developments will go much further.

Natural language understanding

We saw the best example of "natural language understanding" in February, when an IBM computer called Watson handily defeated the two most successful champions of all time on the TV quiz show Jeopardy! Because Jeopardy! clues are full of puns and other wordplay, the computer needed a lot of contextual intelligence to interpret what was being said.

Siri Assistant is a "virtual personal assistant" that uses natural language understanding to perform web-based tasks for you, such as making a reservation at a restaurant or getting a taxi. Siri's technology combines speech recognition, natural language processing, and semantic Web searching. In April of 2010, Apple bought the company that developed Siri Assistant, and Siri's technology will reportedly be the basis for the deep integration of voice technology in iOS 5.0.

Regardless of what's coming in the future, there is a bounty of App Store titles (many of them free) that offer you the convenience of speech recognition. Since speech recognition takes a lot of computing power, most of these apps require an Internet connection. Typically, you say something, and your iPhone digitizes your speech and uploads it to a server. The server performs the recognition, executes some operation (such as a Web search or translation to a different language), and sends the result back to your device, all within seconds. I think we're going to see more apps using the cloud to provide the computing horsepower necessary to do some amazing things with our iOS devices.
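For readers curious about what that round trip looks like, here is a minimal sketch in Python. It is only an illustration of the steps described above, not how any particular app is built: every function name is made up, the "recognizer" simply decodes text, and the phrasebook is a toy stand-in for a real translation service.

```python
from dataclasses import dataclass


@dataclass
class ServerReply:
    transcript: str  # what the server thinks you said
    result: str      # outcome of the operation it ran


def digitize(spoken_words: str) -> bytes:
    """Stand-in for the phone's microphone and audio encoder (hypothetical)."""
    return spoken_words.encode("utf-8")


def recognize(audio: bytes) -> str:
    """Stand-in for the server's speech recognizer, where the heavy computing happens."""
    return audio.decode("utf-8").lower()


def server_handle(audio: bytes, operation: str) -> ServerReply:
    """Server side: recognize the speech, then run the requested operation."""
    transcript = recognize(audio)
    if operation == "search":
        result = f"search results for '{transcript}'"
    elif operation == "translate":
        # Toy phrasebook; a real service would call a translation engine.
        phrasebook = {"good morning": "buenos dias"}
        result = phrasebook.get(transcript, "(no translation available)")
    else:
        raise ValueError(f"unknown operation: {operation}")
    return ServerReply(transcript, result)


def speak_to_app(spoken_words: str, operation: str) -> ServerReply:
    """Device side: digitize the speech, 'upload' it, and return the server's reply."""
    audio = digitize(spoken_words)
    return server_handle(audio, operation)


reply = speak_to_app("Good morning", "translate")
print(reply.transcript, "->", reply.result)
```

The point of the split is that the device does only the cheap work (capturing and sending audio), while the recognition and the follow-up operation run on the server.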

Speech recognition apps from Nuance Communications

Nuance Communications has long been a leader in speech recognition on desktop computers, with popular software such as Dragon NaturallySpeaking for Windows and Dragon Dictate for the Mac. The company also has two apps for iOS devices.

Dragon Dictation

As with the desktop software, this app lets you speak into your device and converts your speech to text. You can dictate not only notes and e-mail messages but also updates to social networking applications such as Facebook and Twitter. In addition, you can speak editing commands as you dictate, such as telling it to start a new paragraph or to capitalize the next word. You can find a very helpful list of these commands in a blog posting on our website (iphonelife.com/blog/2440/dragon-dictation-killer-ap).

Once you've dictated some text, you can output it directly into e-mail, SMS, Twitter, and Facebook without having to copy and paste.

Because digitized speech takes a lot of bandwidth, Dragon Dictation limits you to one minute of speaking at a time. This also reduces the amount of time you have to wait for the transcription. (Most people don't speak longer than a minute without pausing anyway.)

Dragon Search

Dragon Search lets you speak your search queries rather than having to type them in. It simultaneously searches a variety of sources including Google, Yahoo!, Bing, Wikipedia, YouTube, Twitter, and iTunes. After it has transcribed your speech, the app also gives you a list of alternate search suggestions, in case it didn't recognize your speech accurately.

Other apps using Nuance's technology

Nuance not only develops apps but also licenses its speech recognition technology to other developers. A number of language translation apps use Nuance's technology to first convert speech to text before using their own technology (or Google Translate) to translate that text to another language. Similarly, Siri Assistant (mentioned earlier) uses Nuance's technology to first recognize your words; then the app applies Siri's own technology to interpret what you said.

You typically train desktop computer speech recognition software by speaking scripted sentences into it so that the software learns to accurately understand your accent and manner of speaking. Similarly, when you use a Nuance-based app on your iOS device, every time you make a correction to what it thought you said, you are training the app so that it will understand you better the next time.

And here's the interesting part: your profile for the app is associated with your device's unique identifying number, referred to as the UDID. That means that any corrections/training that you do in one Nuance-based app will transfer to the other apps. For example, if you make a correction in Dragon Search, you are training the Siri Assistant app at the same time.
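To picture how training in one app could carry over to another, here is a small Python sketch. It is purely illustrative and rests on assumptions: the article doesn't describe how Nuance actually stores profiles, so the store, the function names, and the device identifier below are all hypothetical.

```python
from collections import defaultdict

# Hypothetical server-side store: one table of corrections per device identifier.
profiles = defaultdict(dict)


def record_correction(device_id, heard, meant):
    """Remember that on this device, the word 'heard' should have been 'meant'."""
    profiles[device_id][heard] = meant


def apply_profile(device_id, raw_transcript):
    """Apply the device's stored corrections to a raw transcript, word by word."""
    corrections = profiles[device_id]
    return " ".join(corrections.get(word, word) for word in raw_transcript.split())


device = "EXAMPLE-UDID-0001"  # made-up identifier, not a real UDID format

# A correction made while using one app...
record_correction(device, "serie", "Siri")

# ...is picked up when a different app looks up the same device's profile.
print(apply_profile(device, "ask serie for a taxi"))
```

Because the profile is looked up by device rather than by app, any app that consults it benefits from corrections made anywhere else on that device.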

Let's take a look at a few of the Nuance-based apps from other developers.

Siri Assistant

As mentioned, Siri Assistant lets you speak a command or question, and it then interprets that and assists you in some way. It combines speech recognition with natural language understanding and GPS, such that you can make a variety of requests. You can use it to reserve a table at a restaurant, book a taxi, find movies playing in the area, find nearby events, find local businesses, send yourself reminders, get the local weather forecast, check flight status, send a tweet to Twitter, perform a Web search via Bing, and more.

For example, if I simply say "nearby Chinese restaurants," Siri first translates that to text, then does a search, and then returns a list of four Chinese restaurants within a few miles of my house. Plus, it will offer the option of showing them on a map, provide a Call button so I can quickly call them, and give additional information about each establishment, including reviews.

Because I didn't specify a location, it automatically uses GPS to determine where I am. But I could also have asked for Chinese restaurants in a particular ZIP code or city, or I could have asked for a restaurant by name.

Merriam-Webster Dictionary

The paradox with dictionaries has always been: how do you look up a word if you don't know how to spell it? Now the solution is at hand—the free Merriam-Webster Dictionary app. You can look up a word simply by saying it. The app also offers synonyms and antonyms, example sentences, audio pronunciations, and more.

Dictionary.com

This app has nearly 1 million words and definitions. It doesn't require an Internet connection to search for words, but you do need a connection if you want to use the voice recognition feature. The app includes audio pronunciation, example sentences, synonyms and antonyms, nonstandard uses, word origin and history, and custom backgrounds.

SpeechTrans Ultimate

SpeechTrans lets you talk into your iPhone in one language and then hear the translation in another. The company claims 95 percent speech recognition accuracy. If you're traveling in a foreign country and can't understand what someone is trying to tell you, simply have the person speak into your iPhone or iPad and then hear the translation. Want to order something in a restaurant, but the waiter doesn't understand English? Simply speak into your phone and then play back the translation to the waiter. The Facebook Chat feature lets you input via voice and then outputs via a voice translation, enabling you to communicate with speakers of other languages around the world.

The app recognizes English, Spanish, French, German, Japanese, and Italian. It can speak back a translation in these languages, plus Russian, Korean, Chinese, Portuguese, and Polish. In addition to the purchase price of the app, you also pay for the transcriptions. The app price includes 400 transcriptions; beyond that you'll need to use in-app purchases, which range from 30 transcriptions for $0.99 to 500 for $9.99.
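If you want to compare the in-app packs, the per-transcription price is simple arithmetic. The short Python sketch below works it out from the two prices quoted above; the function name is just for illustration.

```python
# In-app transcription packs quoted above: (transcriptions, price in dollars).
packs = [(30, 0.99), (500, 9.99)]


def cents_per_transcription(count, price_dollars):
    """Price of a single transcription, in cents, rounded to one decimal place."""
    return round(price_dollars / count * 100, 1)


for count, price in packs:
    print(f"{count} for ${price:.2f}: {cents_per_transcription(count, price)} cents each")
```

The larger pack works out to roughly 2 cents per transcription, compared with a little over 3 cents for the smaller one.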

The developers say that the app has better speech recognition accuracy than the free Google Translate (discussed later). Also, as with other Nuance-based apps, you can speak for up to a minute, whereas Google Translate uses 15-second segments.

iTranslate

Free, app2.me/2690; $1.99 for 75 transactions through in-app purchase

Like SpeechTrans, iTranslate uses Nuance's highly accurate speech recognition technology to let you speak in one language and hear the translation in another language. The app includes Conversation Mode, a neat interface for conversing with someone in another language. It recognizes six languages, but requires you to purchase transaction packages from within the app to use voice recognition. (Note: Each use of speech recognition is considered a transaction.)

Trippo Voice Translator Plus

Trippo is another translation app. Text-to-text translation is free. You can add speech recognition and speech output via in-app purchases. The app can recognize English and has speech output for 13 languages.

Price Check by Amazon

If you're out shopping, you may wonder whether you can purchase an item at a cheaper price via Amazon. You can use the voice feature in Price Check to search the site by simply saying the product's name.

Ask.com

Ask.com is an interesting app that answers your questions by searching the Web or asking experts. You can simply speak your question: What's the forecast? What's a good place to stay in Rome? How many ounces in a quart? How can I remove a wine stain? There's no SMS fee or any other charge for using this app.

Other apps that use Nuance's technology include Aisle411 (Free, app2.me/3752), which helps you find products in stores, and Taskmind (Free, app2.me/3753), a task organizer.

Other speech recognition apps

Of course, Nuance isn't the only provider of speech recognition technology; there is a wide range of apps available that use other technologies.

Google Translate

Google's translation app also offers speech recognition and voice output, but unlike the others, everything is free. Other vendors claim that Google's speech recognition isn't quite as accurate as theirs, but it worked pretty well when I tried it. When you speak into the app, it displays what it thought you said as well as a text translation into the language you've specified. If it didn't accurately recognize what you said, you can quickly tweak the text. When you're satisfied, tap on the speaker icon and it will speak out the translation. It can translate text between 57 languages. The speech recognition feature works for 15 of those. And the speech synthesis feature (in which it speaks the translated text) works for 23 of the languages.

Google Search

This is a great app for searching the Web. The English speech recognition includes American, British, Australian, Indian, and South African dialects. In addition, it recognizes speech from 15 other languages. Unlike the other apps, you don't need to tap an icon to do a voice search. You can simply bring your iOS device close to your ear, wait for the tone, and speak your query.

Bing

Bing is another top search engine with an app that offers voice recognition. Microsoft calls Bing a "decision engine" that you can use to find information, restaurants and other businesses, images, showtimes, travel deals, flight information, weather forecasts, and walking or driving directions.

Jibbigo Speech Translator

Jibbigo offers speech translators that don't require an Internet connection. This can be especially useful if you're traveling in areas where an Internet connection is unavailable or very expensive. Each app pairs English with a second language (Spanish, Chinese, French, German, Korean, Tagalog, Iraqi Arabic, or Japanese). Jibbigo also offers a free app (Jibbigo.Net) that lets you translate between English and three languages: German, Chinese, and Korean.

Vocabulary Trainer by Babbel

Free, search on "vocabulary trainer by babbel"

Babbel offers a series of free apps that teach you 3,000 common vocabulary words in Spanish, French, Italian, German, Portuguese, and Swedish. The apps teach pronunciation by using speech recognition to evaluate how well you say each word.

AT&T Navigator

Free (only works with AT&T iPhone 4, 3GS, and 3G), app2.me/3101; service costs $9.99/month or $69.99/year.

This app from AT&T has features you'd expect from a GPS app, such as voice-guided navigation with turn-by-turn directions. But it also offers speech recognition, allowing you to speak commands to get driving directions or do a local search.

Using built-in Voice Control on Your iPhone

The release of iOS 5.0 may change everything, but the current version of iOS has quite a few voice commands that you can use to control your phone.

To invoke Voice Control, simply hold down the Home button for three seconds.

To dial a number, say "Call" or "Dial" and then the person's name or number.

You can select music to listen to by speaking a command and then giving the name of the song or artist or playlist. The play commands include: "Play songs by," "Play playlist," and "Play album."

You can control music playback by speaking these commands: "Next song," "Previous song," "Shuffle," and "Pause." You can invoke the Genius feature, which finds similar music, with these commands: "Genius," "Play more like this," and "Play more songs like this."

You can find out what's playing or who the artist is by using these commands: "What's playing?" "What song is this?" "Who sings this song?" "Who is this song by?"

To find out the time of day, say "What time is it?" or "What is the time?"

You can stop a selection from playing by saying "Not that," "Wrong," or "No."

Vlingo – Voice App

Vlingo is one of the more popular voice control apps. Unlike many of the apps above, which are single-purpose, Vlingo offers a greater range of functions. The free version lets you place calls just by saying "Call Richard," search the web via various search engines by speaking your query, post Twitter and Facebook updates by speaking them, access maps, and more. In-app purchases let you send text messages and email completely via voice. The cost is $6.99 for the text message function, $6.99 for email, or $9.99 for both. Some websites say that it's the best hands-free app for texting. The app combines voice recognition with an "intent engine" that guesses what you want, for example responding with local listings if you say the name of a movie.

Monica

Monica is a fun personal assistant that not only gives you voice control but also talks back to you. You can ask Monica to access your Facebook account or read aloud your e-mail, news articles, horoscope, or Google Docs. You can also ask Monica questions, but you'll have to type them in. However, Monica will speak the answer to you. You don't need to tap any buttons to get Monica's attention. Just say "Monica," and the app responds by giving you a menu of options from which you speak your choice.

VoiceDJ

VoiceDJ lets you control your music collection through voice commands. Developers claim it has 99 percent accuracy and responds within milliseconds. It can handle music collections over 5,000 songs and recognize over 1 million English words; it has a much broader range of understanding than the built-in voice control commands (see sidebar).

Text'nDrive Pro ($9.99, app2.me/3763) reads your e-mail out loud to you and lets you respond using voice-to-text.

I like the convenience of speaking to my iPhone and iPad, but there are downsides. One is that speech recognition apps falter if there's a lot of ambient noise or if you use nonstandard vocabulary. In addition, there are simply some occasions when you can't use an app such as Dragon Dictation, for example in class or in a meeting. This point was made to me by the developer of Writepad ($3.99, iPhone: app2.me/3764; $9.99, iPad: app2.me/3765), which is among the best handwriting-recognition apps.