Will Speech Recognition Software Mean the End of Accents?

When Siri, the Apple iOS virtual assistant, first began showing up on iPhones in late 2011, the program's cheery female voice quickly became a sensation. It wasn't just because Siri could understand its owner's spoken instructions and answer questions, dial phone numbers, give a weather report, look up directions to a destination, or even find the best Thai restaurant in a neighborhood and make a reservation. As actor John Malkovich demonstrated in a series of clever Apple commercials, Siri could even seemingly make small talk, as if she (oops, it) were a real person.

Siri, whose name stands for Speech Interpretation and Recognition Interface, eventually was joined on the scene by competitors such as Microsoft's Cortana and the Android OS's Google Now app, which lacks a personality but can still understand natural-language questions from users and respond with spoken answers. There's also Alexa, the virtual assistant that comes with the Amazon Echo home appliance, which does everything from playing music to switching the lights on and off, based upon voice commands. And consumers who call companies these days usually find themselves talking to interactive voice response (IVR) systems with speech recognition, instead of an actual human operator.

Tell It Like It Is

But as users have discovered, there's one thing that such software has trouble doing — understanding someone who speaks English with a strong regional or foreign accent. It wasn't long after Siri's launch, for example, that Scottish iPhone owners began complaining that the virtual assistant couldn't decipher their brogues.

A YouTube user named James McDonald, for instance, posted a video of Siri struggling to understand his instruction to "create a reminder." Users who spoke English with Indian and Filipino accents also complained that their phones couldn't understand them.

Since then, in fairness, virtual assistants' ability to understand accented English seems to have improved significantly. According to Apple's support website, for example, the program now understands accents and dialects in English and 15 other languages. As Dan Moren, a former MacWorld journalist who now writes for SixColors.com, noted last fall, Siri no longer links language and location as strictly as it once did, so even if you set it to speak British-style English, it can understand someone who doesn't have that accent.

But some regional accents still seem to trip up speech-recognition software. As American Southerner Julia Reed noted in a recent article for Garden & Gun, when a broken arm prevented her from typing, smartphone and computer dictation apps still "steadfastly refused to understand pretty much everything I had to say."

Code Switching for a Fix

That dilemma may eventually become a thing of the past, if virtual assistants and speech-recognition software train us all to speak English in pretty much the same way. Lars Hinrichs, an associate professor of English language and linguistics at the University of Texas at Austin, cites the example of a friend who has to drop her Jamaican style of speech and "fake a U.S. accent" to get Siri to understand her instructions. Another friend, a native of India who's been living in the U.S. for 15 years, also has to alter her speech to communicate with the program.

"Some [language] features confuse Siri more than others," he explains. "Jamaican English and Indian English speakers would normally pronounce words like car or bird without an 'r'-sound. That's really hard if a language processing device is trained to American English, because in American English the 'r' would be pronounced."

Many of us already have developed a "machine voice" with a different cadence, pitch and enunciation. As Alan Black, a professor at Carnegie Mellon University's Language Technologies Institute, explained in a recent article in the Guardian, a British newspaper: "If you're standing next to somebody in an airport or at a bus stop or something, you can typically tell when they're talking to a machine rather than talking to a person."

Scholars who study the evolution of English say that such standardization, which they call dialect leveling, already has been going on for decades, driven by the influence of television and other mass media. Inside the U.S., where people move around to different parts of the country more than in the past, regional differences in how we talk are starting to fade. North Carolina State University associate professor of linguistics Robin Dodsworth, who's studied recordings of hundreds of Raleigh, N.C. residents' speech, has discovered that the unique vowel sounds commonly associated with Southern speech are now more difficult to find in Raleigh.

Speak Into the Machine

Hinrichs sees Siri and other speech recognition programs as contributing to that trend. "I would say that Siri, et al. can have the consequence of forcing individuals to speak in a standard accent," he says.

Hinrichs notes that other types of electronic communication, such as video-conferencing, also are contributing to the standardization. "Individuals live their lives in ever-more complex social networks, and, as a result, are getting exposed to an ever growing number of different ways of speaking: they hear more different languages, more different foreign accents, and more different native-speaker accents of English," he explains.

"When you are working at an engineer's office in Amarillo and having a Skype conference with one person who is in New York and another who is in New Delhi, you are exposed to a similar type of pressure," he says. "You will feel inclined to speak at your most intelligible, and that typically means: in your most standard, least local-sounding, accent."

That said, Hinrichs thinks that despite the standardizing influence of technology, accents aren't going to vanish completely. One reason is that speech recognition is likely to become increasingly sophisticated about deciphering regional variations. "I also know a bunch of other people, one of whom works for Google, who spend their careers trying to teach computers to understand speech and writing using local, or nonstandard, language forms," he says.

Additionally, when speech recognition apps compel people to alter their speech, they also serve as reminders of what makes us distinctive. "Individuals and communities become more aware that their local identity is special or different from others, and they develop a stronger desire than before to preserve and perform their localness," he says. As a result, he says, he doesn't expect local dialects to go away any time soon.

Now That's Interesting

Most computerized voices on gadgets are female, because research has shown that people generally find women's voices more pleasant than men's. One exception was in Germany, where BMW had to revamp a car navigation system in the 1990s after male drivers complained about having to get directions from a woman.