Fun with Text-to-Speech

There is no greater fun than playing around with Text-to-Speech on the AT&T Labs website.

Text-to-Speech has great advantages for aiding the communication needs of the disabled: those without a voice are given one; those who cannot see can hear text. Text-to-Speech can also be used in the everyday world of the ordinary consumer. I have visited the AT&T Labs website to have phrases like “Janna is calling!” made into .WAV files.

I then save the files and upload them to a service like Coolservice.dk so I can easily download them to my cellular phone. Once the sound files are on my BlackBerry, I can create special “ringtones” that play a certain sound file when a particular person calls. Here’s what I hear on my BlackBerry when Janna calls me from her BlackBerry:

That phrase is spoken over and over by “Crystal” whenever Janna calls, and it is an eerie feeling. Here are some more Text-to-Speech examples I created on the AT&T Labs site using different “voices.” The sentence each voice speaks is: “Welcome to David W. Boles’ Urban Semiotic!”

I wanted to challenge the Text-to-Speech system with a long, unusual, and fairly complex phrase, and it worked! Listen to the results by clicking the following links to the sound files, and see if you can understand what is being spoken from the text:

How do you feel about the “accents” applied to the speech? Are the voices stereotypical or are they appropriately representative and culturally authentic?

David,
Well, there are quite a few. The major ones, which I can kind of imitate (mostly badly), are:
Cockney, Scouse, Mancunian, Scottish, Glasgow Scottish, Welsh, Brummie, West Country, generic Northern (“northern” vowels), generic Southern (“southern” vowels), Geordie…
An exercise for other readers: match towns or regions to some of those, though I expect all UK readers will know them 🙂

Ah! What a list, fruey!
I wonder how and why AT&T picked the accents for their Text-to-Speech feature?
It’s interesting that the only example of a sound file I provided where the speaker’s intonation goes up at the end of the sentence is the UK version.

Good morning, David. What a world this is—this electronic age in which we live. This morning as I sat in our living room Jerry asked, “Have you seen my phone, Shirley?”
I hadn’t. He punched his number into my cell phone and walked about the house, down into the garage, back upstairs, and, finally, within his closet he heard a muffled ring. Yep, there inside the suit coat he had worn yesterday was the targeted phone.
A bit later, he was speaking, “Can you hear me now? How about now?” as he moved the phone from ear to ear.
My very intelligent mom died when I was 12. Had she been snapped back to life and been with me this morning, I’m sure she would have been startled to hear Jerry asking if I had seen his phone, and then to watch him walk about the house, pinging his own instrument. And the AT&T Text-to-Speech program, which in a flash said in a clear voice, “Welcome to Shirley Buxton’s blog,” would surely have sent my mother into a confused shaking of her head.
An exciting world, indeed.
Shirley

Hi Shirley!
Ah, what a lovely insight! Technology compresses time and space and forces us to act more quickly and think faster, and that may not always be the best method for memory retention or for interacting with each other.
It must have been surreal in many ways watching Jerry use technology to tag his lost memory. Perhaps one day the phones will do all the thinking for us and ping us when they lose us.
😀
I’m sorry to hear you lost your mother so early in your life. I agree that the technological advances in your lifetime are three times those that touched hers, and the current generation will probably see technological advances that circle our achievements four times over…

Nicola —
I can’t believe you’re missing all the fun today!
We need your ear and your evaluation of the UK accent.
You can also speed up, pause, and slow down the AT&T Text-to-Speech engine to get some really funky things to happen, and also to add some clarity if things get too hard to understand.

Hi David,
The voices didn’t sound too bad and they were understandable. I like the idea of accents. It makes things more interesting and provides variety.
One thing I’d like to have is a choice of a Spanish accented English speaker — that’d be nice! Maybe a Penelope Cruz accent. 🙂

This is a cool find. I have been using it to create ringtones as well. It’s a great way to personalize profiles.
I have personally used it on my BlackBerry, but I have friends who have tried it with Nextel as well as LG phones on Verizon.
Hopefully it will stay free for a while.

Katha —
Let me make my question darker. There isn’t a “Southern English” accent or a “New Jersey English” accent — yet there is an “Indian English” accent.
Does providing an “Apu-like” Simpsons Indian accent as a voicing choice strike you as racist and stereotypical in any way?

David,
“Apu” is a comedy character, and comedy generally goes over the top; ‘The Simpsons’ is no exception.
Is it stereotypical? Yes, no less than any overused preconceived notion.
Is that racist?
Well, if someone concluded, after watching “There’s Something About Mary,” “Wedding Crashers,” and “The 40-Year-Old Virgin,” that Americans don’t understand anything better than “gross humor,” would you call them racist?
AT&T just copied a worn-out issue. Why blame the poor company?
Let there be light! 😀

Cultural stereotypes aside, there is a wide range of charming accents in international English, along with local grammatical quirks and colloquialisms that are prevalent enough to count not as stereotypes but as “common usage”.
India has a fine history in English literature and English language culture, and I love the accents I hear from friends and colleagues. I really like imitating other English accents but have been accused of racism for that. I see it rather more as a linguistic challenge. Of course, it should be done in fun and never to belittle a nation by saying only stupid things in that accent; the aim is rather to capture what makes up a local accent and to try speaking differently from usual. Sometimes accents can break down psychological barriers; I wrote about speaking “silly English” a while back.
Singapore has a lot of idiosyncratic usages as do Australia, New Zealand, etc. In fact, when I speak to English speakers from around the world I am sometimes infected by their usage and accent and reply in an accent which is a mix of my own and the one I’m hearing. I feel terrible sometimes, because I can’t stop myself doing it.
Indian English has its own habits and colloquialisms, like overusing the present continuous (“I am being pleased”, for example). But here we’re talking just about an accent. Why Indian is singled out might have to do with the prevalence of Indians in the tech field in the US, perhaps? Note there isn’t even Canadian as a choice… (CA French, yes… don’t get me started on that).
Wasn’t it Oscar Wilde who said England and America are two nations divided by a common language?

Excellent response, fruey, and I think you’re right on all counts.
I, too, love the sound of the human voice, with its regional accents and international pidgins, but, like you, I have found myself in trouble for imitations that others saw as disrespectful and inconsiderate. I always thought I was complimenting the person by doing a perfect imitation of their accent and intonation.
😀
FYI… reading your blog via RSS reveals your real first name as the author of the post… if that still matters to you.

I know you are not fond of “easy,” David; neither am I. AT&T have chosen the easy path, though!
On the other hand, it is possible that AT&T simply followed the trend, which is not what one expects from a leader.
Talking about accents, Indian English has various kinds of accent depending on the region. I agree with fruey: I suppose the reason for choosing a South Indian accent is the significant presence of South Indians in the tech field in the USA.

I have used many voices to turn all kinds of text into podcasts and audiobooks. The AT&T voices are great: although they are only 16 kHz, the recording quality is superb. I also add background music to make listening more enjoyable. I can listen for hours and hours without getting bored.


David Boles was born in Nebraska and his MFA is from Columbia University in the City of New York. He is an Author, Lyricist, Playwright, Publisher, Editor, Actor, Designer, Director, Poet, Producer, and Boodle Boy for print, radio, television, film, the web and the live stage. With more than 50 books in print, David continues to write 2MM words a year. He has authored over 25K articles and published more. Read the Prairie Voice Archive at Boles.com | Buy his books at David Boles Books Writing & Publishing | Earn the world with David Boles University | Get a script doctored at Script Professor | Touch American Sign Language mastery at Hardcore ASL.