Duolingo experimented with the use of real human voices and concluded that people liked TTS more.

I figured I would share this information since people (including a course contributor) asked me for a source of this information in the recent Irish thread. I figure the topic is somewhat relevant now, seeing as how the Irish course is going to use a real human voice and not TTS software like all of the other courses currently do.

Siri doesn't understand Portuguese, but my BlackBerry Z30 does! Both Brazilian and European. I'll stay with BlackBerry, thank you! That, and them just having bought a company specializing in securing voice calls on top of the security features already built into BlackBerry, makes me so happy I'm not an Apple sheep!

Check out Speechling. You get a free language coach to help you with your pronunciation, and they use native speakers for both the sentences and the coaches. You get feedback on your pronunciation in a day or two. It's pretty awesome. I think they have it for Spanish, French, English, and Chinese.

It kind of makes sense, really. The TTS, by not being particularly expressive, should be absolutely predictable in how it voices things. It probably also adapts better to being played slowly. Those things together would probably make people FEEL like they're learning better when using the TTS.

I think sometimes when people criticize some of Duolingo's shortcomings as a language learning tool, they forget that the absolute #1 most important trait a tool can have is: People will use it!!!

We could have a whole extended conversation here about the ways I feel that Duolingo is decidedly sub-optimal in terms of pure language teaching strategy. As in, if your true goal is to learn a language as well and as efficiently as possible, and you're going to continue working on it no matter what, Duolingo is probably not the optimal tool.

Duolingo's strong point is its gamification for keeping people motivated. To keep people motivated and trying, they sometimes teach things in a way that's most likely not optimal.

Example: It's probably not actually optimal to teach words in themed groups. If someone teaches you say... son and brother at the same time, you're probably more likely to tend to get those two words mixed up than you would be if you were taught son among a random selection of other words, and then brother was added once you had a pretty good grasp on that.

When you're approaching it from the "motivation is important" side, though, people want to FEEL like they're learning. And when you learn all the members of the family at once, you have a good "Now I know how to talk about families!" feeling that helps make you want to charge into the next lesson.

AFAIK, it's not at all unusual for research related to learning to reveal that what makes people feel most like they're making progress is not the same thing as what actually generates the most progress.

AFAIK, it's not at all unusual for research related to learning to reveal that what makes people feel most like they're making progress is not the same thing as what actually generates the most progress.

That's true, but Duolingo has data about people's behaviour, not about their "feelings". It rarely does any public surveys on what people "like". So regardless of what someone feels, if progress is affected by a TTS engine, that will be apparent in the data. Unless, of course, someone deliberately fails or passes more lessons, or cheats.

The measured user behavior is that more people quit when human voices were used.

What I'm trying to say is that, even if you start from the assumption that human voices are superior for teaching when compared to TTS, that doesn't do you any good if the users don't stick around and, well, use them.

I was proposing that it's entirely possible that people FEEL like they're learning more with TTS (regardless of what the objective truth is), so they're more likely to stick around.

Edit: As far as I can tell from everything I've ever seen the Duolingo staff say, their #1 priority when looking at changes is that people continue logging in and trying to learn. Obviously they would prefer to help people learn faster and more effectively whenever possible, but the absolute most important thing for the overall effectiveness of Duolingo is that people keep showing up and making the effort.

I was proposing that it's entirely possible that people FEEL like they're learning more with TTS (regardless of what the objective truth is), so they're more likely to stick around.

Yes, that is likely. I think having a real voice is potentially problematic. Think of it this way: if you listen to a robotic sentence and get the transcription wrong, you can always blame the damned robotic voice for your failures. Now, if you consistently fail while listening to an authentic voice, whether from a teacher in a real-life classroom or from Duolingo, and others pass, you can only blame yourself or the teaching methods.

Perhaps that is why people seem to have difficulties with Rosetta Stone. You look at the quality of the product, the quality of the sounds, the pictures, the games, the interaction and all that. Then you look at your progress after half a year, and you still haven't learned. At that stage there are only two people to blame: yourself or the software engineer. :)

P.S. Duolingo has actually commissioned external studies to ascertain whether people actually learn effectively. That, in addition to the test center, seems to show that Duolingo's goal is to ensure people learn effectively and to be able to provide evidence that they do (and also to translate stuff, obviously).

I didn't mean to imply that Duolingo doesn't care if people learn effectively. I think they just figure that the #1 factor in getting people to learn is making sure they don't quit.

It's kind of logical, really. A technique that would be the most effective thing in the world if motivation weren't an issue at all (like you're studying with a gun to your head, so you're definitely not quitting) could be so unpleasant that no one who had the choice would ever stick around long enough to learn anything from it. An awesome teaching tool is useless if everyone hates it so much that they won't use it.

Luis only mentions testing it for English and Spanish there, not for every language. It doesn't sound like he said people prefer TTS in every case, just that they did for those two languages that they tested. And actually the Spanish TTS sounds pretty good to me, so it doesn't seem crazy that it could at least compete with a human voice.

He also mentions that the other deciding factor was the cost, which he said was greater for professional human voices than for TTSes. That's apparently why with Irish they are only going to have audio for a few sentences per word taught instead of for every sentence.

Luis only mentions testing it for English and Spanish there, not for every language.

I know. I never said it was tested on all of the courses nor even implied that it was. I couldn't even come close to fitting "Spanish and English" in the title with all of the relevant information still present. It couldn't have been tested on all of the current courses anyway, as many of them weren't around when the testing was done.

It doesn't sound like he said people prefer TTS in every case, just that they did for those two languages that they tested.

Obviously, as that is the only data that can be concluded. I am not disputing the results, I am simply letting people know that they exist. For me personally though, I prefer human voices by far.

He also mentions that the other deciding factor was the cost, which he said was greater for professional human voices than for TTSes. That's apparently why with Irish they are only going to have audio for a few sentences per word taught instead of for every sentence.

He never says that the cost was a deciding factor for them choosing TTS over real humans, he only states (due to another user mentioning the cost) that paying a native to record everything would have cost more.

Deleted User: Tens of thousands of dollars??? I would think you could have paid a native speaker to record everything for much, much less.

Luis von Ahn: Unfortunately, that's not the case.

The deciding factor is that in their test, people preferred TTS over real humans.

Luis von Ahn: (in response to another user): Hi. We've actually tried a real human voice for the first 10 lessons (for English and Spanish, with professional voices), and people liked it less! (They returned to the site less.) Based on this we decided to work on improving other aspects of the site instead.

Deleted User: Tens of thousands of dollars??? I would think you could have paid a native speaker to record everything for much, much less.

Luis von Ahn: Unfortunately, that's not the case.

That's strange and does not fit what I know about the pricing of voice recording. In English $5000 is a good estimate. For Russian, it is $2000-3000. Moreover, recording learning samples is not particularly hard since no acting is required, just clear and consistent pronunciation (however, unnatural sentences may be different, and you also have to record slow pronunciation).

The major drawback is, you will not be able to add new sentences after the recording is done. For one course, you could probably do with some bargaining with the studio you chose (like paying them a bit so that they ask the talent to pronounce a few more sentences the next time they show up). For dozens of courses it is too much to handle.

Honestly, I'd be perfectly fine with volunteers submitting their pronunciation of words and sentences and having the best one(s) accepted as official. I learn German vocab on Memrise, and the majority of the courses that have audio don't have it provided by a professional voice actor, rather just a regular person with a good quality setup.

Probably not a good idea if you are going to listen a lot to the sentences, all of which are pronounced by a bunch of unrelated women and men with different timbres and accents. One uniform and calm voice provides for a much smoother experience.

Now, that's the problem: one person pronouncing 10 000 sentences is about 17 hours of speech, assuming you read 10 sentences a minute and there is no "slow" option. It is hardly realistic to expect that there is going to be a volunteer who can reliably read an entire database. Though I would do it if there is such an option for the Russian course in the future.
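As a rough check of that estimate, the arithmetic can be sketched in Python. The function name recording_hours is made up for illustration, and the inputs are just the figures from the post above; it counts pure speaking time only, with no retakes, pauses, or slow versions.

```python
def recording_hours(sentences: int, sentences_per_minute: float) -> float:
    """Hours of continuous speech needed to read every sentence once."""
    if sentences_per_minute <= 0:
        raise ValueError("sentences_per_minute must be positive")
    minutes = sentences / sentences_per_minute
    return minutes / 60


# Figures taken from the post: 10 000 sentences at 10 sentences a minute.
hours = recording_hours(10_000, 10)
print(f"{hours:.1f} hours")  # 16.7 hours, i.e. roughly 17
```

Recording a separate slow version of each sentence would add to this, by an amount the thread doesn't specify.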

Considering all of the people who put tons of their free time into open source projects that they receive no monetary compensation for, I don't think it's a stretch at all. One could look to hardcore Wikipedia contributors as an example of people who put in tons of hours contributing to an educational platform and receiving no monetary compensation. Then of course there are people who contribute to Ubuntu, Firefox, Chromium, Duolingo's Incubator itself, etc.

As you say, you would even be willing to do it yourself. I am sure there are far more people who would be willing than just you. It's not as if it would all have to be recorded in one sitting. Also, I'd be willing to bet you will be putting far more than 17 hours into the Russian course. I am surprised that you, as a contributor to such a time-consuming endeavor, seem to underestimate the number of people who are willing to contribute to an educational platform for free.

All that said, I have no reason to believe this will be possible. It's just an idea. Also, seeing as how Duolingo has the money to pay for a voice actor for a small language like Irish (albeit without all of the sentences recorded), if they were to expand human pronunciation to other courses, one would assume that they'd have enough money for the major languages at least, like Russian.

I didn't mean that they would be pronounced by different people, rather that one person could be chosen to pronounce all the sentences among multiple choices. Lots of people spend many hours contributing to open source projects when they could be making money doing the same kind of work, so I think there would definitely be people willing to do it.

I'm a strong supporter of the idea of real human speakers for the courses, but I think that the biggest stumbling block may not be speaking talent, but rather equipment and recording quality.

When you hire a voice actor, you also probably get access to professional equipment and editing. This doesn't mean that there is no way to get good quality audio with ordinary volunteers, but having people speaking into their iPhones or such would probably not be good enough. I don't know what the best solution is.

It's not that hard to get a good setup either. And yet inconsistency between takes, between processing, between different mikes and different speakers' setups can be disturbing, especially if no one speaker records all the sentences. It is disturbing in real life, too: imagine your friend having a random voice chosen from three different voices each time they utter a sentence. Surely that would distract you from getting the message: you would focus on how the voice changed first.

I really like the quality of the Russian recording. I would GREATLY prefer that over listening to a TTS.

I'm actually contemplating giving up on the TTS and telling Duolingo not to use my speakers anymore. I told it to stop using the microphone a while ago. In both cases the problem is that I have a nagging feeling that the quality isn't sufficient to make it a good use of time.

I actually think having a variety of human voices would probably be superior to having a single human voice. My understanding of what research has been done is that listening to a single speaker feels better, because you can adapt to it more readily, but hearing different speakers (with slightly different accents) provides your brain with a better opportunity to develop a general sense of what a word sounds like as opposed to keying in very tightly on what a word sounds like when spoken by a single specific speaker.

I'm not opposed to Duolingo's use of TTS, because there are legitimate reasons it makes sense for them. I just have reservations about whether it's the best possible use of my time at the moment.

Shady_arc: Your recording is very good. It would be awesome to have recordings like that replace the TTS.

I think the distracting element of multiple different voices could be mitigated if it is indicated which voice you are going to hear. Each voice actor could have their own avatar, shown beside the ‘play’ button.

You could always learn to develop software and create all these things you want. It is not likely that there will be software that completely satisfies every individual's wants in the near future, unless it is developed by that person themselves, or reads minds!

It wasn't really that people voted. Duolingo is built mostly on experiments that they run. They'll quietly pick one group of users to be the experimental group, and then change the way they experience the site (change the layout, add a new feature, change the way audio is handled, etc).

Then, they compare that group with their other users to see what happens. Do these people spend more time on the site or less? Do they make more progress or less? etc

That's how they make most of their decisions about which features are worthwhile and which aren't really helpful.
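To make the idea concrete, here is a minimal sketch of that kind of comparison in Python. Everything in it is an assumption for illustration: the Group class, the return_rate measure, and the numbers are invented, and this is not Duolingo's actual method or data.

```python
from dataclasses import dataclass


@dataclass
class Group:
    name: str
    users: int     # users assigned to this group
    returned: int  # users who came back during the observation window

    @property
    def return_rate(self) -> float:
        return self.returned / self.users


def compare(control: Group, experiment: Group) -> float:
    """Difference in return rate, experiment minus control."""
    return experiment.return_rate - control.return_rate


# Invented numbers, purely for illustration.
control = Group("TTS", users=1000, returned=420)
experiment = Group("human voice", users=1000, returned=390)

diff = compare(control, experiment)
# A negative value means the experimental group came back less often.
print(f"{experiment.name} vs {control.name}: {diff:+.3f}")
```

A real experiment would also need a significance test before reading anything into a difference like this; the sketch only shows the basic group-versus-group comparison.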

Thanks. Interesting info. I don't really have a problem with the computer voice in general, the Spanish seems to be OK, but the English computer voice needs to be taught how to pronounce verb versus adjective or noun forms of homographs. (The people who live in my house play live music on the mall, You present a present, etc.) I'm working with another site right now that used live voices, with multiple different accents. Their problem is that they didn't bother to record the voices actually reading the sentences slowly, so the slow button produces a 45 record played at 33 speed effect. Way worse than the slow version of this site. Allowing multiple different voices would solve the problem of getting the same person back to record, but I bet it would cause way more complaints because it is one more thing to listen for.