Speech Synthesizer Could ‘Resurrect’ Dead Singers

Researchers are re-creating the voice of late singer Hitoshi Ueki.

In a few years, you could be listening to an album of new songs featuring a duet between Elvis and Kurt Cobain. No, the two never cut a record together, but engineers and computer programmers are getting closer to being able to “resurrect” any singer’s voice for use in synthesized songs.

Yamaha has been developing voice synthesizers for years under the brand name Vocaloid; think the Mac’s text-to-speech meets Auto-Tune. But to build a Vocaloid “voice library,” a singer typically had to sing every possible syllable in the target language, one at a time. A computer would later synthesize songs from those fragments.

But now the Vocaloid team has announced that it has succeeded in building a library based on the voice of someone who couldn’t participate in the painstaking process: Hitoshi Ueki, a popular Japanese vocalist who died in 2007. The initial results were revealed on a Japanese video-streaming site earlier this year.

“As far as I know, many viewers were satisfied with the result, and so am I,” said Yamaha researcher Hideki Kenmochi in an e-mail to Wired.com. “It really sounds like him, because the creator [the programmer in charge of the voice library] did a good job.”

If perfected, the technology could result in some very uncanny entertainment, with singers, actors and others whose voices have been extensively recorded seeming to speak from beyond the grave. The “resurrected” voice could be employed anywhere computerized speech is heard, from automated customer service to GPS devices (though Yamaha’s mum on where its proof-of-concept technology will end up).

Kenmochi and his team began their ongoing research on Ueki-loid, as the software is informally called, last year. They built a system that could “listen” to isolated vocal tracks from several of Ueki’s songs and pick out the individual syllables. From there, it should be relatively simple to use the library to build new tracks.

The technology isn’t perfect. When you listen to a song made with an English-language Vocaloid, it’s often clear that the voice was generated by a computer, but there are moments when it’s possible to forget. The unsettling effect of this near-perfection is known as the “uncanny valley” in English and “the valley of death” in Japanese, according to Jordi Bonada Sanjaume, a member of the music technology team at Pompeu Fabra University in Barcelona, Spain, that helped develop the original Vocaloid.

“When you pretend the synthesis sounds like a real person, any small artifact or unnatural subtle sound will make the whole listening experience frustrating, emphasizing that it sounds synthetic,” Bonada said in an e-mail to Wired.com. “Otherwise, if you ‘sell’ it as a synthesizer, all those small artifacts or unnatural sounds can be at some point totally ignored during the listening experience, or even wanted and pleasing.”

Kenmochi agreed. “Especially in Japan, Vocaloid is not regarded as a substitute for human singing, but a kind of new musical instrument,” he said.

The software can’t mimic a singer’s delivery yet (think whispers, screams or grunts), but Kenmochi told Wired.com that his team is now studying how to tackle that particular problem. They presented initial results last year, “but it will take some years to put into practical use,” he said.

Since the English language has many more possible sound combinations than Japanese, it may take longer before “Elvis-loid” is available to the public, but Kenmochi said it will certainly be possible.

The question remains, however, whether that would be desirable. Almost as soon as computers could mix and mash up footage, Dirt Devil licensed Fred Astaire clips to make him dance with a vacuum cleaner, an advertisement some have called one of the worst Super Bowl ads of all time. If this technology were expanded commercially, it’s probably a given that someone would get John Lennon’s voice to endorse cameras, ice cream or Huggies.

There’s also the creep factor.

“I wonder if some people might feel that the singer’s spirit has not been resurrected, but only her/his voice, and that they are listening to some kind of zombie,” Bonada said. “It may be very natural-sounding, but as creepy as a humanlike android might be.”

For the time being, those questions are academic: No other singers have been “brought back” via Vocaloid technology, and the full Ueki-loid library won’t be released to the public. Kenmochi’s group does, however, plan to release an album starring Ueki-loid at some point, as a showcase for the technology.