Another make-an-actor-say-anything app


Lyrebird.ai is a Canadian company that uses AI to create new speech from sample recordings. They have an online demo: you record at least one minute of speech (they guide you through specific sentences, so the samples are keyed to known text), it runs the samples through a neural net, and then it'll create your voice saying anything you type.

Pros: almost real-time, with a web interface.

Cons: still somewhat artificial sounding, but a lot better than previous while-you-wait examples.

... and this is just a first-gen beta demo, with a really small training set and no ability to tweak the results.

The company isn't posting anything about their technique, so I'm just guessing (from how it operates and from the principals' bios) that it's a neural net. They seem to be more interested in selling their SDK to other developers than in offering a service to filmmakers... but that's also just a guess.

Other artificial speech apps -- some I've posted here -- have controls for prosody and inflection. [I'm assuming you weren't being sarcastic in your post, but actually looking for information...]

What hasn't been done ... or rather, what hasn't been published yet ... is training a neural net to generate those wrinkles automatically. It still has to be done by a human operator. But as soon as someone comes up with a training library that's properly keyed for these elements, it'll happen. The mass market is there, for digital assistants with an edge. And building the library won't be hard, as soon as someone develops a consumer app that provides a benefit for users tagging the subtext.

I stand corrected. Thanks for the info. Very interesting stuff. A colleague of mine believes this technology will be used to recreate noisy/scratchy/unusable dialogue in post.

A colleague of mine believes this technology will be used to recreate noisy/scratchy/unusable dialogue in post.

Add me to that list. I first proposed it about a dozen years ago, in a DV Magazine column.

Cleaning up bad production dialog is one thing. But these apps are (or soon will be) capable of making a convincing recording of anybody saying anything you type as input. All you need is enough samples of their voice to use as training material.

And as I reported about a month ago, a different app can take video of someone and make an absolutely convincing new lip-sync video of them speaking a new audio input. Demos are already online, and of course that's also still in beta.

"O brave new world, that has such [nonexistent] people in't!" -- Shakespeare, The Tempest