Since Alan Turing, often called the father of AI, first proposed it in a 1950 paper, the Turing Test, which measures whether a computer can fool a human into thinking it is human, has been a classic benchmark for whether an artificial intelligence system is successful.

And, according to a new paper released Monday that will be presented in June at the annual conference on Computer Vision and Pattern Recognition (CVPR) in Las Vegas, MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) has now crossed a new threshold: passing the Turing Test for sound.

Back in December, MIT researchers created an AI system that passed the Turing Test for vision by fooling humans into thinking that characters were written by humans rather than by a machine.

The new work tackles sound with a deep-learning algorithm that was trained on videos of objects being hit and that produces sound effects to match. According to the release, "when shown a silent video clip of an object being hit, the algorithm can produce a sound for the hit that is realistic enough to fool human viewers."

To build the algorithm, the researchers used a dataset of approximately 1,000 videos containing roughly 46,000 sounds of objects being "hit, scraped, and prodded with a drumstick," a library that is freely available to other researchers. To find patterns, they fed the sounds to their deep-learning algorithm, which picked apart characteristics such as pitch, volume, and tone. To predict the sound for a new video, the algorithm matches the audio properties of the video's frames to similar sounds from the collection, and then melds those sounds together.
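The matching step described above can be pictured as a nearest-neighbor lookup. What follows is only a minimal sketch, not the researchers' code: the three-number feature vectors (pitch, volume, tone, each scaled to 0 to 1), the library entries, and the `nearest_sound` function are all made up for illustration, and the sketch skips the step of melding several matches together.

```python
import math

# Hypothetical sound library: each entry pairs an invented feature vector
# (pitch, volume, tone, each scaled to the range 0..1) with a label.
# None of these numbers come from the MIT dataset.
SOUND_LIBRARY = [
    ((0.2, 0.8, 0.3), "thud"),
    ((0.9, 0.5, 0.1), "click"),
    ((0.5, 0.3, 0.9), "rustle"),
]


def nearest_sound(predicted, library=SOUND_LIBRARY):
    """Return the label of the library sound closest to `predicted`.

    Closeness is plain Euclidean distance between feature vectors,
    chosen here for simplicity; it is an assumption, not the paper's metric.
    """
    features, label = min(library, key=lambda entry: math.dist(entry[0], predicted))
    return label


if __name__ == "__main__":
    # Pretend the model predicted these audio properties for a silent clip.
    print(nearest_sound((0.85, 0.45, 0.15)))
```

In this toy version, a clip whose predicted properties are high-pitched and short would be paired with the stored click rather than the thud or the rustle.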

Frames from a video shown next to the audio waveform of the same sound

Image: MIT's CSAIL

The result was an algorithm that could reproduce a variety of sounds, ranging from "staccato taps of a rock to the longer waveforms of rustling ivy," as well as a range of pitches, "from thuds to clicks."

MIT researchers tested their algorithm via an online study. Participants watched videos of "impact events," in other words, objects knocking into each other. Each video was shown in two versions, one with artificial sound produced by the algorithm and one with the real, recorded sound, and participants were asked to judge which was the original. The human subjects picked the artificial sound as the "real" one twice as often as the actual recording.

So what does it mean? The algorithm could potentially be used to create more realistic sound effects in television or film, or to help robots better understand the audio properties of certain objects. Why are audio properties important? "When you run your finger across the rim of a wine glass, the sound it makes reflects how much liquid is in it," CSAIL PhD student Andrew Owens, lead author on the paper, said in the release. "An algorithm that simulates such sounds can reveal key information about objects' shapes and material types, as well as the force and motion of their interactions with the world." This kind of information could, presumably, help a robot work out how it should interact with its physical environment.

But not all AI experts are convinced that passing a Turing Test in this way is particularly relevant.

"It's a very impressive development and an unquestionably important capability, but not really meaningful to claim that it is a part of a Turing Test for sound," said Roman Yampolskiy, director of the Cybersecurity lab at the University of Louisville. "Most humans are not able to produce sound effects for movies."

Perhaps a more relevant Turing Test for sound, Yampolskiy said, would be "to produce human speech [created artificially] that can't be distinguished from that produced by real people."