Caught on tape: cameras turn video into sound

Now even the cameras have ears. Engineers at MIT have discovered a way to listen in on conversations simply by filming objects near someone talking loudly and measuring the tiny vibrations that sound causes in those objects. The technique uses high frame rate cameras to film an object. By tracking the position of an object down to 1/1000th of a pixel over time, the researchers were able to recover the vibration pattern in the material, and the sound that caused it.
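To give a feel for how sub-pixel tracking can turn video back into a signal, here is a minimal Python sketch. It is not the researchers' pipeline, which this article does not describe. It assumes a toy one-dimensional "image" (a blurry bright spot) and estimates the spot's position in each frame with an intensity-weighted centroid. The frame rate, tone frequency, blur width and function names are all hypothetical values chosen for illustration.

```python
import math

# All constants below are illustrative assumptions, not figures from the research.
FRAME_RATE = 2000.0    # frames per second of a hypothetical high-speed camera
TONE_HZ = 250.0        # frequency of the sound shaking the object
AMPLITUDE_PX = 0.001   # motion of 1/1000th of a pixel
N_FRAMES = 200


def render_row(center, width=64, sigma=3.0):
    """Render a 1-D Gaussian spot of brightness centred at a fractional pixel."""
    return [math.exp(-((x - center) ** 2) / (2 * sigma ** 2)) for x in range(width)]


def centroid(row):
    """Intensity-weighted mean position: a simple sub-pixel position estimate."""
    total = sum(row)
    return sum(x * v for x, v in enumerate(row)) / total


def recover_signal(frames):
    """Track the spot in every frame and return its displacement from the mean."""
    positions = [centroid(frame) for frame in frames]
    mean = sum(positions) / len(positions)
    return [p - mean for p in positions]


def dominant_frequency(samples, sample_rate):
    """Return the frequency (Hz) of the strongest non-DC bin of a plain DFT."""
    n = len(samples)
    best_k, best_power = 0, 0.0
    for k in range(1, n // 2):
        re = sum(s * math.cos(2 * math.pi * k * i / n) for i, s in enumerate(samples))
        im = sum(s * math.sin(2 * math.pi * k * i / n) for i, s in enumerate(samples))
        power = re * re + im * im
        if power > best_power:
            best_k, best_power = k, power
    return best_k * sample_rate / n


# Simulate frames of a spot vibrating by a thousandth of a pixel.
frames = [
    render_row(32.0 + AMPLITUDE_PX * math.sin(2 * math.pi * TONE_HZ * i / FRAME_RATE))
    for i in range(N_FRAMES)
]
signal = recover_signal(frames)
peak = dominant_frequency(signal, FRAME_RATE)
print(f"Recovered tone: {peak:.1f} Hz")
```

In this noise-free toy, the recovered displacement oscillates at the frequency of the simulated tone even though the spot never moves by more than a thousandth of a pixel. Real footage adds sensor noise, texture and lighting changes, so a practical system would need a far more robust motion estimate than a centroid.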

Spies already have laser microphones that allow them to listen in on faraway conversations. But those are generally limited to listening to conversations behind clear panes of glass. The new technique can record sound from any object the camera can see, although some materials deliver better sound signatures than others.

Seeing is hearing

"We were able to recover intelligible speech from maybe 15 feet away, from a bag of chips behind soundproof glass," Abe Davis, who led the research. A recitation of "Mary Had A Little Lamb" is clearly audible in the visual recording (see video, above).

The same high speed camera was also able to recover the sound of Queen's "Under Pressure" being played through headphones connected to a computer, just by watching for tiny vibrations in the headphone cord. Music recognition app Shazam correctly identified the visual recording.

Davis says that although spying is the obvious application for visual microphones, he is more excited about using them as a new way of measuring the physical properties of objects remotely.

"We look at how light is reflected off an object, and that tells us the colour of that object," Davis explains. "Now we can see how the object responds to sound. It's a whole other dimension we could use. How something responds to sound indicates structural material properties that we're not used to looking at, and our hope is that the project this will find completely new applications."

Rolling shutter

A variation of Davis' technique can even recover audio information from video taken by normal digital cameras. By exploiting a recording artifact called rolling shutter, where different pixels in a frame of video are captured at slightly different times, Davis and colleagues were able to recover the tune of "Mary Had a Little Lamb" from a normal frame rate video of a bag of candy sitting near a speaker.

"What's amazing about that example is that we recovered frequencies more than six times faster than the frame rate of the camera," he says. "We were all surprised and thrilled by this."

Even though this technique doesn't recover intelligible speech, Davis says that other useful information can be encoded in the video.

"We show one example where we can identify the gender of speakers by recovering sound from a box of tissues," he says.

Davis is due to present the team's paper at SIGGRAPH in Vancouver next week.
