MIT can now eavesdrop through soundproof glass by watching the vibrations of a bag of chips

With a breakthrough that sounds more like the plot of the latest James Bond instalment than an academic research paper, engineers at MIT have managed to recover speech by analyzing the tiny vibrations of a potato chip bag from 15 feet away, with a video camera watching through soundproof glass. That is, if you’re having a private conversation in a room, but a spy can see a chip bag (or any other object) through the window, they could work out what you’re saying. There are some obvious security and forensics repercussions from this work, which is being presented at SIGGRAPH 2014 next week, but other interesting uses will surely emerge (such as recovering audio from silent film, perhaps?).

Before we discuss how MIT recovers sound from a silent video feed, you should first watch the video below. The video does a good job of showing you how effective MIT’s passive recovery technique is, in a variety of different scenarios.

This technique, which MIT calls “the visual microphone,” works by analyzing how an object vibrates when it’s hit by sound waves. While it might not be entirely obvious, sound waves traveling through air are just regions of high and low pressure — and when these waves hit an object, the object is buffeted and vibrates in much the same way as your own eardrum.

The problem is, unless you’re six feet away from the speaker stack at a Pantera concert, these vibrations are really, really small. Earlier this morning I spent a good five minutes talking to an empty potato chip bag to see if I could spot the vibrations, but alas I could not. To get around this problem, MIT uses two tricks. First, it borrows a technique developed by another group at MIT that massively amplifies the tiniest of movements and variations in a video feed (this technique can monitor your pulse by watching for the tiniest variations in skin color caused by blood being pumped around your body). Second, the effectiveness of the visual microphone is significantly boosted by using a high-speed camera. Basically, to see high-frequency vibrations, you ideally need a camera that captures at thousands of frames per second (a camera can only resolve vibrations below half its frame rate, so if you want to reconstruct a 300Hz component of human speech, you need to capture at more than 600 fps).
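As a rough check on that frame-rate requirement, the sketch below applies the standard sampling-theorem folding rule to a pure tone. The function names `apparent_frequency` and `min_frame_rate` are made up for this illustration and are not part of MIT’s work; the sketch only shows why a frame rate equal to the vibration frequency is not enough.

```python
def apparent_frequency(signal_hz, fps):
    """Frequency (Hz) a pure tone appears to have once sampled at `fps` frames per second."""
    # Sampling folds every frequency into the band [0, fps / 2]:
    # subtract the nearest whole multiple of the frame rate.
    nearest_multiple = round(signal_hz / fps) * fps
    return abs(signal_hz - nearest_multiple)


def min_frame_rate(signal_hz):
    """Nyquist bound: the frame rate must be strictly greater than this value."""
    return 2 * signal_hz


for fps in (300, 400, 2000):
    print(fps, "fps ->", apparent_frequency(300, fps), "Hz")
```

At 300 fps a 300Hz vibration lands on the same point of its cycle in every frame and looks like no motion at all; at 400 fps it masquerades as a 100Hz tone; at 2,000 fps it is captured as the 300Hz tone it really is.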

The researchers found that, for normal-amplitude sounds (speech, music), the pressure waves caused objects to vibrate by around one-tenth of a micrometer. This is about five-thousandths of a pixel in a close-up image, apparently. To spot the vibrations, the MIT team looks at the minute changes in a pixel’s color. For example, imagine a white chip packet in a blue-painted room. The edge of the chip packet, while it wouldn’t visibly move, would vary between shades of white and blue depending on the vibrations, and these are shades that can easily be detected by software. [DOI: 10.1145/2601097.2601119 - "The visual microphone: passive recovery of sound from video"]
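To make the white-packet-in-a-blue-room example concrete, here is a deliberately simplified sketch, not the researchers’ actual pipeline. It assumes a pixel that straddles the packet’s edge and blends the two brightness levels linearly according to how much of it the packet covers. The brightness values (255 and 40), the noise level, and the function names are all assumptions chosen for illustration.

```python
import random


def edge_pixel(offset_px, bag=255.0, wall=40.0):
    """Brightness of a pixel straddling the edge, with the bag shifted by offset_px pixels."""
    coverage = 0.5 + offset_px  # fraction of the pixel covered by the bag
    return coverage * bag + (1.0 - coverage) * wall


def estimate_offset(pixels, bag=255.0, wall=40.0):
    """Invert the linear blend: average brightness change divided by edge contrast."""
    rest = edge_pixel(0.0, bag, wall)
    mean = sum(pixels) / len(pixels)
    return (mean - rest) / (bag - wall)


rng = random.Random(0)      # fixed seed so the run is repeatable
true_offset = 0.005         # five-thousandths of a pixel, as quoted above
# 10,000 edge pixels, each with about one gray level of (assumed) sensor noise.
pixels = [edge_pixel(true_offset) + rng.gauss(0.0, 1.0) for _ in range(10000)]
estimate = estimate_offset(pixels)
print(f"true shift {true_offset:.4f} px, estimated {estimate:.4f} px")
```

A shift of 0.005 pixels changes a single edge pixel by only about one gray level, which is comparable to the noise, but averaging over many edge pixels brings the estimate close to the true shift. That averaging is the intuition behind the claim that software can detect what the eye cannot.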

A plane’s propeller, showing the effects of a rolling shutter

While the visual microphone is most effective with a high-frame-rate camera, the MIT researchers also had some success recreating audio from ordinary (DSLR) video camera footage. Most consumer video cameras use what’s known as a rolling shutter, where each row of the imaging sensor is read sequentially, rather than all at once. This can create some interesting artifacts, especially in the case of fast-moving objects like cars or rotor blades. These artifacts can also be used to recover audio data, though as you can see in the video the quality is much lower.
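The reason a rolling shutter helps can be sketched with simple timing arithmetic: if every row is read at a slightly different instant, each row is effectively its own time sample. The example below is an idealized model with assumed numbers (60 fps, 1,080 rows) and hypothetical function names; `readout_fraction` is an assumed parameter for the share of each frame period spent reading rows, and real sensors will differ.

```python
def row_sample_times(fps, rows, n_frames, readout_fraction=1.0):
    """Timestamps (seconds) at which each sensor row is read, frame by frame."""
    frame_period = 1.0 / fps
    # Delay between consecutive rows within a single frame.
    row_delay = frame_period * readout_fraction / rows
    return [
        frame * frame_period + row * row_delay
        for frame in range(n_frames)
        for row in range(rows)
    ]


def row_rate(fps, rows, readout_fraction=1.0):
    """Rows read per second while a frame is being read out."""
    return fps * rows / readout_fraction


times = row_sample_times(fps=60, rows=1080, n_frames=2)
print(len(times), "row samples in two frames")
print(row_rate(60, 1080), "rows per second during readout")
```

In this idealized case a 60 fps camera takes tens of thousands of row readings per second, far above its nominal frame rate. When `readout_fraction` is below 1.0 there is a pause between frames in which nothing is sampled, and each row sees a different part of the object, which is consistent with the lower quality of the recovered audio.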

The visual microphone has obvious applications in the realms of law enforcement, intelligence gathering, and forensics. While laser microphones (which detect vibrations on a pane of glass) are fairly old hat by this point, the visual microphone can be used after the fact on recorded footage. If the technique can be improved so that high-frame-rate cameras or rolling shutters aren’t required, then we might even be able to recover sound from silent films, such as those starring Charlie Chaplin.

I kind of doubt they will ever be able to get usable sound from a silent movie. That would be one heck of an extrapolation to get usable audio from a 24Hz or even 15Hz restoration.

massau

you would be able to get a 12Hz sound out of a 24Hz movie if there are no scan lines. so you are right it would be easier to do an object detection and fill in the sampled sounds.

Dozerman

I’m about one more of these articles away from becoming a hermit.

BillBasham

My personal plans for hermitude included a lot of chip eating, so I’m foiled again…

Dozerman

You’re a punny guy

massau

you could still use SMS,text as your main communication.
but if you have nothing to hide why would you need privacy (except for the pirate police).

Dozerman

Everyone has something to hide. I’d also say it’s never been a better time to start pushing for overdoing privacy than now, too.

massau

things to hide: everyone has probably had a minor infringement of the law (like taking a “free” beer when you were working at a bar). but because everyone has, you should only be convicted if yours is statistically significantly more than the others.

Dozerman

One of the problems with your argument is that it assumes that the government will never be corrupted.

massau

yes you will get a 1984. but trying to stop them and then claiming privacy will not suddenly stop them; they will just make it more classified. if you want to stay undetected then use encryption, use prepaid cards etc.
you make your own privacy: if you post it on FB then you can consider it public.

bob lebart

only answer: soundproof chips

Dozerman

… or writing things down on a piece of paper and immediately burning it. :)

bob lebart

Ah, the analog solution!

cpy

Oh wow, i wonder what laser spy microphones have to say about this.

ShasLa40

It’s like the computer in Eagle Eye: it finds out what the guy is saying because his voice vibrates through his arms, the table and then the mug of coffee on the table, making a ripple in the liquid.

Tom

That sprang to my mind as well! It is exactly like that, yeah. Creepy/awesome.

Ken B

Seeing how much information they can pick up using this technique, imagine how well an animal can hear through its whiskers.

alphaa10000

This laser-based data recovery technique has been common knowledge for more than a decade among people who do not usually talk about such matters. MIT deserves credit for “rediscovery”, certainly.

greybirdtoo

The technique described here doesn’t use lasers. It uses video, initially from a high-speed video camera, and later from a DSLR camera, exploiting the rolling shutter of CMOS sensors.

Tim_in_Indiana

And we thought it was farfetched when HAL 9000 was able to eavesdrop on the crew in 2001: A Space Odyssey by reading their lips!

Use of this site is governed by our Terms of Use and Privacy Policy. Copyright 1996-2015 Ziff Davis, LLC.PCMag Digital Group All Rights Reserved. ExtremeTech is a registered trademark of Ziff Davis, LLC. Reproduction in whole or in part in any form or medium without express written permission of Ziff Davis, LLC. is prohibited.