How we fooled Google's AI into thinking a 3D-printed turtle was a gun: MIT bods talk to El Reg

And mistook a baseball for a Monday morning coffee

Video Students at MIT in the US claim they have developed an algorithm for creating 3D objects and pictures that trick image-recognition systems into severely misidentifying them. Think toy turtles labeled rifles, and baseballs as cups of coffee.

It’s well known that machine-learning software can be easily hoodwinked: Google's AI-in-the-cloud can be misled by noise; protestors and activists can wear scarves or glasses to fool people-recognition systems; intelligent antivirus can be outsmarted; and so on. It's a crucial topic of study because as surveillance equipment, and similar technology, relies more and more on neural networks to quickly identify things and people, there has to be less room for error.

You definitely do not want to walk into somewhere secure, like an airport terminal, and be wrongly fingered as a gunman by a security scanner or some other automated computer system that mistook your headphones for an assault rifle, for instance. You also do not want someone walking into an airport terminal with an assault rifle bearing extra colorful markings that fool surveillance gear into thinking it is a harmless cupcake.

The problem is that although neural networks can be taught to be experts at identifying images, having to spoon-feed them millions of examples during training means they don’t generalize particularly well. They tend to be really good at identifying whatever you've shown them previously, and fail at anything in between.

Switch a few pixels here or there, or add a little noise to what is actually an image of, say, a gray tabby cat, and Google's TensorFlow-powered open-source Inception model will think it’s a bowl of guacamole. This is not a hypothetical example: it's something the MIT students, working together as an independent team dubbed LabSix, claim they have achieved.

Below is the altered cat photo – modified by the team – that was passed to Inception to identify, and the resulting label was: guacamole.

Wrong! Tweaked cat photo is confused with guac (Image credit: LabSix)

The fudged inputs used to scam machine-learning systems are known as adversarial examples, which OpenAI has described in depth. The algorithms used to generate these special images sound scary but fall down a lot in the real world because external factors, from noise to lighting, can severely affect the examples themselves. In other words, the changes in an image needed to fool an AI system are very precise and any deviations, such as holding a picture or object at an angle to the neural network's camera, can ruin the effect: the software will be able to recognize the subject correctly.

Rotate the above adversarial example of the cat, and Google's model correctly realizes it’s actually a moggie. The attack photos are fragile, essentially. AI models can be fooled, but today's adversarial algorithms aren't particularly robust either. Both sides of the technology can improve over time – the models and the attacking algorithms – making it somewhat of an arms race.

Right! The cat is finally correctly identified once it has been rotated slightly

LabSix have taken generating adversarial examples one step further, or rather, one dimension further by 3D printing real objects that baffle AI. The crew – Anish Athalye, Logan Engstrom, Andrew Ilyas, and Kevin Kwok – wanted to create objects that fool image-recognition systems regardless of the angle you hold the things, thus sidestepping the aforementioned limitations. And so far, they've succeeded, judging from their research.

Their general-purpose algorithm, known as Expectation Over Transformation (EOT), can take a 2D photo as input and create from it examples that are adversarial over a range of angles of rotation. For 3D, it can slightly warp an object's texture over its shape so that Google's model is fooled into thinking it's a completely different thing, at any angle of view, whereas humans will still be able to recognize it.
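To make the idea concrete, here is a minimal sketch of the difference between attacking a single view and attacking the average over several views. It is not LabSix's code and has nothing to do with Inception: the four-weight linear "classifier", the cyclic shift standing in for a change of viewing angle, and every name and number below are assumptions invented for illustration. The expectation is computed exactly over two views; a real attack would presumably have to sample from a far larger set of transformations.

```python
import math

# Hypothetical toy classifier: a positive score means "rifle", negative "turtle".
WEIGHTS = [3.0, -1.0, -1.0, -1.0]

def shift(x, k):
    """Cyclically shift x by k places: a crude stand-in for a new viewpoint."""
    n = len(x)
    return [x[(i - k) % n] for i in range(n)]

def score(x, k):
    """Classifier score for input x as seen from viewpoint k."""
    return sum(w * v for w, v in zip(WEIGHTS, shift(x, k)))

def score_grad(k):
    """Gradient of score(x, k) with respect to x (constant: the model is linear)."""
    n = len(WEIGHTS)
    return [WEIGHTS[(j + k) % n] for j in range(n)]

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def attack(x, views, eps=0.25, step=0.0625, iters=10):
    """Signed gradient ascent on the mean log-probability of the target class
    over the given views, keeping every change within +/- eps."""
    delta = [0.0] * len(x)
    for _ in range(iters):
        adv = [a + d for a, d in zip(x, delta)]
        grad = [0.0] * len(x)
        for k in views:
            # d/dx log(sigmoid(score)) = (1 - sigmoid(score)) * d(score)/dx
            coeff = (1.0 - sigmoid(score(adv, k))) / len(views)
            for j, g in enumerate(score_grad(k)):
                grad[j] += coeff * g
        for j, g in enumerate(grad):
            sign = (g > 0) - (g < 0)
            delta[j] = max(-eps, min(eps, delta[j] + step * sign))
    return [a + d for a, d in zip(x, delta)]

x = [0.0, 0.25, 0.25, 0.0]            # scores negative ("turtle") from both views
single = attack(x, views=[0])         # optimized for one viewpoint only
robust = attack(x, views=[0, 1])      # optimized for the average over both views

for name, example in [("clean", x), ("single-view", single), ("both views", robust)]:
    print(name, [score(example, k) > 0 for k in (0, 1)])
```

In this toy setup the single-view attack flips the label only from the viewpoint it was tuned for, which mirrors the fragile cat photo above, while the version optimized over both views flips it from either one, at the cost of a smaller margin on each.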

Below is a video of a 3D-printed turtle that is correctly identified by Inception, and then fools the code with a texture slightly modified by EOT. The objects are shown to the classifier at various angles.

So far, the team have only demonstrated this using two 3D objects: a turtle model, and a grubby white baseball that the system thought was an espresso. The turtle was adversarial 82 per cent of the time, and the baseball was 59 per cent of the time.

EOT was also tested on 1,000 2D images, where each image belongs to a different image type, and the mean adversarial score was 96.4 per cent. In 3D simulations with 100 different objects split into 20 categories, it was adversarial about 84 per cent of the time. The team's results have been written up in a paper under review for the International Conference on Learning Representations next year.

The difference in scores shows that it’s not easy to map simulations onto the real world. In fact, Engstrom, one of the paper's coauthors, admitted they had to print “four to five iterations of the turtle before we could get it to work well.”

It is difficult, but not impossible, to compensate for the environment around an object while it is being studied by the neural network, Ilyas, another coauthor, told us: in the turtle example, the lighting on the toy would interfere with the adversarial texture and make fooling the AI tricky. To overcome these lighting effects, the team printed out different turtles until they got it just right.

Simulating reality

In the real world, an algorithm like EOT only really works if you know exactly how the image-recognition system under attack learns to classify images – and these details are only really public in open-source models, unless you're good at reverse-engineering code. Google's Inception is well documented and the code is available, making attacking it entirely possible.

However, the team are now trying to figure out how to tweak their techniques so that they will work in more “black box” conditions, where little information about the system is needed.

“Our work gives an algorithm for reliably constructing targeted 3D physical-world adversarial examples, and our evaluation shows that these 3D adversarial examples work. [It] shows that adversarial examples are a real concern in practical systems,” the team said.

“A fairly direct application of 3D adversarial objects could be designing a T-shirt which lets people rob a store without raising any alarms because they’re classified as a car by the security camera,” they added.

Or in the case of security scanners, potentially dangerous items could be waved through if they are mistaken for something safer. “The TSA uses X-ray, not just standard RGB images, so breaking that will take some creativity on top of applying EOT, perhaps something like selectively applying a thick lead paint to certain parts of an object to change the way it is perceived by an X-ray scanner,” the LabSixers mused. ®