Antivirus makers want you to believe they are adding artificial intelligence to their products: software that has learned how to catch malware on a device. There are two potential problems with that. Either it’s marketing hype and not really AI – or it’s true, in which case don’t forget that such systems can still be hoodwinked.

It’s relatively easy to trick machine-learning models – especially in image recognition. Change a few pixels here and there, and an image of a bus can be warped so that the machine thinks it’s an ostrich. Now take that thought and extend it to so-called next-gen antivirus.
…

The researchers from Endgame and the University of Virginia are hoping that by integrating the malware-generating system into OpenAI’s Gym platform, more developers will help sniff out more adversarial examples to improve machine-learning virus classifiers.

Although Evans believes that Endgame’s research is important, using such a method to beef up security “reflects the immaturity” of AI and infosec. “It’s mostly experimental and the effectiveness of defenses is mostly judged against particular known attacks, but doesn’t say much about whether it can work against newly discovered attacks,” he said.

“Moving forward, we need more work on testing machine learning systems, reasoning about their robustness, and developing general methods for hardening classifiers that are not limited to defending against particular attacks. More broadly, we need ways to measure and build trustworthiness in AI systems.”

The research has been summarized as a paper, here if you want to check it out in more detail, or see the upstart’s code on Github.

Machine learning classifiers are increasingly popular for security applications, and often achieve outstanding performance in testing. When deployed, however, classifiers can be thwarted by motivated adversaries who adaptively construct adversarial examples that exploit flaws in the classifier’s model. Much work on adversarial examples, including Carlini and Wagner’s attacks which are the best results to date, has focused on finding small distortions to inputs that fool a classifier. Previous defenses have been both ineffective and very expensive in practice. In this talk, I’ll describe a new very simple strategy, feature squeezing, that can be used to harden classifiers by detecting adversarial examples. Feature squeezing reduces the search space available to an adversary by coalescing samples that correspond to many different inputs in the original space into a single sample. Adversarial examples can be detected by comparing the model’s predictions on the original and squeezed sample. In practice, of course, adversaries are not limited to small distortions in a particular metric space. Indeed, it may be possible to make large changes to an input without losing its intended malicious behavior. We have developed an evolutionary framework to search for such adversarial examples, and demonstrated that it can automatically find evasive variants against state-of-the-art classifiers. This suggests that work on adversarial machine learning needs a better definition of adversarial examples, and to make progress towards understanding how classifiers and oracles perceive samples differently.

Although deep neural networks (DNNs) have achieved great success in many computer vision tasks, recent studies have shown they are vulnerable to adversarial examples. Such examples, typically generated by adding small but purposeful distortions, can frequently fool DNN models. Previous studies to defend against adversarial examples mostly focused on refining the DNN models. They have either shown limited success or suffer from expensive computation. We propose a new strategy, feature squeezing, that can be used to harden DNN models by detecting adversarial examples. Feature squeezing reduces the search space available to an adversary by coalescing samples that correspond to many different feature vectors in the original space into a single sample.

By comparing a DNN model’s prediction on the original input with that on the squeezed input, feature squeezing detects adversarial examples with high accuracy and few false positives. If the original and squeezed examples produce substantially different outputs from the model, the input is likely to be adversarial. By measuring the disagreement among predictions and selecting a threshold value, our system outputs the correct prediction for legitimate examples and rejects adversarial inputs.

So far, we have explored two instances of feature squeezing: reducing the color bit depth of each pixel and smoothing using a spatial filter. These strategies are straightforward, inexpensive, and complementary to defensive methods that operate on the underlying model, such as adversarial training.

The figure shows the histogram of the L1 scores on the MNIST dataset between the original and squeezed sample, for 1000 non-adversarial examples as well as 1000 adversarial examples generated using both the Fast Gradient Sign Method and the Jacobian-based Saliency Map Approach. Over the full MNIST testing set, the detection accuracy is 99.74% (only 22 out of 5000 fast positives).

The video for my Enigma 2017 talk, “Classifiers under Attack” is now posted:

The talk focuses on work with Weilin Xu and Yanjun Qi on automatically evading malware classifiers using techniques from genetic programming. (See EvadeML.org for more details and links to code and papers, although some of the work I talked about at Enigma has not yet been published.)

Enigma was an amazing conference – one of the most worthwhile, and definitely the most diverse security/privacy conference I’ve been to in my career, both in terms of where people were coming from (nearly exactly 50% from industry and 50% from academic/government/non-profits), intellectual variety (range of talks from systems and crypto to neuroscience, law, and journalism), and the demographics of the attendees and speakers (not to mention a way-cool stage setup).

The model of having speakers do on-line practice talks with their session was also very valuable (Enigma requires speakers to agree to do three on-line practice talks sessions before the conference, and from what I hear most speakers and sessions did cooperate with this, and it showed in the quality of the sessions) and something I hope other conference will be able to adopt. You actually end up with talks that fit with each other, build of things others present, and avoid unnecessary duplication, as well as, improving all the talks by themselves.

I gave a talk on Weilin Xu’s work (in collaboration with Yanjun Qi) on evading machine learning classifiers at the O’Reilly Security Conference in New York: Classifiers Under Attack, 2 November 2016.

Machine-learning models are popular in security tasks such as malware detection, network intrusion detection, and spam detection. These models can achieve extremely high accuracy on test datasets and are widely used in practice.

However, these results are for particular test datasets. Unlike other fields, security tasks involve adversaries responding to the classifier. For example, attackers may try to generate new malware deliberately designed to evade existing classifiers. This breaks the assumption of machine-learning models that the training data and the operational data share the same data distribution. As a result, it is important to consider attackers’ efforts to disrupt or evade the generated models.

David Evans provides an introduction to the techniques adversaries use to circumvent machine-learning classifiers and presents case studies of machine classifiers under attack. David then outlines methods for automatically predicting the robustness of a classifier when used in an adversarial context and techniques that may be used to harden a classifier to decrease its vulnerability to attackers.

Weilin Xu presented his work on Automatically Evading Classifiers today at the Network and Distributed Systems Security Symposium in San Diego, CA (co-advised by Yanjun Qi and myself). The work demonstrates an automated approach for finding evasive variants of malicious PDF files using genetic programming techniques. Starting with a malicious seed file (that is, a PDF file with the intended malicious behavior, but that is correctly classified as malicious by the target classifier), it heuristically searches for an evasive variant that preserves the malicious behavior of the seed sample but is now classified as benign. The method automatically found an evasive variant for every seed in our test set of 500 malicious PDFs for both of the target classifiers used in the experiment (PDFrate and Hidost).

In addition to the results in the paper, Weilin found some new results examining gmail’s PDF malware classifier. We had hoped the classifier used by gmail would be substantially better than what we found in the research prototype classifiers used in the original experiments, and the initial cross-evasion experiments supported this. Of the 500 evasive variants found for Hidost in the original experiment, 387 were also evasive variants against PDFrate, but only 3 of them were evasive variants against Gmail’s classifier.

From those 3, and some other manual tests, however, Weilin was able to find two very simple transformations (any change to JavaScript such as adding a variable declaration, and adding padding to the file) that are effective at finding evasive variants for 47% of the seeds.

The response we got from Google about this was somewhat disappointing (and very inconsistent with my all previous experiences raising security issues to Google):

Its true, of course, that any kind of static program analysis is theoretically impossible to do perfectly. But, that doesn’t mean the dominant email provider shouldn’t be trying to do better to detect one of the main vectors for malware distribution today (and there are, we believe, many fairly straightforward and inexpensive things that could be done to do dramatically better than what Gmail is doing today).

The other new result in the talk that isn’t in the paper is the impact of adjusting the target classifier threshold. The search for evasive variants can succeed even at lower thresholds for defining maliciousness (as shown in the slide below, finding evasive variants against PDFrate at the 0.25 maliciousness threshold).

The main idea behind the paper is to explore how an adaptive adversary can evade a machine learning-based malware classifier by using techniques from genetic programming to automatically explore the space of potential evasive variants.

In a case study using two PDF malware classifiers as targets, we find that it is possible to automatically find evasive variants (that is, variants that preserve the desired malicious behavior while being (mis)classified as benign) for all 500 seeds in our test dataset.

Weilin Xu will present the work at the Network and Distributed Systems Security Symposium in San Diego in February.