How Machine Learning Will Help Attackers

Among McAfee Labs' predictions for 2017 (PDF) is this: criminals will use machine learning to analyze massive quantities of stolen records, identify potential victims, and build contextually detailed emails that target these individuals very effectively. In short, just as defenders use machine learning to detect attacks, attackers will use machine learning to automate attacks and evade detection.

SecurityWeek spoke to Intel Security's CTO, Steve Grobman, to learn more. He sees two separate areas in which adversaries will use machine learning (ML). The first "is to use ML techniques to develop strategies to disrupt ML defenses." The second is ML "as a tool to make their attacks more effective -- a good example that we're starting to see already is using ML for the automation of advanced spear phishing."

In the first approach, said Grobman, "Machine learning can be used to analyze defense methods and develop new evasion techniques." For example, "by poisoning the model -- introducing false data so that the good guys' ML defenses will start to classify things incorrectly."

Another good example, he added, "is a technique called 'raising the noise floor'." In this approach, an adversary bombards an environment with data that is actually benign but looks like what various ML detection models are built to flag, so the defenses generate a stream of false positives.

"If he starts to get a lot of false positives, the defender will need to recalibrate his model to make it less sensitive," Grobman explained. But the false positives fed into the defense can be crafted to be similar to a planned future attack. "Essentially the attacker causes the defender to recalibrate his model so that he doesn't pick up all these falses, and this opens the door to allow the attacker to sneak in," he said.
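The recalibration effect Grobman describes can be illustrated with a toy detector. The sketch below is a simplified assumption, not any vendor's actual model: it supposes the defender alerts on a single numeric anomaly score and tunes the alert threshold to hold false positives to a fixed budget. The function names, score distributions, and the 1% budget are all hypothetical, chosen only to show how decoy events that resemble a planned attack can push the threshold above that attack's score.

```python
import random


def calibrate_threshold(benign_scores, max_false_positive_rate):
    """Pick a threshold so that at most the given fraction of
    known-benign events score strictly above it (hypothetical tuning rule)."""
    ordered = sorted(benign_scores)
    allowed = int(len(ordered) * max_false_positive_rate)
    index = max(len(ordered) - allowed - 1, 0)
    return ordered[index]


def is_alert(score, threshold):
    """The toy detector raises an alert when the score exceeds the threshold."""
    return score > threshold


rng = random.Random(0)  # fixed seed so the run is repeatable

# Assumed anomaly scores for ordinary benign traffic (low scores).
normal_traffic = [rng.gauss(0.2, 0.05) for _ in range(1000)]

# Attacker-injected decoys: harmless events crafted to score like the
# planned attack, so the defender labels them as false positives.
decoys = [rng.gauss(0.7, 0.03) for _ in range(200)]

# Assumed score of the real attack the adversary plans to launch later.
attack_score = 0.70

# Threshold tuned on normal traffic only, then again after the defender
# folds the decoy "false positives" into the benign calibration set.
before = calibrate_threshold(normal_traffic, 0.01)
after = calibrate_threshold(normal_traffic + decoys, 0.01)

print("threshold before decoys:", round(before, 3),
      "attack detected:", is_alert(attack_score, before))
print("threshold after decoys: ", round(after, 3),
      "attack detected:", is_alert(attack_score, after))
```

In this toy setup the attack is flagged under the original threshold but falls below the recalibrated one. Real detectors use far richer features and tuning procedures, so this is only an illustration of the mechanism, not a measurement of any product.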

This type of ML versus ML is already used by Endgame in its red team versus blue team wargames environment. Red teams are attackers; blue teams are defenders. Endgame uses machine learning to simulate both, each learning from the other, but Grobman sees this spilling over from simulations into attacks by real-life adversaries.

His second area of concern is the use of machine learning to refine social engineering attacks. The danger is that such automation will allow targeted spear phishing at scale: bulk phishing campaigns with the success rate of targeted attacks.

"In the past," he explained, "you could either do an automated bulk phish with little personalization, or you could do highly targeted spear phish campaigns. In the latter, a human does the analysis from social media, news stories and so on in order to determine the social engineering content for the spear phishing. Machine learning can give you the effectiveness of spear phishing within bulk phishing campaigns; for example, by using ML to scan Twitter feeds or other content associated with the user in order to craft a targeted message."

Just as Endgame provides a basic model for ML versus ML, so too there is a model for ML-based social engineering already in the public domain. At Black Hat USA 2016, John Seymour and Philip Tully presented a paper titled "Weaponizing data science for social engineering: Automated E2E spear phishing on Twitter" (PDF).

This paper describes and presents "SNAP_R, a recurrent neural network that learns to tweet phishing posts targeting specific users. The model is trained using spear phishing pentesting data, and in order to make a click-through more likely, it is dynamically seeded with topics extracted from timeline posts of both the target and the users they retweet or follow."

The scary part of SNAP_R is that tests show it to be remarkably effective. In tests involving 90 users, the automated spear phishing framework had a success rate of between 30% and 60%. Large-scale manual spear phishing traditionally has a 45% success rate, while bulk phishing manages just 5% to 14%. But these are early days in the evolution of ML models for social engineering, and we can expect rapid improvements over the next couple of years. Machine learning is likely to make targeted spear phishing more accurate and to make it available in bulk to adversaries.

Grobman's concern is that criminals will always adopt whatever technologies improve their chance of success. The problem for business, he told SecurityWeek, is that everything is already available: machine learning algorithms and data science tutorials are in the public domain, and the public cloud offers low-cost, on-demand, undetectable compute power to do the number crunching. Machine learning by adversaries is likely to become almost as widespread as machine learning by defenders.

Kevin Townsend is a Senior Contributor at SecurityWeek. He has been writing about high tech issues since before the birth of Microsoft. For the last 15 years he has specialized in information security, and he has had many thousands of articles published in dozens of different magazines, from The Times and the Financial Times to current and long-gone computer magazines.