LAS VEGAS—Most of us are probably familiar with machine learning in the context of trippy images created by the Deep Dream neural network. The general concept is to "train" a computer on a task so that it can operate autonomously. Ideally, machine-learning systems can adapt to new situations that their human creators could not have foreseen. Applied to data security, that adaptability could lead to entirely new breakthroughs.

At this year's Black Hat conference here, Cylance Senior Researcher Brian Wallace and Data Scientist Xuan Zhao walked attendees through some simple applications that could take the grunt work out of cybersecurity and, perhaps, generate new discoveries. They also generated a Taylor Swift song, but more on that later.

This is ideal for the security industry, which has been seeking new ways to identify not just malicious software, but suspicious activity on computers and networks. The current approach still relies on lists of known malicious applications and activities, but has been bolstered by cloud-based identification and heuristic analysis that speed up the detection of new threats. The bad guys are highly organized and iterative, pumping out slight variations on existing scams and malware at a breathtaking pace. Meanwhile, nation-states have developed and deployed malware specifically designed to thwart security protections in order to attack high-value targets.

Spotting the Patterns

The first example from the presenters was clustering, where machine-learning algorithms organize similar data into groups, or clusters. It's a simple visualization method that can help human users notice critical features that would otherwise be lost in the noise of the data. At the presentation, the team showed how clustering could divide data obtained through port scanning into useful groups. Some devices, for example, were grouped by operating system, others by similar open ports. With this in hand, security professionals could draw inferences about how those devices are related.
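
The clustering idea can be sketched in a few lines. This is a minimal illustration, not Cylance's actual method: each host is reduced to nothing but its set of open ports (the hostnames and port profiles below are invented), and hosts whose profiles overlap enough are greedily grouped together.

```python
# Toy clustering of port-scan results: group hosts by overlapping open ports.
# Hypothetical data and threshold; real tools would use richer features.

def jaccard(a, b):
    """Similarity between two sets of open ports (1.0 = identical)."""
    return len(a & b) / len(a | b) if a | b else 1.0

def cluster_hosts(scans, threshold=0.5):
    """Greedily group hosts whose open-port profiles overlap enough."""
    clusters = []  # each cluster: (representative port set, [host names])
    for host, ports in scans.items():
        for rep, members in clusters:
            if jaccard(ports, rep) >= threshold:
                members.append(host)
                break
        else:
            clusters.append((ports, [host]))
    return [members for _, members in clusters]

scans = {
    "10.0.0.1": {22, 80, 443},    # web-server-like profile
    "10.0.0.2": {80, 443},        # web-server-like profile
    "10.0.0.3": {135, 139, 445},  # Windows file-sharing profile
    "10.0.0.4": {139, 445},       # Windows file-sharing profile
}
print(cluster_hosts(scans))
# → [['10.0.0.1', '10.0.0.2'], ['10.0.0.3', '10.0.0.4']]
```

Even this crude grouping surfaces the kind of pattern the presenters described: the two web-facing hosts and the two Windows hosts fall into separate clusters without anyone labeling them in advance.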

The presenters' second example touched on a critical security issue: ferreting out the command-and-control servers used to run massive botnets composed of infected computers. The bad guys use these networks to launch DDoS attacks, send spam, and carry out other nefarious operations.

The trouble with C&C servers is that they are very hard to pinpoint. "It can be difficult to distinguish them between legit applications," said Wallace. "They very commonly end up looking like normal websites with login interfaces, MYSQL servers, etc."

In order to detect the nasty servers, the team first had to train their machine-learning systems. They used a mix of dangerous and benign servers as their corpus. By their own admission, it was a noisy process, with the systems making numerous requests of the target servers. The good news was that, once trained, the machine-learning systems were very, very stealthy. And not just stealthy, but reliable enough that the team converted the work into a handy Chrome extension that can accurately determine not only whether a site you navigate to is a botnet C&C server, but also what kind of botnet it controls.
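
The train-then-classify step can be sketched with a nearest-centroid classifier. Everything here is an assumption for illustration: the features (response size, header count, a login-form flag) and the labeled examples are invented, not Cylance's actual feature set or data.

```python
# Minimal sketch of supervised classification from labeled servers.
# Features and labels are hypothetical, chosen only to show the mechanics.

def centroid(rows):
    """Average feature vector for one class of labeled servers."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def train(labeled):
    """Compute one centroid per label from (features, label) pairs."""
    by_label = {}
    for features, label in labeled:
        by_label.setdefault(label, []).append(features)
    return {label: centroid(rows) for label, rows in by_label.items()}

def classify(model, features):
    """Assign the label whose centroid is closest (squared distance)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(features, c))
    return min(model, key=lambda label: dist(model[label]))

labeled = [
    ([1200, 8, 0], "benign"), ([1500, 9, 0], "benign"),
    ([300, 3, 1], "c2"),      ([250, 4, 1], "c2"),
]
model = train(labeled)
print(classify(model, [280, 3, 1]))  # → "c2"
```

The noisy part the presenters admitted to happens before this step: gathering those feature vectors means probing the target servers. Once the model exists, classifying a new server is a quiet, local computation.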

Hidden in Plain Sight

While most of the session focused on ways to streamline security analysis, the team ended with a unique way to use machine learning to obfuscate information.

"Outbound network firewalls inspect the data that is being sent out," said Wallace. "If they see anything they can't figure out or understand, it's gonna drop the packet."

To defeat the firewall, Zhao explained how to transform data into something unrecognizable with Markov chains, which look at sequences of data and try to predict what will follow any random piece of new data. Zhao said it was like being able to predict tomorrow's weather by looking at today's weather and comparing it to years of weather data.
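
Zhao's weather analogy maps directly onto a first-order Markov chain: count which state follows each state in historical data, then predict the most common successor. Here is a minimal sketch; the weather history is invented for the example.

```python
# Toy Markov chain: learn which state most often follows each state,
# then "predict tomorrow from today." Training data is made up.

from collections import Counter, defaultdict

def train_markov(sequence):
    """Count which state follows each state in the training sequence."""
    transitions = defaultdict(Counter)
    for today, tomorrow in zip(sequence, sequence[1:]):
        transitions[today][tomorrow] += 1
    return transitions

def predict_next(transitions, today):
    """Predict the most frequently observed successor of `today`."""
    followers = transitions[today]
    return followers.most_common(1)[0][0] if followers else None

history = ["sunny", "sunny", "sunny", "rain", "sunny", "sunny"]
model = train_markov(history)
print(predict_next(model, "sunny"))  # → "sunny"
```

The same machinery works on any sequence: swap the weather states for words and the chain predicts plausible next words, which is what makes it useful both for generating English-looking text and, as the team showed, for generating Taylor Swift-like lyrics.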

In order to disguise information, the team ran a Markov chain algorithm on a free eBook from Project Gutenberg. Once the model was trained, they had it encode a secret key used in encryption. The algorithm dutifully produced a blob of text that sort of read like English but looked nothing like an encryption key.

This text would slip past a firewall easily, and could be decoded by a recipient who trained a Markov chain algorithm on the same text. It's very similar to a book cipher, where a shared text is the key to decoding a message.

As a final demonstration of the predictive ability of a Markov chain, the team fed in a corpus of Taylor Swift lyrics along with MIDI data taken from Swift's songs. After tapping out a few commands, a mechanical voice filled the room with words and slight melodic modulations not entirely dissimilar to a Taylor Swift song.

Machine learning has been a major topic in technology as of late. At this year's Google I/O, the company traditionally associated with search and utility products like email aimed to reposition itself as a machine-learning company. To that end, Google released more machine-learning tools to its developers. And it was clear from the session that machine learning in security is still in its infancy. Whether or not we get any more machine-generated Taylor Swift songs in the process is unknown, but we can be hopeful.

About the Author

Max Eddy is a Software Analyst, taking a critical eye to the Android OS and security services. He's also PCMag's foremost authority on weather stations and digital scrapbooking software. He spends much of his time polishing his tinfoil hat and plumbing the depths of the Dark Web.
Prior to PCMag, Max wrote for the International Digital Times, The...
