Cyber-security professionals and cyber-criminals are engaged in a constant arms race. The minute a software vulnerability is patched or a security tool learns to block a class of attacks, malware writers shift gears and look for something new to exploit.

For years, security professionals have strived to get ahead of attackers. Unfortunately, the status quo is that security is reactive, and it's hard to imagine how it could be otherwise. How can you block cyber-crooks until you know what they're up to? We don't live in a Minority Report type of world, after all, where psychics can help us sniff out crimes before they happen.

That doesn't mean we don't try. In the physical world, the desire to get ahead of crime has led to all sorts of dubious practices, everything from stop-and-frisk to NSA snooping. In cyber-space, security professionals are now turning to big data to try to discover patterns that may indicate a crime is coming, even if it has not yet occurred.

Less invasive is the new Pleiades tool developed by researchers at Georgia Tech, the University of Georgia and security startup Damballa. Pleiades doesn't intuit coming crimes, but it can identify zero-day attacks before security researchers even know what exactly the malware is.

Pleiades monitors network traffic for specific patterns of behavior common to malware. Recently, it identified several attacks based on command-and-control (C&C) calls routed to Non-Existent Domains (NXDomains). NXDomains are important because infected devices communicate periodically with C&C servers to get instructions that tell them to, for instance, launch a DDoS attack or send out spam.

Legacy security tools block these types of botnets through blacklists of known C&C domains. Of course, once attackers learned this, they shifted tactics and started to rely on Domain Generation Algorithms (DGAs), which dynamically produce a large number of pseudo-random domain names. The latest version of Conficker, for instance, generates as many as 50,000 domain names per day, nearly all of which are NXDomains.

Only a small subset of those domains is used for C&C, and each is used for a very short period of time, making the C&C domain associated with the malware a hard-to-find moving target.
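To see why a static blacklist struggles here, consider a toy sketch of the idea. This is not Conficker's algorithm or any real malware's; the seed string, the SHA-256 hashing scheme, the function name `toy_dga` and the reserved `.example` suffix are all assumptions chosen for illustration. The point is only that a list derived from the date changes completely from one day to the next.

```python
import hashlib
from datetime import date


def toy_dga(seed: str, day: date, count: int) -> list[str]:
    """Derive `count` pseudo-random names from a shared seed and a date.

    Hypothetical illustration only: anyone who knows the seed and the
    date can recompute the same list, while an outside observer sees
    what looks like random strings.
    """
    domains = []
    for i in range(count):
        material = f"{seed}:{day.isoformat()}:{i}".encode()
        digest = hashlib.sha256(material).hexdigest()
        # Use the first 12 hex characters as the label; ".example" is a
        # reserved suffix, so none of these names can actually resolve.
        domains.append(digest[:12] + ".example")
    return domains


today = toy_dga("demo-seed", date(2013, 7, 1), 5)
tomorrow = toy_dga("demo-seed", date(2013, 7, 2), 5)

# A blacklist built from today's names does not cover tomorrow's.
stale_blacklist = set(today)
still_blocked = [d for d in tomorrow if d in stale_blacklist]
```

In this sketch `still_blocked` comes out empty, which is the moving-target problem in miniature: by the time a name is blacklisted, the malware has moved on to a fresh batch.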

The researchers used machine learning to help Pleiades separate benign traffic anomalies (e.g., a bunch of people typing in Facbook.com, a common typo, not a malware signature) from suspicious ones, such as a large number of end devices connecting to a large number of NXDomains.
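The distinction can be illustrated with a much cruder rule than the machine learning Pleiades actually uses. The sketch below is an assumption-laden simplification, not the researchers' method: the function name `flag_suspicious_hosts`, the `(host, domain)` input format and the threshold of 20 are all hypothetical. It simply counts how many distinct non-existent domains each device looked up.

```python
from collections import defaultdict


def flag_suspicious_hosts(nx_events, min_distinct=20):
    """Return hosts that looked up many *distinct* non-existent domains.

    nx_events: iterable of (host, domain) pairs, one per DNS lookup that
    came back as NXDOMAIN. The threshold is an arbitrary illustration.
    """
    seen = defaultdict(set)
    for host, domain in nx_events:
        seen[host].add(domain.lower())
    return sorted(h for h, names in seen.items() if len(names) >= min_distinct)


# Thirty different users mistype the same popular site once each.
typo_events = [(f"10.0.0.{i}", "facbook.com") for i in range(1, 31)]

# One device cycles through fifty different made-up names.
bot_events = [("10.0.0.99", f"x{n:04d}qz.example") for n in range(50)]

flagged = flag_suspicious_hosts(typo_events + bot_events, min_distinct=20)
```

Here only `10.0.0.99` is flagged. The typo is common across the network, but each device hits just one bad name, whereas the single noisy device hits dozens. A real system would need far more than a fixed threshold to avoid false positives, which is where the learned models come in.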

"We are able to correlate network patterns with other information we have, so we don't have to reverse engineer the malware or even know what it is," said Marshall Bockrath-Vandegrift, engineering lead on Damballa's R&D team.

Does big data tip the scales in favor of security vendors?

It would seem to. While it's easy to imagine malware writers trying to mine big data insights from infected devices, it would be harder for them to see the forest for the trees. "To conduct this type of analysis, you have to have access to the network data," Bockrath-Vandegrift said. ISPs and major enterprises aren't going to hand over their data to cyber-crooks.

However, could criminals shift their sights, targeting, say, Hadoop databases?

Cyber-crooks certainly won't sit on their hands and accept this new reality. In fact, the Gameover variant of the ZeuS banking trojan didn't rely on standard C&C communications, instead taking advantage of peer-to-peer communications, which are much harder to detect.

"What no one realized, though, and what we were able to detect, is that Gameover has a backup command-and-control that uses DGAs," Bockrath-Vandegrift said. "We were able to analyze network data to identify pockets of behavior that we could correlate with known ZeuS command-and-control points, and we were able to learn something about this Trojan that other security experts missed when they reverse engineered it."

Pleiades has been flexing its muscles lately. Researchers at Georgia Tech, Damballa and Secureworks recently used it to discover a new form of the Pushdo trojan almost three months before the malware itself was discovered and publicized by major antivirus vendors.

Pleiades was also used to uncover the Flashback malware, which ultimately infected more than 600,000 Macintosh devices. Pleiades discovered this zero-day attack weeks before the malware was first discovered and announced by the security community.

Fending off bad guys or spying on everyone?

Of course, for all of the good big data could do, there is also a dark side. Governments, ISPs and tech companies can all exploit big data to invade our privacy. The NSA surveillance scandal is only the latest example of this fact.