Big Data Ushers in Era of Security Intelligence


Advanced cyber-security threats, whether they are criminals, hacktivists or nation states, are breaching organizations at an alarming rate. Aided by time, persistence and smarts, they adeptly penetrate an organization and exfiltrate confidential data without alerting traditional security software tools.

How is this possible? They use spear phishing and social engineering to leapfrog hardened perimeters. The perpetrators also rely on custom, constantly changing malware to avoid detection by traditional anti-malware solutions. Websites exist where hackers, for a fee, can upload their custom malware to test it against dozens of commercial anti-malware solutions. Hackers can then be confident that their custom malware will avoid detection when they use it. Once inside an organization, the hackers use tools such as keyloggers and password hash crackers to obtain legitimate, privileged credentials and move with impunity. The hackers typically infect dozens of machines with a variety of backdoors, so eradicating them is difficult. That’s the bad news.

Spotting the Criminal

Before you throw your hands up and wave the white flag, know that advanced threats have an unavoidable weakness you can exploit. What these threats do on your network is abnormal and deviates from the baseline expected of an average user or IP address. If you can home in on these deviations and outliers, you can detect and defeat the threats.

To spot the outliers, first you need a way of aggregating the machine data or logs generated by your IT infrastructure, at both the network and the endpoint. The key is that the machine data comes not just from security sources like firewalls, anti-malware and IDS, but also from non-security sources like Windows event logs, DNS, web logs, and email logs. This can amount to terabytes of data a day and fits the definition of “big data” – data of such high variety, velocity, and volume that it overwhelms traditional data stores.

Secondly, you need a way to do advanced correlations and statistical analysis on this massive amount of data in real time to connect the dots and expose the minute fingerprints of an advanced threat hiding in a sea of seemingly harmless event data.

What does an outlier representing an advanced threat look like in machine data? There is no magical, short list of events that represent an advanced threat. Security practitioners need to “think like a criminal” and be creative in building real-time correlations that identify the behavior.

Here are two scenarios:

Scenario 1: An internal employee receives an external email from a sender domain that has rarely emailed the organization. This is followed by the employee visiting a website that is rarely visited by anyone in the organization. This in turn is followed by a rarely seen service, DLL, or process starting up on the employee’s laptop. These three events in combination over a limited time frame indicate that the employee may have received a spear phishing email and clicked on a link to a website hosting custom malware, which then installed itself on the employee’s endpoint.
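As a rough illustration of how the three-event correlation in Scenario 1 might be expressed, here is a minimal Python sketch. It assumes the logs have already been normalized into simple (timestamp, user, kind, value) tuples; the event kinds, the function names, the one-hour window and the "seen at most once" rarity cutoff are all hypothetical choices for this example, not features of any particular product.

```python
from collections import Counter
from datetime import datetime, timedelta

# Assumed (hypothetical) event kinds, in the order the scenario expects them.
ORDER = ["email_from", "web_visit", "process_start"]

def rare_values(events, kind, max_count):
    """Values of the given kind seen at most max_count times across the organization."""
    counts = Counter(value for _, _, k, value in events if k == kind)
    return {value for value, n in counts.items() if n <= max_count}

def find_phishing_chains(events, window=timedelta(hours=1), max_count=1):
    """Return (user, email_time, process_time) for each user who shows a
    rare sender -> rare website -> rare process sequence within the window."""
    rare = {kind: rare_values(events, kind, max_count) for kind in ORDER}

    # Keep only the rare events, grouped per user in time order.
    by_user = {}
    for ts, user, kind, value in sorted(events):
        if kind in rare and value in rare[kind]:
            by_user.setdefault(user, []).append((ts, kind))

    hits = []
    for user, seq in by_user.items():
        for i, (start, kind) in enumerate(seq):
            if kind != ORDER[0]:
                continue
            stage = 1  # index of the next event kind we are waiting for
            for ts, k in seq[i + 1:]:
                if ts - start > window:
                    break
                if k == ORDER[stage]:
                    stage += 1
                    if stage == len(ORDER):
                        hits.append((user, start, ts))
                        break
    return hits

# Made-up sample data: alice shows the full chain, bob and carol only
# touch a sender and a site that more than one person has seen.
t0 = datetime(2013, 1, 7, 9, 0)
events = [
    (t0, "alice", "email_from", "rare-sender.example"),
    (t0 + timedelta(minutes=5), "alice", "web_visit", "rare-site.example"),
    (t0 + timedelta(minutes=7), "alice", "process_start", "updater32.dll"),
    (t0, "bob", "email_from", "partner.example"),
    (t0 + timedelta(minutes=1), "carol", "email_from", "partner.example"),
    (t0 + timedelta(minutes=2), "bob", "web_visit", "intranet.example"),
    (t0 + timedelta(minutes=3), "carol", "web_visit", "intranet.example"),
]
hits = find_phishing_chains(events)
```

In practice the rarity counts would be computed over weeks of history rather than a single batch, and the window and cutoff would need tuning to keep false positives manageable.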

Scenario 2: An internal employee or machine generates a volume of outbound DNS traffic or DNS requests that is several standard deviations above the norm. While DNS is a ubiquitous protocol used to enable web browsing, this sort of abnormally high activity may indicate that a hacker is trying to exfiltrate confidential data from the organization via DNS, or that the employee’s machine is part of a botnet and malware on it is trying to connect back to a command and control server.
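The "standard deviations above the norm" test in Scenario 2 can be sketched as a simple z-score check. The example below is only an illustration under stated assumptions: it compares each host's DNS request count for the current day against that host's own recent history, and the function name, the threshold of 3 and the sample numbers are invented for the example.

```python
from statistics import mean, stdev

def dns_outliers(baseline, today, threshold=3.0):
    """Flag hosts whose DNS request count today is more than `threshold`
    sample standard deviations above their own historical mean.

    baseline: dict mapping host -> list of past daily request counts
    today:    dict mapping host -> request count for the current day
    Returns a dict mapping each flagged host to its z-score.
    """
    flagged = {}
    for host, count in today.items():
        history = baseline.get(host, [])
        if len(history) < 2:
            continue  # too little history to estimate a spread
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            continue  # perfectly flat history; a real system would need a floor here
        z = (count - mu) / sigma
        if z > threshold:
            flagged[host] = z
    return flagged

# Made-up daily DNS request counts for two laptops.
baseline = {
    "laptop-17": [1000, 1100, 900, 1050, 950],
    "laptop-18": [1000, 1100, 900, 1050, 950],
}
today = {"laptop-17": 5000, "laptop-18": 1080}
flagged = dns_outliers(baseline, today)
```

Here only laptop-17 is flagged, since its count sits far above its usual range while laptop-18 stays within about one standard deviation. A production system would likely baseline against peer groups and time of day as well, and would look at query size and destination, not just volume.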

Setting the Trap

In the past, traditional Security Information and Event Management (SIEM) products could not aggregate and correlate the massive amounts of data we are discussing. Their fixed schemas, SQL databases and physical appliances severely limited their scale, what they could ingest and how fast they could search it.

In the last few years, big data security platforms have emerged as a new weapon for forward-thinking organizations. These platforms have leveled the playing field and made it possible to detect advanced threats early. They can scale up to 100 terabytes or more per day and ingest all types of machine data, without a SQL datastore or fixed schema. They also leverage distributed search for fast real-time searches and alerts, use statistics, math and baselining to spot anomalies and deviations, and scale horizontally by adding more indexers or nodes running on commodity hardware. Splunk and Hadoop are the two technologies leading the charge in this space.

Leading organizations adopting big data include CedarCrestone, which hosts ERP environments. CedarCrestone discovered that it was almost impossible to get its log data into a traditional SIEM and then parse and correlate that data. So it turned to big data so that it could easily ingest this machine data to monitor for both known and unknown threats, and also perform comprehensive security investigations. The identification of unknown threats includes monitoring for ports and services that have changed or that appear to be unauthorized or misconfigured. These are the possible fingerprints of an advanced threat.

2013 will bring rapid uptake of big data platforms for security use cases. Not only will these big data platforms help spot advanced threats, but they will be used for forensics, incident investigations and fraud detection. With all the historical data indexed in these platforms, extending them to other complementary use cases is logical and easily done.

While there is no “easy button” or “silver bullet” for advanced threat detection, big data represents a compelling way to change the tide of cyberwarfare back in favor of the good guys.