The increasing risk of cybercrime and other malicious activity on the Internet is prompting enterprises to deploy more security controls and collect more data than ever before.

As a result, advances in big data analytics are now being applied to security monitoring for broader and more in-depth analysis to protect valuable company resources. Called big data security analytics, this technology leverages, in part, the scalability of big data and combines it with advanced analytics and security information and event management (SIEM) systems.

As we noted in the last article in this series, big data security analytics is appropriate for many, but not all, use cases. Consider the challenges of detecting and blocking advanced persistent threat techniques. Attackers who use these techniques may employ slow-paced, low-visibility attack patterns to avoid detection. Conventional logging and monitoring techniques can miss this kind of attack. Steps in the attack may occur on separate devices, over extended periods of time, and appear to be unrelated. Scanning logs and network flows for suspicious activity can sometimes miss key parts of an attacker’s kill chain, since the individual steps may not vary much from normal activity. One way to avoid missing data is to collect as much information as possible. This is the approach used in big data security analytics platforms.

As the name implies, this approach to security analytics draws on the tools and techniques designed for collecting, analyzing and managing large volumes of data generated at high velocity. These same techniques are used to drive products ranging from movie recommendation systems for streaming video users to vehicle performance analysis that optimizes the efficiency of transportation fleets. They are just as useful when applied to information security.

When evaluating big data security analytics platforms, be sure to consider five factors that are essential to realizing the full benefits of big data analytics:

Unified data management platform;

Support for multiple data types, including log, vulnerability and flow;

Scalable data ingestion;

Information security-specific analytics tools; and

Compliance reporting.

Together these features help to provide the breadth of functionality needed to collect large volumes of data at the speed at which they are generated, and to analyze the data fast enough to enable information security professionals to respond effectively to attacks.

Factor #1: Unified data management platform

A unified data management platform is the foundation of a big data security analytics system; it stores enterprise data and makes it available for querying. This sounds like a well-known, solved problem that should not be a distinguishing characteristic, but it is. Working with large volumes of data typically requires distributed databases, because relational databases do not scale as cost-efficiently as distributed NoSQL databases such as Cassandra and Accumulo. The scalability of NoSQL databases comes with its own drawbacks, however. For example, it is difficult to implement distributed versions of some database features that we might take for granted, such as ACID transactions.

The data management platform underlying a big data security analytics product has to balance data management features with cost and scalability. The database should demonstrate an ability to write new data in real time without blocking. Similarly, queries should execute fast enough to support real-time analysis of incoming security data.

Another important aspect of a unified data management platform to consider is data integration.

Factor #2: Support for multiple data types

Big data is often described in terms of volume, velocity and variety. The variety of security event data presents a number of challenges to data integration.

Event data is collected at different levels of granularity. For example, network packets are low-level, fine-grained data, while log entries about a change to an administrator password on a server are rather coarse-grained. In spite of the obvious difference, the two could be linked. Network packets could capture data about the attacker’s method for reaching a targeted server, while the log entry could record that the attacker, once having gained access, changed the administrator password.

The semantics of event data vary across data types. Network packet information helps analysts understand what data was transmitted between two endpoints, while the log of a vulnerability scan describes, to some degree, the state of a server or other device over an extended period of time. Big data security analytics platforms need sufficient information about the semantics of different data types to adequately integrate them.
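The integration problem described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular product works: the field names (`start`, `src_ip`, `dst_ip`, `dst_port`, `time`, `host`, `message`), the `Event` record and the `correlate` helper are all hypothetical, and the sketch assumes that flow and log records identify a server by the same key. Each data type gets its own normalizer that encodes what the record means, and the normalized events are then linked by host and time.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Event:
    """Common record that every data type is normalized into (hypothetical schema)."""
    timestamp: datetime
    host: str      # the server the event concerns
    source: str    # "flow" or "log"
    summary: str


def normalize_flow(record: dict) -> Event:
    # Fine-grained network data: the host of interest is the destination.
    return Event(
        timestamp=datetime.fromisoformat(record["start"]),
        host=record["dst_ip"],
        source="flow",
        summary=f"{record['src_ip']} -> {record['dst_ip']}:{record['dst_port']}",
    )


def normalize_log(record: dict) -> Event:
    # Coarse-grained server data: the host is the machine that wrote the log.
    return Event(
        timestamp=datetime.fromisoformat(record["time"]),
        host=record["host"],
        source="log",
        summary=record["message"],
    )


def correlate(events: list, window: timedelta) -> list:
    """Pair each log event with flows to the same host in the preceding window."""
    flows = [e for e in events if e.source == "flow"]
    logs = [e for e in events if e.source == "log"]
    pairs = []
    for log in logs:
        for flow in flows:
            gap = log.timestamp - flow.timestamp
            if flow.host == log.host and timedelta(0) <= gap <= window:
                pairs.append((flow, log))
    return pairs


# Illustrative records only; the addresses come from documentation ranges.
events = [
    normalize_flow({"start": "2024-01-01T10:00:00", "src_ip": "203.0.113.7",
                    "dst_ip": "10.0.0.5", "dst_port": 22}),
    normalize_flow({"start": "2024-01-01T10:05:00", "src_ip": "203.0.113.7",
                    "dst_ip": "10.0.0.9", "dst_port": 443}),
    normalize_log({"time": "2024-01-01T10:20:00", "host": "10.0.0.5",
                   "message": "administrator password changed"}),
]
linked = correlate(events, window=timedelta(hours=1))
```

The nested loop is deliberately naive. A real platform would index events by host and time, and for slow-paced attacks the window would need to span days or weeks rather than an hour.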

Factor #3: Scalable data ingestion

Servers, endpoints, networks and other infrastructure components are constantly changing states. Many of these state changes log useful information that should be transmitted to a big data security analytics platform. Assuming the network has sufficient bandwidth, the biggest risk is that the data ingestion component of the security analytics platform cannot keep up with incoming data. If that were the case, data could be lost, undermining the purpose of deploying a big data security analytics platform.

Systems can accommodate scalable data ingestion by sustaining high write throughput, for example by queuing incoming data in a message queue. Some databases, meanwhile, are designed to support high-volume writes by using an append-only approach: data is appended to the end of a commit log instead of being written to an arbitrary block on the disk. This reduces the latency associated with random writes to magnetic disks. Alternatively, the data management system may itself maintain a queue that acts as a buffer to hold data while it is written to disk. If a spike in messages or a hardware failure slows write operations, data can accumulate in the queue until the database can clear the backlog of writes.
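To make the two ideas concrete, here is a minimal Python sketch that combines an in-memory buffer queue with an append-only commit log. It is an illustration only: the `CommitLogWriter` class, its method names and the one-JSON-object-per-line file format are assumptions made for this example, and production systems add batching, replication and durable flushing that are omitted here.

```python
import json
import queue
import threading


class CommitLogWriter:
    """Hypothetical ingestion component: a bounded buffer in front of an append-only log."""

    def __init__(self, path: str, max_buffered: int = 10_000):
        self._path = path
        self._buffer = queue.Queue(maxsize=max_buffered)
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def submit(self, event: dict) -> bool:
        """Accept an event without blocking; return False if the buffer is full."""
        try:
            self._buffer.put_nowait(event)
            return True
        except queue.Full:
            # The caller decides what to do: retry, spill elsewhere, or count the loss.
            return False

    def _drain(self) -> None:
        # Mode "a" appends every record to the end of the file, so writes are sequential.
        with open(self._path, "a", encoding="utf-8") as log:
            while True:
                event = self._buffer.get()
                if event is None:  # sentinel placed on the queue by close()
                    break
                log.write(json.dumps(event) + "\n")
                log.flush()

    def close(self) -> None:
        """Write out everything still buffered, then stop the worker thread."""
        self._buffer.put(None)
        self._worker.join()
```

Producers call `submit()` and return immediately, while a single background thread appends to the log. If the disk slows down, events accumulate in the buffer up to `max_buffered`; beyond that point `submit()` returns `False`, which makes the data-loss risk explicit rather than silent.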