Rebalancing the Security Equation

Joe Travaglini, director of product marketing at Sqrrl, and Ely Kahn, vice president of business development at Sqrrl, are our guest bloggers. They explain Sqrrl’s integration with Hortonworks Data Platform (HDP).

There Is No Secure Perimeter

With the dawn of phenomena such as Cloud Computing and Bring Your Own Device (BYOD), it is no longer the case that there is a well-defined perimeter to secure and defend. Data is able to flow inside, outside, and across your network boundaries with limited interference from traditional controls. The “trusted zone” as we know it is a thing of the past.

Furthermore, Big Data is all about breaking down silos and gathering disparate data sources with various security and compliance requirements into a shared platform. While this enables building new types of applications and analytics, it also compounds the risks of data loss events, given the extra gravity these platforms command. In other words, Big Data amplifies the stakes of security.

How will you address this issue? It requires rethinking the approach. We need to embrace the chaos and change the security equation entirely. If we can’t adequately protect the data, why not let it protect itself?

A New Security Paradigm

Data-Centric Security describes the philosophy that all data has embedded within it information that specifies policy, access, and governance. A core principle of the Big Data movement brought a fundamental change to the flow in the data-application lifecycle (i.e., “move the application to the data”, instead of the other way around), and Data-Centric Security involves a similar inversion. Rather than building layer upon layer of rules and protections, and funneling everything through multiple checkpoints to enforce security procedures, Data-Centric Security yields a hardened ecosystem with self-contained policy and distributed enforcement.

A Data-Centric Security approach relies on the following capabilities:

Fine-grained, cell-level security enforcement – the independent access validation of every field of data stored in the system, individually

Data labeling capability – the ability to assign visibility labels to data that specify access policy, using a set of rules

Policy specification capability – the ability to grant individual or groups of users entitlements to view data that has a particular set of visibility labels

Encryption, at-rest and in-motion – ensuring that data is always protected cryptographically, whether resident on disk or traversing the network

Secure search – ensuring that data is easily retrievable, and that this convenience does not provide a source of data leakage

Auditing – recording every client operation taken against the system
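To make the first three capabilities concrete, here is a minimal Python sketch of cell-level enforcement. It assumes visibility labels are boolean expressions over authorization tokens, written with `&`, `|`, and parentheses in the style of Apache Accumulo visibility expressions. The function names, the sample record, and the rule that an unlabeled cell is visible to everyone are assumptions made for illustration; this is not Sqrrl's or Accumulo's implementation.

```python
import re

# One token is either an authorization name or one of the operators & | ( ).
TOKEN = re.compile(r"\s*([A-Za-z0-9_]+|[&|()])")


def tokenize(expr):
    """Split a visibility label such as 'hr&(pii|audit)' into tokens."""
    expr = expr.strip()
    tokens, pos = [], 0
    while pos < len(expr):
        match = TOKEN.match(expr, pos)
        if not match:
            raise ValueError(f"bad label expression at offset {pos}: {expr!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def is_visible(label, authorizations):
    """Return True if a user holding `authorizations` may read a cell with `label`."""
    tokens = tokenize(label)
    if not tokens:
        return True  # assumption: an unlabeled cell is visible to everyone
    pos = 0

    def parse_or():
        nonlocal pos
        result = parse_and()
        while pos < len(tokens) and tokens[pos] == "|":
            pos += 1
            rhs = parse_and()  # always parse the right side, then combine
            result = result or rhs
        return result

    def parse_and():
        nonlocal pos
        result = parse_atom()
        while pos < len(tokens) and tokens[pos] == "&":
            pos += 1
            rhs = parse_atom()
            result = result and rhs
        return result

    def parse_atom():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of label expression")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            result = parse_or()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return result
        if tok in {"&", "|", ")"}:
            raise ValueError(f"unexpected token {tok!r}")
        return tok in authorizations  # a bare name is satisfied if the user holds it

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("trailing tokens in label expression")
    return result


def visible_fields(record, authorizations):
    """Check every cell independently and return only the readable ones."""
    return {
        field: value
        for field, (value, label) in record.items()
        if is_visible(label, authorizations)
    }


# Hypothetical record: field -> (value, visibility label)
record = {
    "name": ("Alice Smith", "hr|security"),
    "ssn": ("123-45-6789", "hr&pii"),
    "vpn_login": ("2014-06-01T08:00Z", "security"),
}
```

With this record, `visible_fields(record, {"security"})` returns only the `name` and `vpn_login` cells, while a user holding `{"hr", "pii"}` sees `name` and `ssn`. The policy travels with each cell, so the same check can run wherever the data is read.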

Sqrrl Enterprise is a secure, scalable, and flexible NoSQL database that allows secure integration, exploration, and analyses of disparate datasets. It sits on each data node within the Hadoop cluster and can power secure, real-time analytics and visualizations on Hadoop. Figure 2 outlines how Sqrrl Enterprise integrates with HDP.

Figure 2. Sqrrl/Hortonworks Joint Reference Architecture

Data is first ingested from a variety of sources. Sqrrl Enterprise can support bulk uploads via its MapReduce-based bulk uploader or streaming uploads via its integration with Apache Flume. When data is ingested it is labeled at the “cell-level”. This means that each individual key/value pair (or field in a JSON document) is tagged with a unique security label that dictates who can access that individual piece of data. All data is also encrypted (both in motion and at rest).
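As a rough illustration of cell-level labeling at ingest time, the Python sketch below flattens a JSON document into individual fields and tags each one with a visibility label chosen by a small rule table. The rule table, field names, and label strings are hypothetical, and the first-match-wins policy is an assumption for this sketch rather than a description of Sqrrl's ingest pipeline.

```python
import json
from fnmatch import fnmatchcase

# Hypothetical labeling rules: (field-path pattern, visibility label).
# Rules are checked in order and the first match wins.
LABEL_RULES = [
    ("user.ssn", "hr&pii"),
    ("user.*", "hr|security"),
    ("*", "security"),
]


def flatten(doc, prefix=""):
    """Yield (dotted_path, value) for every leaf field of a nested JSON object."""
    for key, value in doc.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            yield from flatten(value, path + ".")
        else:
            yield path, value


def label_for(path):
    """Pick the visibility label for one field path."""
    for pattern, label in LABEL_RULES:
        if fnmatchcase(path, pattern):
            return label
    raise LookupError(f"no labeling rule matches {path!r}")


def ingest(raw_json):
    """Turn one JSON document into a list of (field, value, label) cells."""
    doc = json.loads(raw_json)
    return [(path, value, label_for(path)) for path, value in flatten(doc)]


cells = ingest('{"user": {"name": "Alice", "ssn": "123-45-6789"}, "src_ip": "10.0.0.5"}')
# cells == [("user.name", "Alice", "hr|security"),
#           ("user.ssn", "123-45-6789", "hr&pii"),
#           ("src_ip", "10.0.0.5", "security")]
```

Labeling each field separately is what lets a single document hold data with different sensitivity levels, instead of forcing the whole record to the most restrictive policy.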

Data is then indexed via Sqrrl’s secondary indexing techniques and stored in an enhanced version of Apache Accumulo within HDP (full integration with HDP Accumulo is expected in mid-2015). Sqrrl Enterprise provides users with a powerful query language (referred to as SqrrlQL) to explore the data. A unique feature that Sqrrl provides is that SqrrlQL is fully integrated with cell-level security concepts. This means that users can conduct SQL-like, full-text, or graph searches, and they will only see the pieces of data that they are authorized to see based on how the data is tagged and their authorizations.
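The idea of a search that respects cell-level labels can be sketched in Python as an inverted index whose postings carry the label of the cell they came from, so that a term lookup never reveals the existence of data the caller cannot read. This is a simplified model, not SqrrlQL or Sqrrl's indexing: the `SecureIndex` class and its methods are hypothetical, labels are reduced to a set of authorizations that must all be held, and the audit log is an in-memory list.

```python
from collections import defaultdict


class SecureIndex:
    """Toy inverted index in which every posting keeps its cell's label."""

    def __init__(self):
        # term -> list of (doc_id, field, required_authorizations)
        self._postings = defaultdict(list)
        self.audit_log = []

    def add(self, doc_id, field, text, required_auths):
        """Index the terms of one cell together with the cell's label."""
        required = frozenset(required_auths)
        for term in set(text.lower().split()):
            self._postings[term].append((doc_id, field, required))

    def search(self, user, term, authorizations):
        """Return only hits whose label the caller satisfies, and audit the call."""
        auths = frozenset(authorizations)
        hits = [
            (doc_id, field)
            for doc_id, field, required in self._postings.get(term.lower(), [])
            if required <= auths  # caller must hold every required authorization
        ]
        self.audit_log.append(
            {"user": user, "op": "search", "term": term, "hits": len(hits)}
        )
        return hits


index = SecureIndex()
index.add("evt-1", "subject", "Quarterly payroll report", {"hr"})
index.add("evt-2", "url", "payroll portal login", {"security"})

analyst_hits = index.search("analyst1", "payroll", {"security"})
# analyst_hits == [("evt-2", "url")]; the HR-only cell is not revealed
```

Filtering at the posting level, rather than after fetching full documents, keeps hit counts and result lists from leaking information about cells the user is not entitled to see.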

Sqrrl Enterprise also provides integrations with other tools, such as Apache Spark, R, Apache Pig, and MapReduce to run predictive analytics, including machine learning, over data stored in the platform. Apache Hive integration is also expected in the future.

Apache Slider is an incubating Hadoop project that will enable YARN for long-running processes, such as Apache Accumulo. Since Sqrrl has a foundation of Accumulo, YARN support for Sqrrl will come online as Slider graduates to a top-level Apache project.

Integration with Other Hadoop Security Projects

There are also a variety of other Hadoop-related security projects that can complement the capabilities of Sqrrl Enterprise. A previous Hortonworks blog post identified a number of these projects, and below is a list highlighting how Sqrrl Enterprise interfaces with them.

Apache Ranger: This project is focused on coordinating security policies across the entire Hadoop stack, and can help ensure policies associated with Sqrrl Enterprise are aligned with the rest of the Hadoop stack.

Apache Knox: Knox provides authentication, authorization, audit, and SSO capabilities for the Hadoop stack. Sqrrl Enterprise currently has support for Kerberos, LDAP, Active Directory, SSO, and audit, and the goal is to integrate these capabilities with Knox.

Use Cases

Sqrrl and Hortonworks have partnered to bring powerful Big Data solutions to a variety of large corporations in industries such as telecommunications, healthcare, government, and finance. Below is a description of the joint Sqrrl/Hortonworks solution for a Fortune 100 company.

Problem: The Company faces an evolving threat landscape presenting advanced persistent threats (APTs), massive volumes of data, and new levels of attacker sophistication. To confront these threats, the Company sought the capability to perform advanced security analytics on years’ worth of collected data, including Internet, Active Directory, email, USB, and VPN logs. Its current SIEM tools could not scale cost-effectively or efficiently to this amount of data. The Company also had security concerns about integrating various datasets in a single location.

Sqrrl and Hortonworks Solution: Sqrrl and Hortonworks collaborated to provide a distributed computing and storage platform for the Company’s Security Operations Center. The joint platform cost-effectively ingests and stores massive amounts of disparate cyber data in a single secure “data lake”, which enables better data retention and deeper analysis and visibility into the data. Specifically, Sqrrl Enterprise powers an internal investigations application to query and summarize massive amounts of cyber data from across the organization. Sqrrl Enterprise supports interactive query speeds, keyword searches, streaming results, and encryption of all data at rest and in motion. Sqrrl Enterprise also integrates with HDP to support advanced analytics, such as machine learning. The joint solution relies on Sqrrl’s data-centric security capabilities to enable secure access to the integrated cyber datasets from users across the organization.

Getting Started

There is a quick and simple way to experience the power of Sqrrl Enterprise + HDP. Sqrrl has recently released its Test Drive VM, which is fully integrated and packaged with HDP 2.1, courtesy of the Hortonworks Sandbox. To request access to the VM, please sign up here:
