The Apache Hadoop big-data platform is still adolescent, but Hadoop distributor Cloudera on Wednesday introduced a maturity milestone in the form of Cloudera Sentry, a new role-based security access control project that will enable companies to set rules for data access down to the level of servers, databases, tables, views and even portions of underlying files.

Hadoop already has provisions for perimeter security, with options including open-source Kerberos, Oozie and Knox for user authentication. But once users are in, what Hadoop has lacked has been a way to define which users have access to what. That has left security-conscious organizations such as banks, insurance companies, healthcare organizations and government agencies with two bad options: tightly restricting access to certain data sets to a select few users or entirely avoiding moving certain types of data onto Hadoop clusters.

With Sentry, Cloudera says it can support four common security requests. First, security administrators can use Sentry to set specific access control privileges for authenticated users. Second, it provides for fine-grained access to subsets of data within files based on defined roles. A fine-grained view might let users see certain columns related to customers while preventing access to their financial information.

Third, role-based rules can be established whereby a fraud-detection group might get access to financial records whereas a business analyst group would not have access to that information. Finally, Sentry also supports multi-tenant security administration, which enables customers of service providers to set their own security controls without having to go through a higher-level administrator.
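To make the role-based and fine-grained ideas concrete, here is a minimal Python sketch of how groups, roles and column-level privileges could fit together. It is an illustration only and is not Sentry's policy format or API: the `Privilege` and `Policy` classes, the group and role names, and the `customers` table with its columns are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Privilege:
    """Hypothetical privilege: read access to some columns of one table."""
    table: str
    columns: frozenset


@dataclass
class Policy:
    """Hypothetical policy store: groups map to roles, roles to privileges."""
    group_roles: dict = field(default_factory=dict)      # group -> set of role names
    role_privileges: dict = field(default_factory=dict)  # role -> list of Privilege

    def readable_columns(self, groups, table):
        # Union of every column granted through any role of any of the user's groups.
        allowed = set()
        for group in groups:
            for role in self.group_roles.get(group, ()):
                for priv in self.role_privileges.get(role, ()):
                    if priv.table == table:
                        allowed |= priv.columns
        return allowed

    def filter_row(self, groups, table, row):
        # Keep only the columns the user is allowed to see; deny by default.
        allowed = self.readable_columns(groups, table)
        return {col: val for col, val in row.items() if col in allowed}


# Assumed example data: fraud investigators see financial columns, analysts do not.
policy = Policy(
    group_roles={
        "fraud_detection": {"fraud_role"},
        "business_analysts": {"analyst_role"},
    },
    role_privileges={
        "fraud_role": [
            Privilege("customers", frozenset({"name", "region", "account_balance"}))
        ],
        "analyst_role": [
            Privilege("customers", frozenset({"name", "region"}))
        ],
    },
)

row = {"name": "Ada", "region": "EU", "account_balance": 1200}
analyst_view = policy.filter_row(["business_analysts"], "customers", row)
fraud_view = policy.filter_row(["fraud_detection"], "customers", row)
```

In this sketch an analyst's view of the row omits `account_balance`, while the fraud-detection group sees every column. A user who belongs to no known group gets an empty result, since nothing is readable unless a role grants it.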

"Sentry will enable our customers to store more sensitive data within Hadoop and open up access to information to more users knowing that they have control over more use cases and applications," said Justin Erickson, Cloudera's director of product management, in a phone interview with InformationWeek.

For now, Sentry works with Apache Hive, through HiveServer2, and Cloudera Impala, through a new Impala 1.1 release also announced Wednesday. Cloudera plans to go beyond Hive and Impala to extend security controls to other components of the Hadoop framework, according to Erickson. Hive and Impala were chosen as a starting point because they support SQL-style access to data, both directly by users and through business intelligence applications and ETL tools.

Hive is a well-established open-source query infrastructure that runs on top of Hadoop, but it's notoriously slow because it relies on MapReduce processing running behind the scenes. Impala is a Cloudera-developed, SQL-on-Hadoop component that supports direct querying of data in the Hadoop Distributed File System (HDFS) and HBase (NoSQL database) indexes. Cloudera says Impala querying is three to 30 times faster than Hive.

Cloudera has contributed Impala to the open-source community, but it's the only vendor likely to support it. For one thing, management and monitoring of Impala queries is something you do through Cloudera's subscription-based commercial management console. For another, all of Cloudera's rivals have introduced or are working on their own SQL-on-Hadoop tools. The list includes Hortonworks-supported Stinger, MapR-supported Drill, Pivotal's proprietary HAWQ engine and IBM-supported BigSQL.

Cloudera said Sentry, too, will be contributed to the open-source community and will be an Apache-licensed project. Cloudera isn't the only vendor working on Hadoop security, but this is an area where a consistent approach across all vendors will be crucial to Hadoop's long-term success. Hortonworks, Cloudera's biggest rival, could not be reached in time for comment.

Technology such as the Sentry technology from Cloudera is crucial when it comes to data management and security, and I am actually kind of surprised that the Hadoop clusters haven't had these kinds of security access applications. Having the options to assign role-based security privileges to information will be beneficial to all Hadoop users and administrators and should contribute to their growing popularity.

After press time, Hortonworks offered the following statement about Sentry from Shaun Connolly, VP of Corporate Strategy at Hortonworks:

"The capabilities that Cloudera is targeting make sense and are valuable. Since Cloudera Sentry (previously called Cloudera Access Server) plugs into HiveServer2, including it into the Apache Hive project would make logical sense. With that said, by separating this work from Apache Hive, Cloudera is introducing a new authorization model for Cloudera, not 'for Hadoop.' Unfortunately Sentry's broader community benefit may be limited. Hortonworks engineers working within the Apache Hive community are open to working with Cloudera on integrating these capabilities directly into the Apache Hive project."

See my concluding comments above about a single approach to security being something that would be for the good of all Hadoop users and distributors.