Introducing File and Folder ACLs for Azure Data Lake Store

Overview

We’re excited to announce today the availability of File and Folder ACLs for Azure Data Lake Store. Many of you have been eagerly awaiting this feature because it is critical to securing your big data.

When we launched the preview of Data Lake Store in October 2015, filesystem security was controlled by a single ACL at the root of the store that applied to all files and folders underneath.

Starting today, ACLs can be set on any file or folder within the store, not just the root folder.

The Access Control Model used by Data Lake Store

We’ve emphasized that Azure Data Lake Store is compatible with WebHDFS. Now that ACLs are fully available, it’s important to understand the ACL model in WebHDFS/HDFS, because it uses POSIX-style ACLs rather than Windows-style ACLs. Before we dive into the details of the ACL model, here are the key points to remember.

POSIX-STYLE ACLs DO NOT ALLOW INHERITANCE. For those of you familiar with POSIX ACLs, this is not a surprise. For those coming from a Windows background this is very important to keep in mind. For example, if Alice can read files in folder /foo, it does not mean that she can read files in /foo/bar. She must be granted explicit permission to /foo/bar. The POSIX ACL model is different in some other interesting ways, but this lack of inheritance is the most important thing to keep in mind.

ADDING A NEW USER TO DATA LAKE ANALYTICS REQUIRES A FEW NEW STEPS. Fortunately, a portal wizard automates the most difficult steps for you.
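To make the no-inheritance point above concrete, here is a minimal Python sketch of a toy permission model. It is not the Data Lake Store API: the `acls` dictionary and the `can_read` and `grant` helpers are hypothetical names made up for illustration, and the model deliberately ignores details such as owners, masks, and traversal checks on parent folders.

```python
# Toy model (an assumption for illustration, not the real service):
# every path carries its own ACL entries, and nothing is inherited.
acls = {
    "/foo": {"alice": "r-x"},
    "/foo/bar": {},  # no entry for alice on the child folder
}


def can_read(acls, user, path):
    """Return True only if `path` itself has an entry granting `user` read."""
    perms = acls.get(path, {}).get(user, "---")
    return "r" in perms


def grant(acls, user, path, perms):
    """Add or overwrite an explicit entry for `user` on `path`."""
    acls.setdefault(path, {})[user] = perms
```

In this sketch, `can_read(acls, "alice", "/foo")` is true while `can_read(acls, "alice", "/foo/bar")` is false until `grant(acls, "alice", "/foo/bar", "r-x")` is called, which mirrors the Alice example above.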

Giving an HDInsight Cluster Access to Data Lake Store

ProTip: Leverage the power of Active Directory Security groups

Repeating manual steps is both irritating and prone to error. It’s easier if you use Active Directory security groups.

First give the needed permissions to the security group. Afterwards, adding new users is simple: just add them to the security group. This will dramatically simplify maintaining and securing your Data Lake.
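The benefit of granting permissions to a group can be sketched with another toy Python model. Again, this is only an illustration under stated assumptions: the group name `datalake-readers`, the `/data` path, and the `group_allows` helper are hypothetical, and the model only looks at group entries on the exact path.

```python
# Toy model (hypothetical names): ACL entries reference a security group,
# and group membership is managed separately from the ACLs.
groups = {"datalake-readers": {"alice"}}
group_acls = {"/data": {"datalake-readers": "r-x"}}


def group_allows(group_acls, groups, user, path, perm):
    """Return True if any group containing `user` has `perm` on `path`."""
    for group, perms in group_acls.get(path, {}).items():
        if user in groups.get(group, set()) and perm in perms:
            return True
    return False


def add_to_group(groups, group, user):
    """Onboard a user by changing group membership only; ACLs are untouched."""
    groups.setdefault(group, set()).add(user)
```

Here, onboarding Bob is a single `add_to_group(groups, "datalake-readers", "bob")` call; no ACL entry on `/data` has to change.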

The Word documents need some corrections:
Doc: Understanding AC – on page 8, PowerShell scripts are mentioned as being in the current doc, but they can actually be found in the other doc.
Doc: Add new User – the link to GitHub is broken: Download the “Add-AdlaJobUser.ps1” PowerShell script from our Github.

– We’ve fixed the Understanding AC doc to remove the reference to PowerShell. Instead the other docs linked in the blog post contain the PowerShell information.
– The link to the script is also now fixed
– The blog post now has two additional links to (1) a doc on adding users to ADL Store and (2) a doc on letting an HDInsight cluster use ADL Store

When creating my HDInsight Hadoop cluster, I am finding that granting my HDInsight Azure AD service principal access to a folder within my Azure Data Lake Store takes a long time (around 5 minutes), since the folder contains 2,500 files. Is there any guidance to make this faster, or should I keep my file count (e.g. JSON, CSV) lower, with larger files? I intend to work in a scenario where I reach terabytes of data through both large file sizes and a large number of files. I’d appreciate any guidance.

I am using a service account (with an app ID and key) to upload data (a flat file) to ADLS from PowerShell. I am also using another account (XYZ) that has “Assigned Permissions” of Read, Write, and Execute on that same ADLS. However, I am not able to access that file with the XYZ account. XYZ is not the owner, but it has the permissions described above.
P.S.: All these entities are in the same tenant.