hdinsight-dotnet-python-azure-storage-shared-access-signature

This sample shows how to restrict access to Azure Blob storage from HDInsight by using shared access signatures (SAS). It spans HDInsight and Azure Storage, and includes examples for .NET (C#) and Python.

Create a Shared Access Signature

You can use either the SASExample solution (C#) or SASToken.py (Python) to create a Shared Access Signature (SAS) for a container in an existing Azure Storage account.

Using SASExample (C#)

Open the project in Visual Studio. It's contained in the CSharp directory of this repository.

Right-click the project in Solution Explorer, then select Properties.

In Properties, select Settings.

In Settings, populate the following entries:

StorageConnectionString: The connection string for the storage account that you want to create a stored policy and SAS for. The format should be DefaultEndpointsProtocol=https;AccountName=myaccount;AccountKey=mykey where myaccount is the name of your storage account and mykey is the key for the storage account.

ContainerName: The container in the storage account that you want to restrict access to.

SASPolicyName: The name to use for the stored policy that will be created.

FileToUpload: The path to a file that will be uploaded to the container. There's a sample.log file in the sampledata folder of this project that can be used.

Run the project. It will open a console window and display the SAS token created using the policy. This can be used to provide read and list access to the container. Save the token for later use.
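Before running either tool, it can help to confirm that the connection string matches the format described for StorageConnectionString. The following is a minimal sketch, not part of this repository: the function names are hypothetical, and the key value is a made-up placeholder. It splits each setting on the first `=` only, because storage account keys are base64 strings that often end in `=`.

```python
def parse_connection_string(conn_str):
    """Split 'Key=Value;Key=Value' into a dict.

    Each pair is split on the first '=' only, so an account key
    ending in '=' padding stays intact.
    """
    settings = {}
    for part in conn_str.split(";"):
        part = part.strip()
        if not part:
            continue  # tolerate a trailing ';'
        key, sep, value = part.partition("=")
        if not sep or not key or not value:
            raise ValueError(f"Malformed setting: {part!r}")
        settings[key] = value
    return settings


def check_storage_connection_string(conn_str):
    """Return the parsed settings, or raise if a required entry is missing."""
    settings = parse_connection_string(conn_str)
    required = ("DefaultEndpointsProtocol", "AccountName", "AccountKey")
    missing = [name for name in required if name not in settings]
    if missing:
        raise ValueError("Missing settings: " + ", ".join(missing))
    return settings


# Placeholder values only; "bXlrZXk=" is not a real key.
example = "DefaultEndpointsProtocol=https;AccountName=myaccount;AccountKey=bXlrZXk="
settings = check_storage_connection_string(example)
print(settings["AccountName"])
```

A check like this only validates the shape of the string; it does not confirm that the account exists or that the key is valid.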

Using SASToken.py (Python)

Note: This currently requires version 0.32.0 of the Azure Storage SDK for Python.

Open the SASToken.py file (in the Python directory of this repository) and change the following values:

policy_name: The name to use for the stored policy that will be created.

storage_account_name: The name of your storage account.

storage_account_key: The key for the storage account.

storage_container_name: The container in the storage account that you want to restrict access to.

example_file_path: The path to a file that will be uploaded to the container.

Run the script. It will display the SAS token created using the policy. This can be used to provide read and list access to the container. Save the token for later use.
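The token is a URL query string, so you can inspect it before saving it. The sketch below uses only the Python standard library; the token value and policy name are invented for illustration, and the helper name is hypothetical. It assumes the usual SAS fields: `sv` (service version), `sr` (resource type, `c` for a container), `si` (the stored policy identifier), and `sig` (the signature). A token based on a stored policy should carry the policy name in `si`.

```python
from urllib.parse import parse_qs


def describe_sas_token(token):
    """Return the fields of a SAS token as a dict of decoded strings."""
    # Some tools print the token with a leading '?'; drop it if present.
    fields = parse_qs(token.lstrip("?"), keep_blank_values=True)
    return {name: values[0] for name, values in fields.items()}


# Invented example token; the signature is not real.
example_token = "sv=2015-04-05&sr=c&si=mypolicy&sig=abc123%2Bxyz%3D"
fields = describe_sas_token(example_token)

print("Policy:", fields.get("si"))
print("Resource type:", fields.get("sr"))
```

If `si` is missing or does not match the policy name you configured, the token was probably not generated from the stored policy.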

Create an HDInsight cluster that uses the token

Open the HDInsightSAS.ps1 from the CreateCluster directory of this repository.

Replace the following values:

$clusterName - set this to the name you want to use for the new HDInsight cluster. It must be a unique name.

$osType - set this to 'Linux' or 'Windows' to set the OS of the HDInsight cluster.

$resourceGroupName - set this to the name of the resource group that will contain the cluster.

$location - set this to the name of the Azure region that the cluster will be created in.

$defaultStorageAccountName - set this to the name of a storage account. This is where the default, read/write access storage for the cluster will be created. This should be a different storage account than the one used for the SAS token.

$SASStorageAccountName - set this to the name of the storage account that you used when generating the SAS token.

$SASContainerName - set this to the name of the container that you used when generating the SAS token.

$SASToken - set this to the SAS token that you generated earlier.

Save the file after you have made changes.

Open a PowerShell prompt and authenticate to your Azure subscription:

Add-AzureRmAccount

Run the script from the PowerShell prompt:

.\HDInsightSAS.ps1

It will take around 15 minutes to complete the cluster creation process.

Update an existing Linux-based cluster to use the token

If you have an existing Linux-based HDInsight cluster, you can update it to use the SAS-secured storage.

Open the Ambari web UI for your cluster. The address for this page is https://YOURCLUSTERNAME.azurehdinsight.net. When prompted, authenticate to the cluster using the admin name (admin) and password you used when creating the cluster.

From the left side of the Ambari web UI, select HDFS and then select the Configs tab in the middle of the page.

Select the Advanced tab, and then scroll until you find the Custom core-site section.

Expand the Custom core-site section, then scroll to the end and select the Add property... link. Use the following values for the Key and Value fields:

Key: fs.azure.sas.CONTAINERNAME.STORAGEACCOUNTNAME.blob.core.windows.net
Value: The SAS returned by the C# or Python application you ran previously
Replace CONTAINERNAME with the container name you used with the C# or Python application. Replace STORAGEACCOUNTNAME with the storage account name you used.
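Because the key is easy to mistype, one option is to generate it from the two names. This is a small illustrative helper (the function name and the example values are hypothetical) that follows the key pattern shown above:

```python
def sas_property_key(container_name, storage_account_name):
    """Build the core-site key that holds the SAS for one container."""
    return (
        f"fs.azure.sas.{container_name}."
        f"{storage_account_name}.blob.core.windows.net"
    )


# Example values only; substitute your own container and account names.
key = sas_property_key("sascontainer", "sasaccount")
print(key)
```

The printed string goes in the Key field; the SAS token itself goes in the Value field.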

Click the Add button to save this key and value, then click the Save button to save the configuration changes. When prompted, add a description of the change ("adding SAS storage access", for example) and then click Save.

Click OK when the changes have been completed.

This saves the configuration changes, but you must restart several services before the change takes effect.

In the Ambari web UI, select HDFS from the list on the left, and then select Restart All from the Service Actions drop-down list on the right. When prompted, select Turn on maintenance mode and then select Confirm Restart All.

Repeat this process for the MapReduce2 and YARN entries from the list on the left of the page.

Once these services have restarted, select each one and disable maintenance mode from the Service Actions drop-down list.

Once connected to the cluster, use the following steps to verify that you can only read and list items on the SAS storage account:

From the prompt, type the following command. Replace SASCONTAINER with the name of the container created for the SAS storage account. Replace SASACCOUNTNAME with the name of the storage account used for the SAS:

hdfs dfs -ls wasb://SASCONTAINER@SASACCOUNTNAME.blob.core.windows.net/

This will list the contents of the container, which should include the file that was uploaded when the container and SAS were created.
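The container is addressed by a URI that combines the container name, the storage account name, and an optional path. The sketch below assumes the cluster reaches the container through the `wasb://` scheme used by HDInsight for Azure Blob storage; the function name is hypothetical, and the values are the same placeholders used in these steps.

```python
def wasb_uri(container, account, path=""):
    """Build a wasb:// URI for a container, optionally with a blob path."""
    return f"wasb://{container}@{account}.blob.core.windows.net/{path.lstrip('/')}"


# Placeholder names; replace them with your container, account, and file.
container_root = wasb_uri("SASCONTAINER", "SASACCOUNTNAME")
file_uri = wasb_uri("SASCONTAINER", "SASACCOUNTNAME", "FILENAME")

print(container_root)
print(file_uri)
```

The first URI points at the container root, which is what a listing uses; the second points at a single file in the container.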

Use the following command to verify that you can read the contents of the file. Replace SASCONTAINER and SASACCOUNTNAME as in the previous step. Replace FILENAME with the name of the file displayed by the previous command:

hdfs dfs -text wasb://SASCONTAINER@SASACCOUNTNAME.blob.core.windows.net/FILENAME