Allow connections to the data lake instance's inbound ports from this address range. The value must be a valid CIDR block. For example:

10.0.0.0/24 allows access from 10.0.0.0 through 10.0.0.255.

0.0.0.0/0 allows access from all IP addresses.
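The mapping from a CIDR block to the address range it covers can be verified with a short script. This is a sketch using Python's standard ipaddress module; the two networks are the examples above:

```python
import ipaddress

# The /24 example covers 256 addresses: 10.0.0.0 through 10.0.0.255.
net = ipaddress.ip_network("10.0.0.0/24")
print(net.num_addresses)       # 256
print(net.network_address)     # 10.0.0.0
print(net.broadcast_address)   # 10.0.0.255

# 0.0.0.0/0 matches every IPv4 address.
allow_all = ipaddress.ip_network("0.0.0.0/0")
print(allow_all.num_addresses) # 4294967296
```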

Protected Gateway Access

This option is checked by default. It provides password-protected access to the Ambari and Ranger web UIs; if you uncheck it, you will not be able to log in to these UIs. See Protected Gateway for more information.

You can expand SHOW ADVANCED OPTIONS to view additional options:

Parameter: Use existing VPC and subnet

Description: Specify whether to deploy the data lake into an existing VPC and subnet. See Existing VPC for more information.

Enter the credentials that you would like to use for administering the data lake. These provide the default administrator credentials for Ambari and Ranger.

Provide the following CLOUD STORAGE parameters:

Parameter: Amazon S3 Path

Description: Enter the name of an existing S3 bucket (for example, my-bucket) or a path to a specific folder in that bucket (for example, my-bucket/data/data1). The bucket must exist before it is registered with the data lake.
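A quick sanity check on the value you enter can catch malformed paths before you submit the form. The sketch below is illustrative only: it splits a path into bucket and folder prefix and applies basic S3 bucket-naming rules (lowercase letters, digits, dots, and hyphens; no spaces). It does not verify that the bucket actually exists, which requires AWS credentials and an SDK or the AWS CLI.

```python
import re

def split_s3_path(path: str):
    """Split 'bucket/optional/folder' into (bucket, prefix); prefix may be ''."""
    bucket, _, prefix = path.partition("/")
    return bucket, prefix

def looks_like_valid_bucket(name: str) -> bool:
    """Basic S3 bucket-name rules: 3-63 characters, lowercase letters,
    digits, dots, and hyphens, starting and ending with an alphanumeric."""
    return re.fullmatch(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]", name) is not None

print(split_s3_path("my-bucket/data/data1"))  # ('my-bucket', 'data/data1')
print(looks_like_valid_bucket("my-bucket"))   # True
print(looks_like_valid_bucket("my bucket"))   # False: spaces are not allowed
```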

This information will be used to create two databases on the RDS instance: one for Hive and one for Ranger.

Alternatively, you can create these two databases yourself and specify the connection information and credentials for each. To access this option, expand SHOW ADVANCED OPTIONS and provide the following information for your Hive Metastore and Ranger Database.
Refer to Hive Metastore and Ranger Database for more information.

Click CREATE DATA LAKE.

Once the data lake is created, you can find its corresponding entry on the DATA LAKE SERVICES page, which is available from the navigation menu.

Once you’ve created a data lake, you can associate it with one or more ephemeral clusters. This option, called DATA LAKE SERVICES, is available when you create a cluster.

Existing VPC

You can optionally install into a different VPC (and subnet) than the one in which the cloud controller instance is running. By default, the data lake instances are installed into the same VPC as the cloud controller instance, but in a new subnet.

Hive Metastore

You can either use a previously registered external Hive metastore or have a new external Hive metastore database created for you. In both cases, the external Hive metastore runs on an Amazon RDS instance.

Parameter: Register new Hive metastore...

Description: Enter connection information for an existing database on an existing Amazon RDS instance; this Hive metastore will be registered automatically and used with the data lake. See Managing Shared Metastores for more information.

Parameter: List of registered Hive Metastores

Description: Select a previously registered Hive metastore for HDP 2.6 from the list. This option appears only if at least one such metastore has been registered.
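The connection information for an RDS-hosted metastore typically boils down to a JDBC URL plus credentials. The sketch below only shows the general shape of such a URL; the endpoint, port, and database name are placeholders, not values from this guide, and the MySQL engine is an assumption (your RDS instance may run a different engine with a different URL prefix).

```python
# Hypothetical values: this RDS endpoint, port, and database name are
# placeholders for illustration only.
host = "mydb.abc123.us-west-2.rds.amazonaws.com"
port = 3306
database = "hive_metastore"

# For a MySQL engine on RDS, a JDBC URL takes this general shape.
jdbc_url = f"jdbc:mysql://{host}:{port}/{database}"
print(jdbc_url)
```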

Ranger Database

Enter the following information and a Ranger database will be created on an existing Amazon RDS instance: