Validate that Cloudbreak Deployer Has Started

Check the Cloudbreak application logs:
cbd logs cloudbreak
You should see a message like this in the log: Started CloudbreakApplication in 36.823 seconds. Cloudbreak normally takes less than a minute to start.

Configure Role-based Credentials

There are two ways to create AWS credentials in Cloudbreak:

Key-based: This requires your AWS access key and secret key pair. Cloudbreak will use these keys to launch the resources. For starters, this is a simpler option that does not require additional configuration. You will provide the keys later when you provision an HDP cluster.

Role-based: This requires a valid IAM role with "AssumeRole" policy. Cloudbreak will assume this role to get temporary access and the access/secret key pair.

If you want to use an instance profile, do not set the key-based credential variables in your Profile. If you want to use Cloudbreak with role ARNs instead of keys, make sure that the instance profile role is allowed to assume roles on AWS.
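For key-based setup, the access/secret key pair is exported in the deployer's Profile file. The variable names below follow the common AWS convention; treat them as an assumption and verify against your deployer version's documentation:

```shell
# Hypothetical Profile entries for key-based credentials.
# Variable names are assumptions; replace the placeholder values with your own keys.
export AWS_ACCESS_KEY_ID=AKIAXXXXXXXXXXXXXXXX
export AWS_SECRET_ACCESS_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
```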

Optional Configurations

You can perform the following optional configurations:

Set Custom Tags

In order to differentiate the launched instances, you have the option to set custom tags on the AWS resources deployed by Cloudbreak. You can use the tagging mechanism with the following variables.

If you want just one custom tag on your CloudFormation resources, set this variable:

export CB_AWS_DEFAULT_CF_TAG=whatever

Then the name of the tag will be CloudbreakId and the value will be whatever.

If you need more specific tagging, set this variable:

export CB_AWS_CUSTOM_CF_TAGS=myveryspecifictag:veryspecific

Then the name of the tag will be myveryspecifictag and the value will be veryspecific. You can specify a list of tags here with a comma separated list; for example: tag1:value1,tag2:value2,tag3:value3.

Cluster Provisioning Prerequisites

IAM Role Setup

If you want to use your AWS access key and secret access key to authenticate to Amazon, use key-based authentication; in that case you do not need to set up an IAM role.

Cloudbreak works by connecting to your AWS account through so-called credentials, and then uses these credentials to
create resources on your behalf.

IMPORTANT Cloudbreak deployment uses two different AWS accounts for two different purposes:

The account belonging to the Cloudbreak web application itself acts as a third party that creates resources on the
account of the end user. This account is configured at server-deployment time.

The account belonging to the end user who uses the UI or the Shell to create clusters. This account is configured
when setting up credentials.

These accounts are usually the same when the end user is the one who deployed the Cloudbreak server, but keeping them
separate allows Cloudbreak to act as a SaaS offering as well if needed.

Credentials use IAM roles to allow the third party to act on behalf of the end user without granting full access to
your resources. This IAM role will later be assumed by an IAM user.

AWS IAM Policy that grants permission to assume a role

You cannot assume a role with the root account, so you need to create an IAM user with an attached inline policy and
then set the access key and secret access key in the Profile file.

The sts-assume-role IAM user policy must be configured to have permission to assume roles on all resources. Here is
the policy that configures sts:AssumeRole for all resources:
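A minimal inline policy granting this permission follows the standard AWS policy document format; a sketch:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "sts:AssumeRole",
      "Resource": "*"
    }
  ]
}
```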

To connect your (end user) AWS account with a credential in Cloudbreak you'll have to create an IAM role on your
AWS account that is configured to allow the third-party account to access and create resources on your behalf.
The easiest way to do this is with cbd commands (but it can also be done manually from the AWS Console):

The generate-role command creates a role that is assumable by the Cloudbreak Deployer AWS account and has a broad policy setup.
This command creates a role with the name cbreak-deployer by default. If you'd like to create a role with a different
name, or multiple roles, add this line to your Profile:

export AWS_ROLE_NAME=my-cloudbreak-role
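The role-generation step above can then be run from the deployer; the exact subcommand may vary between deployer versions, so treat this invocation as a sketch:

```
cbd aws generate-role
```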

You can check the generated role in your AWS console, under IAM Roles.

Generate a New SSH Key

All the instances created by Cloudbreak are configured to allow key-based SSH,
so you'll need to provide an SSH public key that can be used later to SSH onto the instances in the clusters you'll create with Cloudbreak.
You can use one of your existing keys or you can generate a new one.
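A new keypair can be generated with ssh-keygen; for example:

```shell
# Generate a 4096-bit RSA keypair; the prompts shown below will follow.
ssh-keygen -t rsa -b 4096 -C "your_email@example.com"
```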

# Enter file in which to save the key (/Users/you/.ssh/id_rsa): [Press enter]
You'll be asked to enter a passphrase, but you can leave it empty.
# Enter passphrase (empty for no passphrase): [Type a passphrase]
# Enter same passphrase again: [Type passphrase again]

After you enter a passphrase the keypair is generated. The output should look something like below.

# Your identification has been saved in /Users/you/.ssh/id_rsa.
# Your public key has been saved in /Users/you/.ssh/id_rsa.pub.
# The key fingerprint is:
# 01:0f:f4:3b:ca:85:sd:17:sd:7d:sd:68:9d:sd:a2:sd your_email@example.com

Later you'll need to pass the .pub file's contents to Cloudbreak and use the private key to SSH to the instances.

Cluster Provisioning via Browser

You can log into the Cloudbreak application at https://<Public_IP>/.

The main goal of the Cloudbreak UI is to easily create clusters on your own cloud provider account.
This description details the AWS setup - if you'd like to use a different cloud provider check out its manual.

This document explains the four steps that need to be followed to create Cloudbreak clusters from the UI:

connect your AWS account with Cloudbreak

create some template resources on the UI that describe the infrastructure of your clusters

create a blueprint that describes the HDP services in your clusters

launch the cluster itself based on these resources

IMPORTANT Make sure that you have a sufficient quota (CPU, network, etc.) for the requested cluster size

Setting up AWS Credentials

Cloudbreak works by connecting to your AWS account through so-called credentials, and then uses these credentials to
create resources on your behalf. The credentials can be configured on the manage credentials panel of the
Cloudbreak Dashboard.

To create a new AWS credential follow these steps:

Select the credential type. For instance, select Role Based

Fill out the new credential Name

Only lowercase alphanumeric characters can be used (min 5, max 100 characters)

Copy your AWS IAM role's Amazon Resource Name (ARN) to the IAM Role ARN field

Copy your SSH public key to the SSH public key field

The SSH public key must be in OpenSSH format, and its private key pair can be used later to SSH onto every
instance of every cluster you'll create with this credential.

The SSH username for the EC2 instances is cloudbreak.

Any other parameter is optional here.

Public in account means that all the users belonging to your account will be able to use this credential to create
clusters, but cannot delete it.

Infrastructure Templates

After your AWS account is linked to Cloudbreak you can start creating resource templates that describe your clusters' infrastructure:

templates

networks

security groups

When you create one of the above resources, Cloudbreak does not make any requests to AWS. Resources are only created
on AWS after the create cluster button has been pushed. These templates are saved to Cloudbreak's database and can be
reused with multiple clusters to describe the infrastructure.

Templates

Templates describe the instances of your cluster - the instance type and the attached volumes. A typical setup is
to combine multiple templates in a cluster for the different types of nodes. For example you may want to attach multiple
large disks to the datanodes or have memory optimized instances for Spark nodes.

The instance templates can be configured on the manage templates panel on the Cloudbreak Dashboard.

There are some optional configurations here as well:

Spot price (USD) If specified, Cloudbreak will request spot instances (which might take a while, or might never be
fulfilled by Amazon). This option is not supported by the default Red Hat images.

EBS encryption is supported for all volume types. If this option is checked then all the attached disks will be encrypted by Amazon using the AWS KMS master keys.

If Public in account is checked all the users belonging to your account will be able to use this resource to create clusters, but cannot delete it.

Networks

Your clusters can be created in their own Virtual Private Cloud (VPC) or in one of your already existing VPCs.
If you choose an existing VPC it is possible to create a new subnet within the VPC or use an already existing one.
The subnet's IP range must be defined in the Subnet (CIDR) field using the general CIDR notation.

Default AWS Network

If you don't want to create or use your custom VPC, you can use the default-aws-network for all your
Cloudbreak clusters. It will create a new VPC with a 10.0.0.0/16 subnet every time a cluster is created.

Custom AWS Network

If you'd like to deploy a cluster to a custom VPC you'll have to create a new network template on the manage
networks panel.

You have the following options:

Create a new VPC and a new subnet: Every time a cluster is created with this kind of network setup a new VPC and a new subnet with the specified IP range will be created for the instances on AWS.

Create a new subnet in an existing VPC: Use this kind of network setup if you already have a VPC on AWS where you'd like to put the Cloudbreak created cluster but you'd like to have a separate subnet for it. This setup is only supported for basic VPCs, where an Internet Gateway is configured and instances can have public IP addresses to access the Internet. If you have a specific VPC setup (VGW, NAT, private subnets, etc..) then only the third option can be used.

Use an existing subnet in an existing VPC: Use this kind of network setup if you have an existing VPC with one or more subnets on AWS and you'd like to start the instances of a cluster in one - or more - of those subnets. Use this setup if you have a specific VPC setup: you should first create the subnet(s) directly through AWS and provide their IDs here. The subnets could be even in different availability zones and you can set a single or a comma separated list of subnets in the 'Subnet Identifier' field. There are only two requirements for the subnets:

instances in the subnet should be able to reach the Internet to download yum packages (it can be done through a Virtual Gateway, a NAT instance, an Internet Gateway or any other setup)

the VM where Cloudbreak is deployed must be able to reach the instances in the cluster on port 443. (It’s in the same subnet, or through a router from another subnet)

NOTE: instances in the subnet don't need to have public IP addresses in this case

You can configure the Subnet Identifier and the Internet Gateway Identifier (IGW) of your VPC.

IMPORTANT: Subnet CIDRs cannot overlap each other within a VPC, so you have to create a different network
template for each cluster.

To create a new subnet within an existing VPC, provide the IDs of the VPC and its Internet Gateway, and your cluster
will be launched into a new subnet in that VPC. For example, you can create 3 different clusters with 3 different
network templates for subnets 10.0.0.0/24, 10.0.1.0/24 and 10.0.2.0/24 using the same VPC and IGW identifiers.

IMPORTANT: Make sure the subnet defined here doesn't overlap with any subnets already deployed in the VPC,
because validation only happens after cluster creation starts.

If you use an existing subnet, make sure it has enough free addresses in its network space for the new instances.

If Public in account is checked all the users belonging to your account will be able to use this network template
to create clusters, but cannot delete it.

NOTE: The VPCs, IGWs and subnets are created on AWS only after the cluster provisioning starts with the selected
network template.

Security groups

Security group templates are very similar to the security groups on the AWS Console.
They describe the allowed inbound traffic to the instances in the cluster.
Currently only one security group template can be selected for a Cloudbreak cluster and all the instances have a
public IP address so all the instances in the cluster will belong to the same security group.
This may change in a later release.

Default Security Group

You can also use the two pre-defined security groups in Cloudbreak.

only-ssh-and-ssl: all ports are locked down except for SSH and the selected Ambari Server HTTPS (you can't access Hadoop services outside of the VPC):

SSH (22)

HTTPS (443)

Custom Security Group

You can define your own security group by adding all the ports, protocols and CIDR ranges you'd like to use. The rules
defined here don't need to contain the internal rules; those are automatically added by Cloudbreak to the security group on AWS.

Defining Cluster Services

Blueprints

Blueprints are your declarative definition of a Hadoop cluster. These are the same blueprints that are used by Ambari.

You can use the 3 default blueprints pre-defined in Cloudbreak or you can create your own.
Blueprints can be added from a file or a URL (an example blueprint), or the
whole JSON can be written in the JSON text box.

The host groups in the JSON will be mapped to a set of instances when starting the cluster. Besides this the services and
components will also be installed on the corresponding nodes. Blueprints can be modified later from the Ambari UI.

NOTE: It is not necessary to define all the configuration in the blueprint. If a configuration is missing, Ambari will
fill that with a default value.

If Public in account is checked all the users belonging to your account will be able to use this blueprint to
create clusters, but cannot delete or modify it.

A blueprint can be exported from a running Ambari cluster and reused in Cloudbreak with slight
modifications.
There is no automatic way to modify an exported blueprint and make it instantly usable in Cloudbreak; the
modifications have to be done manually.
When the blueprint is exported, some configurations (for example domain names and memory configurations) are hardcoded and won't be applicable to the Cloudbreak cluster.

Cluster Deployment

After all the cluster resources are configured you can deploy a new HDP cluster.

Here is a basic flow for cluster creation on Cloudbreak Web UI:

Start by selecting a previously created AWS credential in the header.

Open create cluster

Configure Cluster tab

Fill out the new cluster name

The cluster name must start with a lowercase letter, followed by lowercase alphanumeric characters and
hyphens only (min 5, max 40 characters)

Select a Region where you'd like your cluster to be provisioned

Click on the Setup Network and Security button

If Public in account is checked all the users belonging to your account will be able to see the created cluster on
the UI, but cannot delete or modify it.

Setup Network and Security tab

Select one of the security groups

Click on the Choose Blueprint button

If Enable security is checked as well, Cloudbreak will install a Key Distribution Center (KDC) and the cluster will
be Kerberized. See more about this in the Kerberos section of this documentation.

Choose Blueprint tab

Select one of the blueprints

After you've selected a Blueprint, you should be able to configure:

the templates

the security groups

the number of nodes for all of the host groups in the blueprint

You need to select where you want to install the Ambari server; only 1 host group can be selected.
If you want to install the Ambari server on a separate node, you need to extend your blueprint with a new host group
which contains only 1 service, HDFS_CLIENT, and select this host group for the Ambari server. Note: this host group cannot be scaled, so
it is not advisable to select a 'slave' host group for this purpose.

Click on the Review and Launch button

Review and Launch tab

After the create and start cluster button has been clicked, Cloudbreak will start to create the cluster's resources on
your AWS account.

Cloudbreak uses CloudFormation to create the resources - you can check out the resources created by Cloudbreak on
the AWS Console CloudFormation page.

Besides these you can check the progress on the Cloudbreak Web UI itself if you open the new cluster's Event History.

Advanced Options

There are some advanced options when deploying a new cluster:

Ambari Username This user will be used as the admin user in Ambari. You can log in using this username on the Ambari UI.

Ambari Password The password associated with the Ambari username. This password will also be the default password for all required passwords which are not specified in the blueprint, e.g. the Hive DB password.

Availability Zone You can restrict the instances to a specific availability zone. It may be useful if you're using
reserved instances.

Validate blueprint This is selected by default. Cloudbreak validates the Ambari blueprint in this case.

Custom Image If you enable this, you can override the default image for provisioning.

Config recommendation strategy Strategy for how configuration recommendations will be applied. Recommended
configurations are gathered from the stack advisor's response.

NEVER_APPLY Configuration recommendations are ignored with this option.

ONLY_STACK_DEFAULTS_APPLY Applies recommendations only to the default configurations of all included services.

ALWAYS_APPLY Applies recommendations to all configuration properties.

Instance Profile The cluster will be able to communicate with the AWS API without any configuration.

Disable Instance Profile attaching by default The cluster will not be able to communicate with the AWS API.

Create Instance Profile and attach to the instances The CloudFormation template will create a new role and assign it to every instance.

Define Existing Instance Profile and attach to the instances The cluster will use the predefined instance role. You should define the role ARN in the Role for Instance Profile box.

Hostgroup Configuration During host group configuration, a different security group can be set per host group.

Configure Ambari Database If you have an existing database (such as RDS), you can reuse it.

Cluster Termination

You can terminate running or stopped clusters with the terminate button in the cluster details.

IMPORTANT: Always use Cloudbreak to terminate the cluster. If that fails for some reason, try to delete the
CloudFormation stack first. Instances are started in an Auto Scaling Group so they may be restarted if you terminate an instance manually!

Sometimes Cloudbreak cannot synchronize its state with the cluster state at the cloud provider and the cluster can't
be terminated. In this case the Forced termination option can help to terminate the cluster on the Cloudbreak
side. If this happens, check the related resources on the AWS CloudFormation page and remove them manually if needed.

Interactive mode / Cloudbreak Shell

The goal of the Cloudbreak Shell (Cloudbreak CLI) is to provide an interactive command line tool which supports:

all functionality available through the REST API or the Cloudbreak Web UI

complete automation of management tasks via scripts

context aware command availability

tab completion

required/optional parameter support

hint command to guide you on the usual path

Start Cloudbreak Shell

To start the Cloudbreak CLI use the following commands:

Change to your cloudbreak-deployment directory if needed. For example:

cd cloudbreak-deployment

Start cbd from here if it is not already running:

cbd start

In the root of your cloudbreak-deployment folder, run:

cbd util cloudbreak-shell

The first run will take a while, because all the necessary Docker images need to be downloaded.

This will launch the Cloudbreak shell inside a Docker container; it is then ready to use.

IMPORTANT You have to copy all the files you would like to use in the shell into the cbd working directory. For
example, if your cbd working directory is ~/cloudbreak-deployment, copy your blueprint JSON, public SSH key
file, etc. there. You can refer to these files by name from the shell.

Autocomplete and Hints

Cloudbreak Shell helps you with hint messages from the very beginning, for example:

cloudbreak-shell>hint
Hint: Add a blueprint with the 'blueprint create' command or select an existing one with 'blueprint select'
cloudbreak-shell>

Cluster Provisioning via CLI

Setting up AWS Credential

Cloudbreak works by connecting to your AWS account through so-called credentials, and then uses these credentials to
create resources on your behalf. Credentials can be configured, for example, with the following command:
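The exact flags vary between Cloudbreak Shell versions, so treat the following as a sketch; --roleArn and --sshKeyPath are assumptions based on the option names mentioned in this section:

```
credential create --AWS --name my-aws-credential --roleArn arn:aws:iam::<account-id>:role/cbreak-deployer --sshKeyPath id_rsa.pub
```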

NOTE: Cloudbreak does not set up your cloud user details; it relies on the concept of IAM on Amazon (and other cloud providers). You should already have a valid IAM role. You can
find further details here.

Alternatives to provide SSH Key:

you can upload your public key from a URL: --sshKeyUrl

or you can add the path to your public key: --sshKeyPath

You can check whether the credential was created successfully:

credential list

You can switch between your existing credentials:

credential select --name my-aws-credential

Infrastructure Templates

After your AWS account is linked to Cloudbreak you can start creating resource templates that describe your clusters' infrastructure:

security groups

networks

templates

When you create one of the above resources, Cloudbreak does not make any requests to AWS. Resources are only created
on AWS after the cluster create command has been applied. These templates are saved to Cloudbreak's database and can be
reused with multiple clusters to describe the infrastructure.

Templates

Templates describe the instances of your cluster - the instance type and the attached volumes. A typical setup is
to combine multiple templates in a cluster for the different types of nodes. For example you may want to attach multiple
large disks to the datanodes or have memory optimized instances for Spark nodes.

A template can be used repeatedly to create identical copies of the same stack (or as a foundation to start a
new stack). Templates can be configured, for example, with the following command:
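A template creation command might look like the following sketch; the flag names (--instanceType, --volumeSize, --volumeCount) are assumptions based on the shell's naming conventions:

```
template create --AWS --name my-aws-template --instanceType m4.xlarge --volumeSize 100 --volumeCount 2
```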

Another available option here is --publicInAccount. If it is true, all the users belonging to your account will be able
to use this template to create clusters, but cannot delete it.

You can check whether the template was created successfully:

template list

Networks

Your clusters can be created in their own Virtual Private Cloud (VPC) or in one of your already existing VPCs. If
you choose an existing VPC it is possible to create a new subnet within the VPC or use an already existing one. The
subnet's IP range must be defined in the Subnet (CIDR) field using the general CIDR notation.

Default AWS Network

If you don't want to create or use your custom VPC, you can use the default-aws-network for all your Cloudbreak
clusters. It will create a new VPC with a 10.0.0.0/16 subnet every time a cluster is created.

Custom AWS Network

If you'd like to deploy a cluster to a custom VPC you'll have to create a new network template. To create a new
subnet within an existing VPC, provide the IDs of the VPC and its Internet Gateway in the template.

A network can also be used repeatedly to create identical copies of the same stack (or as a foundation to
start a new stack).

IMPORTANT Subnet CIDRs cannot overlap each other within a VPC, so you have to create a different network template
for each cluster.
For example, you can create 3 different clusters with 3 different network templates for subnets 10.0.0.0/24,
10.0.1.0/24 and 10.0.2.0/24 using the same VPC and IGW identifiers.

network create --AWS --name my-aws-network --subnet 10.0.0.0/16

Other available options:

--vpcID the ID of your existing VPC on Amazon

--internetGatewayID the ID of the Internet Gateway of the given VPC

--publicInAccount If it is true, all the users belonging to your account will be able to use this network to create
clusters, but cannot delete it.

You can check whether the network was created successfully:

network list

Defining Cluster Services

Blueprints

Blueprints are your declarative definition of a Hadoop cluster. These are the same blueprints that are used by Ambari.

You can use the 3 default blueprints pre-defined in Cloudbreak or you can create your own.
Blueprints can be added from a file or a URL (an example blueprint).

The host groups in the JSON will be mapped to a set of instances when starting the cluster. Besides this the services and
components will also be installed on the corresponding nodes. Blueprints can be modified later from the Ambari UI.

NOTE: It is not necessary to define all the configuration in the blueprint. If a configuration is missing, Ambari will fill that with a default value.

--publicInAccount If it is true, all the users belonging to your account will be able to use this blueprint to create
clusters, but cannot delete it.

You can check whether the blueprint was created successfully:

blueprint list

A blueprint can be exported from a running Ambari cluster and reused in Cloudbreak with slight
modifications.
There is no automatic way to modify an exported blueprint and make it instantly usable in Cloudbreak; the
modifications have to be done manually.
When the blueprint is exported, some configurations (for example domain names and memory configurations) are hardcoded and won't be applicable to the Cloudbreak cluster.

Metadata Show

You can check the stack metadata with

stack metadata --name myawsstack --instancegroup master

Other available options:

--id Select the stack by its id instead of its name.

--outputType Modify the output format of the command (RAW or JSON).

Cluster Deployment

After all the cluster resources are configured you can deploy a new HDP cluster. The following sub-sections show you a basic flow for cluster creation with Cloudbreak Shell.

Select Credential

Select one of your previously created AWS credentials:

credential select --name my-aws-credential

Select Blueprint

Select one of your previously created blueprints that fits your needs:

blueprint select --name multi-node-hdfs-yarn

Configure Instance Groups

You must configure instance groups before provisioning. An instance group defines a group of nodes with a specified
template. Usually we create one instance group per host group in the blueprint. Only 1 host group can be selected for the Ambari server.
If you want to install the Ambari server on a separate node, you need to extend your blueprint with a new host group
which contains only 1 service, HDFS_CLIENT, and select this host group for the Ambari server. Note: this host group cannot be scaled, so
it is not advisable to select a 'slave' host group for this purpose.
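The configuration step described above might look like the following sketch; the instancegroup configure command and its flag names are assumptions and depend on your shell version:

```
instancegroup configure --instanceGroup master --nodecount 1 --templateName my-aws-template --ambariServer true
instancegroup configure --instanceGroup slave_1 --nodecount 3 --templateName my-aws-template --ambariServer false
```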

Select Network

Select one of your previously created networks that fits your needs, or the default one:

network select --name default-aws-network

Create Stack / Create Cloud Infrastructure

A stack is the running cloud infrastructure created based on the resources configured earlier
(credential, instance groups, network, security group). As with the API or the UI, the new cluster will
use your templates and launch your cloud stack using CloudFormation. Use the following command to create a
stack to be used with your Hadoop cluster:

stack create --AWS --name myawsstack --region us-east-1

The infrastructure is created asynchronously; the state of the stack can be checked with the stack show command. If
it reports AVAILABLE, the virtual machines and the corresponding infrastructure are running at the cloud provider.

Other available options are:

--wait - in this case the create command will return only after the process has finished.
--instanceProfileStrategy - strategy for seamless S3 connection (CREATE or USE_EXISTING).
--instanceProfile - if you selected the USE_EXISTING strategy, you should define the Instance Profile role which will be assigned to the instances.

Create a Hadoop Cluster / Cloud Provisioning

You are almost done! One more command and your Hadoop cluster is starting! Cloud provisioning is done once the
cluster is up and running. The new cluster will use your selected blueprint and install your custom Hadoop cluster
with the selected components and services.

cluster create --description "my first cluster"

Other available option is --wait - in this case the create command will return only after the process has finished.

You are done! You have several ways to check the progress during infrastructure creation and then
provisioning:

Cloudbreak uses CloudFormation to create the resources - you can check out the resources created by Cloudbreak on
the AWS Console CloudFormation page.

Cluster Termination

You can terminate running or stopped clusters with

stack delete --name myawsstack

Other available option is --wait - in this case the terminate command will return only after the process has finished.

IMPORTANT: Always use Cloudbreak to terminate the cluster. If that fails for some reason, try to delete the
CloudFormation stack first. Instances are started in an Auto Scaling Group so they may be restarted if you terminate an instance manually!

Sometimes Cloudbreak cannot synchronize its state with the cluster state at the cloud provider and the cluster can't
be terminated. In this case the Forced termination option on the Cloudbreak Web UI can help to terminate the cluster
on the Cloudbreak side. If this happens:

Check the related resources on the AWS CloudFormation page

If needed, remove the resources from there manually

Silent Mode

With Cloudbreak Shell you can execute script files as well. A script file contains shell commands and can
be executed with the shell's script command:

script <your script file>

or with the cbd util cloudbreak-shell-quiet command

cbd util cloudbreak-shell-quiet < example.sh

IMPORTANT: You have to copy all the files you would like to use in the shell into the cbd working directory.
For example, if your cbd working directory is ~/cloudbreak-deployment, copy your script file there.

Example

The following example creates a Hadoop cluster with the hdp-small-default blueprint on M4Xlarge instances with 2X100G
attached disks, on the default-aws-network network, using the all-services-port security group. You should copy your SSH
public key file into your cbd working directory with the name id_rsa.pub and paste your AWS credentials into the parts highlighted with <...>.