OpenStack Supported Versions

Cloudbreak requires that the standard components are installed and configured on OpenStack:

Keystone V2 or Keystone V3

Neutron (self-service and provider networking)

Nova (KVM or Xen hypervisor)

Glance

Cinder (optional)

Heat (optional but highly recommended, since provisioning through native API calls will be deprecated in the future)

OpenStack Images

We provide pre-built cloud images for OpenStack with the Cloudbreak Deployer and Cloudbreak
pre-installed. The following steps guide you through launching the image and completing the required configuration.

Alternatively, instead of using the pre-built cloud image, you can install Cloudbreak Deployer on your own VM. See
install the Cloudbreak Deployer for more information.

Please make sure you opened the following ports on your security group:

OpenStack-specific Configuration

Using Self-signed Certificates

If your OpenStack is secured with a self-signed certificate, you need to import that certificate into Cloudbreak,
or else Cloudbreak won't be able to communicate with your OpenStack. To import the certificate, place the certificate
file in the generated certs directory /certs/trusted/. The trusted directory does not exist by default, so you need to create it.
Cloudbreak will automatically pick up these certificates and import them into its truststore upon start.
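
The certificate import step can be scripted. The following is a minimal sketch, assuming the generated certs directory lives directly under your Cloudbreak deployment directory; the function name install_trusted_cert and the file name openstack-ca.pem are hypothetical and only used for illustration.

```python
import shutil
import tempfile
from pathlib import Path


def install_trusted_cert(deployment_dir, cert_file):
    """Copy cert_file into <deployment_dir>/certs/trusted/ and return the new path.

    Assumption: the certs directory sits directly under the deployment directory.
    """
    trusted_dir = Path(deployment_dir) / "certs" / "trusted"
    # The trusted directory does not exist by default, so create it first.
    trusted_dir.mkdir(parents=True, exist_ok=True)
    target = trusted_dir / Path(cert_file).name
    shutil.copy(cert_file, target)
    return target


if __name__ == "__main__":
    # Demonstration in a throwaway directory; substitute your real paths.
    with tempfile.TemporaryDirectory() as tmp:
        cert = Path(tmp) / "openstack-ca.pem"  # hypothetical file name
        cert.write_text("placeholder certificate contents\n")
        print(install_trusted_cert(Path(tmp) / "cloudbreak-deployment", cert))
```

Cloudbreak only reads the trusted directory upon start, so a certificate copied this way takes effect after the next start.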

Availability Zones and Region config

By default, Cloudbreak uses the RegionOne region with the nova availability zone, but OpenStack supports multiple regions and multiple availability zones. You can customize the Cloudbreak deployment and enable multiple
regions and availability zones by creating an openstack-zone.json file under the etc directory of the Cloudbreak deployment (e.g. /var/lib/cloudbreak-deployment/etc/openstack-zone.json).
You can find an example of openstack-zone.json containing two regions and four availability zones below:

You should see a line like this: Started CloudbreakApplication in 36.823 seconds. Cloudbreak normally takes less than a minute to start.

Provisioning Prerequisites

Generate a New SSH Key

All the instances created by Cloudbreak are configured to allow key-based SSH,
so you'll need to provide an SSH public key that can be used later to SSH onto the instances in the clusters you'll create with Cloudbreak.
You can use one of your existing keys or you can generate a new one with ssh-keygen.

# Enter file in which to save the key (/Users/you/.ssh/id_rsa): [Press enter]
You'll be asked to enter a passphrase, but you can leave it empty.
# Enter passphrase (empty for no passphrase): [Type a passphrase]
# Enter same passphrase again: [Type passphrase again]

After you enter a passphrase the keypair is generated. The output should look something like below.

# Your identification has been saved in /Users/you/.ssh/id_rsa.
# Your public key has been saved in /Users/you/.ssh/id_rsa.pub.
# The key fingerprint is:
# 01:0f:f4:3b:ca:85:sd:17:sd:7d:sd:68:9d:sd:a2:sd your_email@example.com

Later you'll need to pass the .pub file's contents to Cloudbreak and use the private part to SSH to the instances.
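
Before pasting the key into Cloudbreak, it can help to confirm that you picked the public half and not the private one. The sketch below is only an illustrative sanity check based on the usual one-line OpenSSH public key layout; the helper names check_public_key and read_public_key are hypothetical, and Cloudbreak may apply its own validation.

```python
from pathlib import Path


def check_public_key(text):
    """Return the stripped key line, or raise ValueError if it does not look like a public key."""
    line = text.strip()
    fields = line.split()
    # An OpenSSH public key line is "<type> <base64-data> [comment]".
    if len(fields) < 2 or not fields[0].startswith(("ssh-", "ecdsa-", "sk-")):
        raise ValueError("this does not look like an OpenSSH public key")
    return line


def read_public_key(path="~/.ssh/id_rsa.pub"):
    """Read a .pub file and return its single-line contents."""
    return check_public_key(Path(path).expanduser().read_text())
```

Calling read_public_key() with no argument reads ~/.ssh/id_rsa.pub, which corresponds to the default location shown in the ssh-keygen output above.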

Provisioning via Browser

You can log into the Cloudbreak application at https://<Public_IP>/.

The main goal of the Cloudbreak UI is to easily create clusters on your own cloud provider account.
This description details the OpenStack setup - if you'd like to use a different cloud provider check out its manual.

This document explains the four steps that need to be followed to create Cloudbreak clusters from the UI:

connect your OpenStack account with Cloudbreak

create some template resources on the UI that describe the infrastructure of your clusters

create a blueprint that describes the HDP services in your clusters

launch the cluster itself based on these resources

IMPORTANT Make sure that you have sufficient quota (CPU, network, etc.) for the requested cluster size.

Setting up OpenStack credentials

Cloudbreak works by connecting your OpenStack account through so-called Credentials, and then uses these credentials
to create resources on your behalf. The credentials can be configured on the manage credentials panel on the
Cloudbreak Dashboard.

Infrastructure templates

After your OpenStack account is linked to Cloudbreak you can start creating resource templates that describe your
clusters' infrastructure:

templates

networks

security groups

When you create one of the above resources, Cloudbreak does not make any requests to OpenStack. Resources are only
created on OpenStack after the create cluster button has been pushed. These templates are saved to Cloudbreak's
database and can be reused with multiple clusters to describe the infrastructure.

Templates

Templates describe the instances of your cluster - the instance type and the attached volumes. A typical setup is
to combine multiple templates in a cluster for the different types of nodes. For example you may want to attach multiple
large disks to the datanodes or have memory optimized instances for Spark nodes.

The instance templates can be configured on the manage templates panel on the Cloudbreak Dashboard.

If Public in account is checked all the users belonging to your account will be able to use this resource to create clusters, but cannot delete it.

Networks

Your clusters can be created in their own networks or in one of your existing networks. If you choose an
existing network, it is possible to create a new subnet within the network. The subnet's IP range must be defined in
the Subnet (CIDR) field using the general CIDR notation. Here you can read more about OpenStack networking.

Custom OpenStack Network

If you'd like to deploy a cluster to your OpenStack network you'll have to create a new network template on the
manage networks panel on the Cloudbreak Dashboard.

"Before launching an instance, you must create the necessary virtual network infrastructure...an instance uses a
public provider virtual network that connects to the physical network infrastructure...This network includes a DHCP
server that provides IP addresses to instances...The admin or other privileged user must create this network because
it connects directly to the physical network infrastructure."

Create a new network and a new subnet: Every time a cluster is created with this kind of network setup a new network and a new subnet with the specified IP range will be created for the instances on OpenStack.

Create a new subnet in an existing network: Use this kind of network setup if you already have a network on OpenStack where you'd like to put the Cloudbreak created cluster but you'd like to have a separate subnet for it.

Use an existing subnet in an existing network: Use this kind of network setup if you have an existing network with one or more subnets on OpenStack and you'd like to start the instances of a cluster in one of those subnets.

Explanation of the parameters:

Name the name of the new network

Length must be between 5 and 100 characters

Starts with a lowercase alphabetic character

Can contain lowercase alphanumeric and hyphens only

Subnet (CIDR) With this field you define the IP address space usable by your VMs within the cluster. Cloudbreak supports the private address space defined in RFC1918. E.g. you can use 10.0.0.0/24

Floating Pool ID Floating IPs are not automatically allocated to instances by default; they need to be attached to instances after creation. Floating IPs are used to provide access to your VMs running on OpenStack. You can find the available network pools and their IDs by using the nova floating-ip-pool-list and neutron net-external-list commands or by copying them from the OpenStack Horizon UI. Such networks have the External Network field set to Yes. If you are unable to find the ID, consult your OpenStack network administrator. Please note that if you do not set this field, your cluster might not be accessible.

Virtual Network Identifier This is the ID of an existing virtual network on OpenStack where you would like to launch the cluster. (Must be provided for Create a new subnet in an existing network and Use an existing subnet in an existing network)

Router Identifier Specify the ID of the router that will interconnect your existing network with the subnet created by Cloudbreak. (Must be provided for Create a new subnet in an existing network).

Subnet Identifier This is the ID of an existing subnet on OpenStack where you would like to launch the cluster. (Must be provided for Use an existing subnet in an existing network)

IMPORTANT Please make sure the subnet defined here doesn't overlap with any of your already deployed subnets in the
network, because the validation only happens after the cluster creation starts.
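
Because the overlap is only detected once cluster creation has started, you may want to check a planned range yourself beforehand. The following is a minimal sketch using Python's standard ipaddress module; the function name check_subnet and the example ranges are hypothetical, and the list of already deployed subnets has to come from your own OpenStack environment.

```python
import ipaddress

# The three private IPv4 blocks defined in RFC 1918.
RFC1918_BLOCKS = [
    ipaddress.ip_network(block)
    for block in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def check_subnet(cidr, existing_cidrs=()):
    """Return the parsed network if it is private and free of overlaps, else raise ValueError."""
    # ip_network raises ValueError for malformed input or when host bits are set.
    net = ipaddress.ip_network(cidr)
    if net.version != 4 or not any(net.subnet_of(block) for block in RFC1918_BLOCKS):
        raise ValueError(f"{cidr} is not inside the RFC 1918 private address space")
    for other in existing_cidrs:
        if net.overlaps(ipaddress.ip_network(other)):
            raise ValueError(f"{cidr} overlaps with existing subnet {other}")
    return net


if __name__ == "__main__":
    print(check_subnet("10.0.1.0/24", existing_cidrs=["10.0.0.0/24"]))
```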

In case of an existing subnet, make sure you have enough room within your network space for the new instances. The
provided subnet CIDR will be ignored, but a proper CIDR range will be used.

If Public in account is checked all the users belonging to your account will be able to use this network template
to create clusters, but cannot delete it.

NOTE The new networks are created on OpenStack only after the cluster provisioning starts with the selected
network template.

Security groups

Security group templates are very similar to the Security Groups on OpenStack. They describe the allowed inbound traffic
to the instances in the cluster. Currently only one security group template can be selected for a Cloudbreak cluster
and all the instances have a public IP address so all the instances in the cluster will belong to the same security
group. This may change in a later release.

Default Security Group

You can also use the two pre-defined security groups in Cloudbreak.

only-ssh-and-ssl: all ports are locked down except for SSH and the selected Ambari Server HTTPS (you can't access Hadoop services
from outside of the network):

SSH (22)

HTTPS (443)

Custom Security Group

You can define your own security group by adding all the ports, protocols and CIDR ranges you'd like to use. The rules
defined here don't need to contain the internal rules; those are automatically added by Cloudbreak to the security
group on OpenStack.

Defining Cluster Services

Blueprints

Blueprints are your declarative definition of a Hadoop cluster. These are the same blueprints that are used by Ambari.

You can use the 3 default blueprints pre-defined in Cloudbreak or you can create your own.
Blueprints can be added from a file or a URL (an example blueprint), or the
whole JSON can be written in the JSON text box.

The host groups in the JSON will be mapped to a set of instances when starting the cluster. Besides this the services and
components will also be installed on the corresponding nodes. Blueprints can be modified later from the Ambari UI.

NOTE: It is not necessary to define all the configuration in the blueprint. If a configuration is missing, Ambari will
fill that with a default value.
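
To see which host groups a blueprint defines before mapping them to instances, you can inspect the JSON directly. The sketch below assumes the usual Ambari blueprint layout with a top-level host_groups list; the blueprint contents, the stack version and the function name host_group_components are made up for illustration.

```python
import json

# A made-up, minimal blueprint used only for this illustration.
EXAMPLE_BLUEPRINT = """
{
  "Blueprints": {"blueprint_name": "example", "stack_name": "HDP", "stack_version": "2.4"},
  "host_groups": [
    {"name": "master", "cardinality": "1",
     "components": [{"name": "NAMENODE"}, {"name": "RESOURCEMANAGER"}]},
    {"name": "slave_1", "cardinality": "2",
     "components": [{"name": "DATANODE"}, {"name": "NODEMANAGER"}]}
  ]
}
"""


def host_group_components(blueprint_text):
    """Map each host group name to the list of component names it installs."""
    blueprint = json.loads(blueprint_text)
    return {
        group["name"]: [component["name"] for component in group.get("components", [])]
        for group in blueprint.get("host_groups", [])
    }


if __name__ == "__main__":
    for name, components in host_group_components(EXAMPLE_BLUEPRINT).items():
        print(name, components)
```

Each name in the result is a host group that has to be paired with a template and a node count when the cluster is created.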

If Public in account is checked all the users belonging to your account will be able to use this blueprint to
create clusters, but cannot delete or modify it.

A blueprint can be exported from a running Ambari cluster and reused in Cloudbreak with slight
modifications.
There is no automatic way to modify an exported blueprint and make it instantly usable in Cloudbreak; the
modifications have to be done manually.
When the blueprint is exported, some configurations are hardcoded (for example domain names and memory configurations) that won't be applicable to the Cloudbreak cluster.

Cluster deployment

After all the cluster resources are configured you can deploy a new HDP cluster.

Here is a basic flow for cluster creation on Cloudbreak Web UI:

Start by selecting a previously created OpenStack credential in the header.

The name must be between 5 and 40 characters long and must satisfy the following rules:

Starts with a lowercase alphabetic character

Can contain lowercase alphanumeric and hyphens only

Select the Region where you would like your cluster to be provisioned

Click on the Setup Network and Security button

If Public in account is checked all the users belonging to your account will be able to see the created cluster on
the UI, but cannot delete or modify it.
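
The cluster name rules above can be checked locally before you submit the form. This is a small sketch of those rules only; the function name is_valid_name is hypothetical, and Cloudbreak performs its own validation regardless. Network names follow the same character rules with a maximum length of 100, which is why the length limits are parameters.

```python
import re

# Starts with a lowercase letter, then lowercase alphanumerics and hyphens only.
NAME_PATTERN = re.compile(r"[a-z][a-z0-9-]*")


def is_valid_name(name, min_len=5, max_len=40):
    """Return True if name satisfies the length and character rules."""
    if not min_len <= len(name) <= max_len:
        return False
    return NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("my-cluster", "abc", "1cluster", "My-Cluster"):
        print(candidate, is_valid_name(candidate))
```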

Setup Network and Security tab

Select one of your previously created networks

Click on the Choose Blueprint button

If Enable security is checked as well, Cloudbreak will install Key Distribution Center (KDC) and the cluster will
be Kerberized. See more about it in the Kerberos section of this documentation.

Choose Blueprint tab

Select one of the blueprints

After you've selected a Blueprint, you should be able to configure:

the templates

the security groups

the number of nodes for all of the host groups in the blueprint

You need to select the host group where you want to install the Ambari server. Only 1 host group can be selected.
If you want to install the Ambari server on a separate node, you need to extend your blueprint with a new host group
which contains only 1 service: HDFS_CLIENT and select this host group for the Ambari server. Note: this host group cannot be scaled so
it is not advised to select a 'slave' host group for this purpose.

Click on the Review and Launch button

Review and Launch tab

After the create and start cluster button has been clicked, Cloudbreak will start to create the cluster's resources on
your OpenStack account.

Cloudbreak uses OpenStack to create the resources - you can check out the resources created by Cloudbreak
on the Instances page of your OpenStack Project.

Besides these you can check the progress on the Cloudbreak Web UI itself if you open the new cluster's Event History.

Advanced options

Ambari Username This user will be used as admin user in Ambari. You can log in using this username on the Ambari UI.

Ambari Password The password associated with the Ambari username. This password will also be the default password for all required passwords which are not specified in the blueprint, e.g. the Hive DB password.

The HEAT variant uses Heat templating to launch a stack, whereas the NATIVE variant starts the cluster
by using a sequence of API calls without Heat to achieve the same result. Both variants use the same
authentication and credential management.

Validate blueprint This is selected by default. Cloudbreak validates the Ambari blueprint in this case.

Custom Image If you enable this, you can override the default image for provision.

Config recommendation strategy Strategy for how configuration recommendations will be applied. Recommended
configurations are gathered from the response of the stack advisor.

NEVER_APPLY Configuration recommendations are ignored with this option.

ONLY_STACK_DEFAULTS_APPLY Applies only to the default configurations for all included services.

ALWAYS_APPLY Applies to all configuration properties.

Cluster termination

You can terminate running or stopped clusters with the terminate button in the cluster details.

IMPORTANT Always use Cloudbreak to terminate the cluster. If that fails for some reason, try to delete the
OpenStack instances first. Instances are started in an Auto Scaling Group so they may be restarted if you terminate an
instance manually!

Sometimes Cloudbreak cannot synchronize its state with the cluster state at the cloud provider and the cluster can't
be terminated. In this case the Forced termination option can help to terminate the cluster at the Cloudbreak
side. If it has happened:

Interactive mode / Cloudbreak Shell

The goal with the Cloudbreak Shell (Cloudbreak CLI) was to provide an interactive command line tool which supports:

all functionality available through the REST API or Cloudbreak Web UI

complete automation of management tasks via scripts

context aware command availability

tab completion

required/optional parameter support

hint command to guide you on the usual path

Start Cloudbreak Shell

To start the Cloudbreak CLI use the following commands:

Open your cloudbreak-deployment directory if it is needed. For example:

cd cloudbreak-deployment

Start cbd from here if needed:

cbd start

In the root of your cloudbreak-deployment folder, run:

cbd util cloudbreak-shell

The first time, this will take a while because all the necessary Docker images need to be downloaded.

This will launch the Cloudbreak shell inside a Docker container, after which it is ready to use.

IMPORTANT You have to copy all the files that you would like to use in the shell into the cbd working directory. For
example, if your cbd working directory is ~/cloudbreak-deployment, then copy your blueprint JSON, public SSH key
file, etc. there. You can refer to these files by their names from the shell.

Autocomplete and Hints

Cloudbreak Shell helps you with hint messages from the very beginning, for example:

cloudbreak-shell>hint
Hint: Add a blueprint with the 'blueprint create' command or select an existing one with 'blueprint select'
cloudbreak-shell>

Provisioning via CLI

Setting up OpenStack credential

Cloudbreak works by connecting your OpenStack account through so-called Credentials, and then uses these credentials to
create resources on your behalf. Credentials can be configured with the following command for example:

NOTE that Cloudbreak does not set your cloud user details - we work around the concept of OpenStack's
authentication. You should already have valid OpenStack credentials. You can
find further details here.

Infrastructure templates

After your OpenStack account is linked to Cloudbreak you can start creating resource templates that describe your
clusters' infrastructure:

security groups

networks

templates

When you create one of the above resources, Cloudbreak does not make any requests to OpenStack. Resources are only
created on OpenStack after the cluster create command has been applied. These templates are saved to Cloudbreak's database and
can be reused with multiple clusters to describe the infrastructure.

Templates

Templates describe the instances of your cluster - the instance type and the attached volumes. A typical setup is
to combine multiple templates in a cluster for the different types of nodes. For example you may want to attach multiple
large disks to the datanodes or have memory optimized instances for Spark nodes.

A template can be used repeatedly to create identical copies of the same stack (or to use as a foundation to start a
new stack). Templates can be configured with the following command for example:

Another available option here is --publicInAccount. If it is true, all the users belonging to your account will be able
to use this template to create clusters, but cannot delete it.

You can check whether the template was created successfully:

template list

Networks

Your clusters can be created in their own networks or in one of your existing networks. If you choose an
existing network, it is possible to create a new subnet within the network. The subnet's IP range must be defined in
the Subnet (CIDR) field using the general CIDR notation. Here you can read more about OpenStack networking.

Custom OpenStack Network

If you'd like to deploy a cluster to your OpenStack network you'll have to create a new network template.

A network can also be used repeatedly to create identical copies of the same stack (or to use as a foundation to
start a new stack).

"Before launching an instance, you must create the necessary virtual network infrastructure...an instance uses a
public provider virtual network that connects to the physical network infrastructure...This network includes a DHCP
server that provides IP addresses to instances...The admin or other privileged user must create this network because
it connects directly to the physical network infrastructure."

--subnetId Your subnet ID within your virtual network. If the identifier is provided, the Subnet
(CIDR) will be ignored. Leave it blank if you'd like to create a new subnet within the virtual network with the
provided Subnet (CIDR) range.

--publicInAccount If it is true, all the users belonging to your account will be able to use this template to create clusters, but cannot delete it.

You can check whether the network was created successfully:

network list

Defining Cluster Services

Blueprints

Blueprints are your declarative definition of a Hadoop cluster. These are the same blueprints that are used by Ambari.

You can use the 3 default blueprints pre-defined in Cloudbreak or you can create your own.
Blueprints can be added from a file or a URL (an example blueprint).

The host groups in the JSON will be mapped to a set of instances when starting the cluster. Besides this the services and
components will also be installed on the corresponding nodes. Blueprints can be modified later from the Ambari UI.

NOTE: It is not necessary to define all the configuration in the blueprint. If a configuration is missing, Ambari will fill that with a default value.

--publicInAccount If it is true, all the users belonging to your account will be able to use this blueprint to create
clusters, but cannot delete it.

You can check whether the blueprint was created successfully:

blueprint list

A blueprint can be exported from a running Ambari cluster and reused in Cloudbreak with slight
modifications.
There is no automatic way to modify an exported blueprint and make it instantly usable in Cloudbreak; the
modifications have to be done manually.
When the blueprint is exported, some configurations are hardcoded (for example domain names and memory configurations) that won't be applicable to the Cloudbreak cluster.
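
Although the modifications themselves are manual, a short script can help you locate the hardcoded values. The sketch below assumes the exported blueprint keeps its settings in a top-level configurations list, with properties either directly under each configuration type or nested under a properties key; the function name find_hardcoded_values and the example domain are hypothetical.

```python
def find_hardcoded_values(blueprint, needles):
    """Return (config_type, property, value) tuples whose value contains any needle."""
    hits = []
    for config in blueprint.get("configurations", []):
        for config_type, body in config.items():
            if not isinstance(body, dict):
                continue
            # Properties may be nested under "properties" or sit directly in the body.
            properties = body.get("properties", body)
            for key, value in properties.items():
                if isinstance(value, str) and any(needle in value for needle in needles):
                    hits.append((config_type, key, value))
    return hits


if __name__ == "__main__":
    exported = {  # made-up fragment of an exported blueprint
        "configurations": [
            {"core-site": {"properties": {"fs.defaultFS": "hdfs://nn1.old.example.com:8020"}}},
            {"yarn-site": {"yarn.acl.enable": "false"}},
        ]
    }
    for hit in find_hardcoded_values(exported, [".old.example.com"]):
        print(hit)
```

The script only reports candidates; deciding what each value should become in the Cloudbreak cluster remains a manual step.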

Metadata Show

You can check the stack metadata with

stack metadata --name myosstack --instancegroup master

Other available options:

--id In this case you can select a stack with id.

--outputType In this case you can modify the output format of the command (RAW or JSON).

Cluster deployment

After all the cluster resources are configured you can deploy a new HDP cluster. The following sub-sections show
you a basic flow for cluster creation with Cloudbreak Shell.

Select credential

Select one of your previously created OpenStack credentials:

credential select --name my-os-credential

Select blueprint

Select one of your previously created blueprints that fits your needs:

blueprint select --name multi-node-hdfs-yarn

Configure instance groups

You must configure instance groups before provisioning. An instance group defines a group of nodes with a specified
template and security group. Usually we create instance groups for host groups in the blueprint. For Ambari server only 1 host group can be specified.
If you want to install the Ambari server on a separate node, you need to extend your blueprint with a new host group
which contains only 1 service: HDFS_CLIENT and select this host group for the Ambari server. Note: this host group cannot be scaled so
it is not advised to select a 'slave' host group for this purpose.
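
Extending a blueprint with a dedicated host group for the Ambari server can be sketched as follows. This assumes the usual Ambari blueprint layout with a host_groups list; the group name ambari and the function name add_ambari_host_group are hypothetical choices made for this example.

```python
import copy


def add_ambari_host_group(blueprint, group_name="ambari"):
    """Return a copy of blueprint with an extra single-node host group holding only HDFS_CLIENT."""
    result = copy.deepcopy(blueprint)  # leave the caller's blueprint untouched
    groups = result.setdefault("host_groups", [])
    if any(group.get("name") == group_name for group in groups):
        raise ValueError(f"host group {group_name!r} already exists")
    groups.append({
        "name": group_name,
        "cardinality": "1",  # this host group cannot be scaled
        "components": [{"name": "HDFS_CLIENT"}],
    })
    return result


if __name__ == "__main__":
    original = {"host_groups": [{"name": "master", "components": [{"name": "NAMENODE"}]}]}
    extended = add_ambari_host_group(original)
    print([group["name"] for group in extended["host_groups"]])
```

After adding the extended blueprint to Cloudbreak, select the new host group as the one for the Ambari server.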

Select one of your previously created networks that fits your needs or a default one:

network select --name my-os-network

Create stack / Create cloud infrastructure

A stack is the running cloud infrastructure that is created based on the resources configured earlier
(credential, instance groups, network, security group). As with the API or the UI, the new cluster will
use your templates and launch your cloud stack on OpenStack. Use the following command to create a
stack to be used with your Hadoop cluster:

stack create --OPENSTACK --name myosstack --region local

The infrastructure is created asynchronously; the state of the stack can be checked with the stack show command. If
it reports AVAILABLE, it means that the virtual machines and the corresponding infrastructure are running at the cloud provider.

Another available option is:

--wait - in this case the create command will return only after the process has finished.
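
If you drive Cloudbreak from your own tooling rather than the shell, the same wait-until-ready behaviour can be approximated with a polling loop. The sketch below is generic: get_status is a hypothetical callable that you would implement yourself to return the current stack state as a string, and only the AVAILABLE value is taken from the description above.

```python
import time


def wait_until_available(get_status, timeout=1800.0, interval=10.0,
                         sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it returns "AVAILABLE" or the timeout expires."""
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status == "AVAILABLE":
            return status
        if clock() >= deadline:
            raise TimeoutError(f"stack still reports {status!r} after {timeout} seconds")
        sleep(interval)


if __name__ == "__main__":
    # Simulated status source for demonstration; replace with a real lookup.
    states = iter(["IN_PROGRESS", "IN_PROGRESS", "AVAILABLE"])
    print(wait_until_available(lambda: next(states), interval=0.0))
```

The sleep and clock parameters exist only so the loop can be exercised without real waiting; in normal use the defaults are sufficient.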

Create a Hadoop cluster / Cloud provisioning

You are almost done! One more command and your Hadoop cluster is starting! Cloud provisioning is done once the
cluster is up and running. The new cluster will use your selected blueprint and install your custom Hadoop cluster
with the selected components and services.

cluster create --description "my first cluster"

Another available option is --wait - in this case the create command will return only after the process has finished.

You are done! You have several ways to check the progress during the infrastructure creation and
provisioning:

Cloudbreak uses OpenStack to create the resources - you can check out the resources created by Cloudbreak on
the OpenStack Console Instances page.

Cluster Termination

You can terminate running or stopped clusters with

stack delete --name myosstack

Another available option is --wait - in this case the terminate command will return only after the process has finished.

IMPORTANT: Always use Cloudbreak to terminate the cluster. If that fails for some reason, try to delete the
OpenStack instances first. Instances are started in an Auto Scaling Group so they may be restarted if you terminate an instance manually!

Sometimes Cloudbreak cannot synchronize its state with the cluster state at the cloud provider and the cluster can't
be terminated. In this case the Forced termination option on the Cloudbreak Web UI can help to terminate the cluster
at the Cloudbreak side. If it has happened:

You should check the related resources on OpenStack

If needed, manually remove the resources from there

Silent Mode

With Cloudbreak Shell you can execute script files as well. A script file contains shell commands and can
be executed with the script command of Cloudbreak Shell

script <your script file>

or with the cbd util cloudbreak-shell-quiet command

cbd util cloudbreak-shell-quiet < example.sh

IMPORTANT: You have to copy all the files that you would like to use in the shell into the cbd working directory.
For example, if your cbd working directory is ~/cloudbreak-deployment, then copy your script file there.

Example

The following example creates a Hadoop cluster with the hdp-small-default blueprint on m1.large instances with 2X100G
attached disks on the osnetwork network using the all-services-port security group. You should copy your SSH public key
file into your cbd working directory with the name id_rsa.pub and replace the <...> parts with your OpenStack
credential and network details.