Prerequisites

Steps

Credentials

PUBLIC_SSH_KEY - create a credential using your public SSH key as the value.

PRIVATE_SSH_KEY - create a credential using your private SSH key as the value.
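If you do not already have a key pair for the cluster, you can generate one locally before creating the credentials. A minimal sketch (the output paths are just examples; keep the private key secure):

```shell
# Generate an RSA key pair for namenode -> datanode SSH access.
# The directory and file names below are illustrative.
mkdir -p /tmp/hadoop-keys
ssh-keygen -t rsa -b 2048 -N "" -f /tmp/hadoop-keys/hadoop_cluster -C "hadoop-cluster"

# The file contents become the credential values:
#   PRIVATE_SSH_KEY <- /tmp/hadoop-keys/hadoop_cluster
#   PUBLIC_SSH_KEY  <- /tmp/hadoop-keys/hadoop_cluster.pub
cat /tmp/hadoop-keys/hadoop_cluster.pub
```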

Import and Clone the ServerTemplate

Import the Apache Hadoop ServerTemplate from the MultiCloud Marketplace.

Clone the ServerTemplate to make an editable copy.

Commit the revision.

(optional) If you plan to make backups of the data, you should import the Storage Toolbox.

Launch the NameNode

Use the committed, cloned ServerTemplate to add a server to your deployment.

Click Launch and configure the following inputs at the deployment level:

Name

Description

Recommended Value

Hadoop node type

The type of server that is being launched, either a namenode (master) or datanode (slave).

text:namenode

Hadoop namenode dfs.replication property

Sets the namenode dfs.replication property. See the Hadoop documentation for more information about this property.

text:3

Namenode firewall port

The firewall port to open for filesystem metadata operations.

text:8020

Namenode http firewall port

The firewall port to open for namenode http connections.

text:50070

Public SSH Key

The public key installed on each datanode to allow namenode connections. This must be the public half of the key pair whose private key is provided below.

cred:PUBLIC_SSH_KEY

Private SSH Key

The private SSH key installed to allow namenode connections to the datanodes. This must be the private half of the key pair whose public key is provided above.

cred:PRIVATE_SSH_KEY
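These inputs typically end up in the standard Hadoop configuration files on the launched server. A rough sketch of the mapping (the hostname and config directory are assumptions for illustration, not values from the ServerTemplate):

```shell
# Illustrative only: how the deployment inputs map into Hadoop config files.
# NAMENODE_HOST and CONF_DIR are assumed values, not part of the template.
NAMENODE_HOST="namenode.example.com"
CONF_DIR=/tmp/hadoop-conf
mkdir -p "$CONF_DIR"

cat > "$CONF_DIR/core-site.xml" <<EOF
<configuration>
  <!-- Filesystem metadata operations use the namenode firewall port input (8020) -->
  <property>
    <name>fs.default.name</name>
    <value>hdfs://$NAMENODE_HOST:8020</value>
  </property>
</configuration>
EOF

cat > "$CONF_DIR/hdfs-site.xml" <<EOF
<configuration>
  <!-- Matches the dfs.replication input (recommended value: 3) -->
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <!-- Matches the namenode http firewall port input (50070) -->
  <property>
    <name>dfs.http.address</name>
    <value>$NAMENODE_HOST:50070</value>
  </property>
</configuration>
EOF
```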

Block Device Inputs

If you are using a block device such as Amazon EBS, add the block_device:setup_block_device recipe below the block_device:default recipe and configure the following inputs:

Name

Description

Example value

Number of Volumes in the Stripe (1)

To use striped volumes with your databases, specify a volume quantity. The default is 1, indicating no volume striping. Ignored for clouds that do not support volume-based storage (e.g., Rackspace).

text: 1

Total Volume Size (1)

Specify the total size, in GB, of the volume or striped volume set used for primary storage. If dividing this value by the stripe volume quantity does not yield a whole number, each volume's size is rounded up to the nearest whole integer. For example, if "Number of Volumes in the Stripe" is 3 and you specify a "Total Volume Size" of 5 GB, each volume will be 2 GB. If deploying on a CloudStack-based cloud that does not allow custom volume sizes, the smallest predefined volume size is used instead of the size specified here. This input is ignored for clouds that do not support volume storage (e.g., Rackspace).

Important! The value for this input does not describe the actual amount of space that is available for data storage, because a portion of the volume is reserved for taking LVM snapshots (by default, 90% is used for data and the remainder for snapshots). Use the 'Percentage of the LVM used for data (1)' input to control how much of the volume stripe is used for data storage. Be sure to account for additional space that will be required to accommodate the growth of your data.

text: 10

Percentage of the LVM used for data (1)

The percentage of the total Volume Group extents (LVM) that is used for data storage. The remaining percentage is reserved for taking LVM snapshots. (e.g., at 75 percent, 3/4 is used for data storage and the remaining 1/4 for overhead and snapshots)

WARNING: If the database experiences a large number of writes/changes, LVM snapshots may fail. In such cases, use a more conservative value for this input (e.g., 50%).

text: 90
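The sizing rules above can be checked with a little arithmetic. A sketch of the ceiling rounding for per-volume size and the LVM data percentage (pure calculation; no volumes are created):

```shell
# Per-volume size: total size divided by stripe count, rounded up.
TOTAL_GB=5
STRIPES=3
PER_VOLUME_GB=$(( (TOTAL_GB + STRIPES - 1) / STRIPES ))
echo "each of the $STRIPES volumes: ${PER_VOLUME_GB} GB"   # 2 GB, as in the example above

# Usable data space: the data percentage of the volume group;
# the remainder is reserved for LVM snapshots.
VG_GB=10
DATA_PERCENT=90
DATA_GB=$(( VG_GB * DATA_PERCENT / 100 ))
echo "usable for data: ${DATA_GB} GB of ${VG_GB} GB"       # 9 GB
```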

Click Save.

Launch the server and wait until it becomes operational before moving on to the next step.

Launch a DataNode server

Update the deployment inputs with the following DataNode-specific inputs:

Name

Description

Example value

Hadoop node type

Type of server that is being launched, either a namenode (master) or datanode (slave)

text:datanode

Datanode address firewall port

Firewall port for datanode address

text:50010

Datanode http firewall port

Firewall port for datanode http

text:50075

Datanode ipc firewall port

Firewall port for datanode ipc

text:50020
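Normally the ServerTemplate opens these ports for you, but if you manage the firewall yourself, the three DataNode ports can be opened with rules along these lines. This sketch only prints the commands rather than executing them (the rule details are illustrative):

```shell
# Build (but do not execute) firewall rules for the datanode ports.
# Run the printed commands as root on the datanode only if you manage
# the firewall yourself; the ServerTemplate usually handles this.
RULES=$(for port in 50010 50075 50020; do
  echo "iptables -A INPUT -p tcp --dport $port -j ACCEPT"
done)
echo "$RULES"
```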

Click Save.

Click Launch.

Additional steps

To launch more DataNode servers, clone the DataNode server, rename it appropriately, and repeat the steps above. You can also create a server array for the DataNodes. Use the array's Min and Max server counts to maintain the number of DataNode servers your cluster needs.