
I recently deployed Hortonworks in a small lab environment: four virtual machines on an Intel NUC barebone with 16 GB of memory. I saved the command-line history and took several screenshots. If you want to deploy Hortonworks yourself, this might be useful.

My VMs run on VMware ESXi, so I first had to install open-vm-tools:

sudo apt-get install open-vm-tools

The installation requires the root account or an account with enough privileges. I used the root account. On Ubuntu you first need to allow root to log in over SSH by setting PermitRootLogin to yes in sshd_config:

sudo nano /etc/ssh/sshd_config
PermitRootLogin yes
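If you prefer to script this edit instead of opening nano, something like the following should work. It is a rough sketch rather than what I actually ran: the function name set_permit_root_login is made up for illustration, and it assumes the usual one-directive-per-line layout of sshd_config. Try it on a copy of the file first.

```shell
# Hypothetical helper: force "PermitRootLogin yes" in an sshd_config-style file.
set_permit_root_login() {
    local file="$1" tmp
    tmp="$(mktemp)"
    # Rewrite any existing PermitRootLogin line, commented out or not.
    sed -E 's/^[[:space:]]*#?[[:space:]]*PermitRootLogin([[:space:]].*)?$/PermitRootLogin yes/' "$file" > "$tmp"
    # If the directive was missing entirely, put it at the top of the file,
    # ahead of any Match block.
    if ! grep -qx 'PermitRootLogin yes' "$tmp"; then
        { printf 'PermitRootLogin yes\n'; cat "$tmp"; } > "$tmp.new"
        mv "$tmp.new" "$tmp"
    fi
    # Overwrite the contents in place so the file keeps its owner and mode.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Example on a working copy (the /tmp path is a placeholder):
#   cp /etc/ssh/sshd_config /tmp/sshd_config.new
#   set_permit_root_login /tmp/sshd_config.new
#   sudo cp /tmp/sshd_config.new /etc/ssh/sshd_config
```

Working on a copy keeps the original untouched until you have looked at the result.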

After changing sshd_config, restart the SSH service and set a password for root:

sudo service ssh restart
sudo passwd root

Next, I added all the servers to the hosts file on the first machine, the one I wanted to run the installation from. You can use DNS for this instead of the hosts file; in my small lab environment I went for the hosts file.
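To give an idea of what that looks like, here is a small sketch that appends entries to a hosts file and skips lines that are already present. The helper name add_hosts_entries and the host names are made up, and apart from 192.168.0.162 (my Ambari server) the addresses are placeholders.

```shell
# Hypothetical helper: append host entries to a hosts file, skipping
# lines that are already there so it can be re-run safely.
add_hosts_entries() {
    local hosts_file="$1" entry
    shift
    for entry in "$@"; do
        if ! grep -qxF -- "$entry" "$hosts_file"; then
            printf '%s\n' "$entry" >> "$hosts_file"
        fi
    done
}

# Example with placeholder names and addresses (run as root for /etc/hosts):
#   add_hosts_entries /etc/hosts \
#       "192.168.0.162 node1.lab.local node1" \
#       "192.168.0.163 node2.lab.local node2"
```

Each entry puts the fully qualified name first and the short name second, which is the usual hosts file layout.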

During the Ambari server setup (ambari-server setup), accept the Oracle JDK license when prompted. You must accept this license to download the necessary JDK from Oracle. The JDK is installed during the deploy phase.

Select n at "Enter advanced database configuration" to use the default, embedded PostgreSQL database for Ambari. The default database name is ambari, and the default user name and password are ambari/bigdata. To use an existing PostgreSQL, MySQL or Oracle database with Ambari instead, select y.

Now you are ready to start the server. Use the following command:

ambari-server start

Browse to port 8080 on the Ambari server. In my case that was http://192.168.0.162:8080/
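The web interface can take a moment to come up after the start command. If you want to script the wait, a sketch like this one polls the port until it accepts connections. The helper name wait_for_port and its default of 30 attempts, 2 seconds apart, are my own choices here, and it relies on bash's /dev/tcp feature, so it will not work in plain sh.

```shell
# Hypothetical helper: wait until a TCP port accepts connections.
# Uses bash's built-in /dev/tcp redirection, so it needs bash.
wait_for_port() {
    local host="$1" port="$2" attempts="${3:-30}" delay="${4:-2}" i=0
    while [ "$i" -lt "$attempts" ]; do
        # The subshell opens the connection and closes it again on exit.
        if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Example:
#   wait_for_port 192.168.0.162 8080 && echo "Ambari is listening"
```

An open port only tells you that something is listening, not that the login page is fully ready.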

Log in with the default admin/admin combination.

Next, launch the install wizard.

Give the cluster a name. In my case I used the name “hadoop”.

Select the distribution version. I used the latest one, HDP 2.4.

Select the nodes you want to install on. I used all four nodes. For the communication between Ambari and the nodes I had to paste the SSH key into this screen. Make sure you copy and paste the entire key into the field.
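The exact command I used is not preserved here, so take the following as a sketch of a typical passwordless SSH setup rather than my literal history. It assumes the wizard wants the private key of the Ambari host, and that root logs in on every node. The helper name print_ssh_setup and the host names are made up. It only prints the commands, so you can read them before running anything.

```shell
# Hypothetical helper: print (not run) a typical passwordless-SSH setup.
# Arguments: private key path, then the node host names.
print_ssh_setup() {
    local key="$1" host
    shift
    # Create a key pair with an empty passphrase.
    printf 'ssh-keygen -t rsa -N "" -f %s\n' "$key"
    # Install the public key for root on every node.
    for host in "$@"; do
        printf 'ssh-copy-id -i %s.pub root@%s\n' "$key" "$host"
    done
    # Show the private key so it can be pasted into the wizard.
    printf 'cat %s\n' "$key"
}

# Example with placeholder host names:
#   print_ssh_setup /root/.ssh/id_rsa node1.lab.local node2.lab.local
```

When pasting the key, include the BEGIN and END lines as well.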

Next, assign the masters. I used the first node for the NameNode, ZooKeeper, Atlas and Grafana, and the next one for the SNameNode, History Server, App Timeline Server, etc. You can assign them all to one server, but make sure it has enough memory.

Next, assign the slaves and clients. I made all hosts a DataNode and a NodeManager. You might want to make an exception for the first node, since it already runs the master services.

Next, Hive needs a new MySQL database to be created. This is where Hive stores all its management information (its metadata).

I had to set a password for Grafana in order to complete the installation.

Review and finish the installation:

All the packages will be deployed:

After the installation, the admin user was not able to connect to the HDFS client. To work around this, switch to the hdfs system account.
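What you do as the hdfs user depends on the error you get. One common cause, and this is an assumption on my part rather than something I recorded, is that the user has no home directory in HDFS yet. The sketch below creates one through sudo -u hdfs. The helper names run and make_hdfs_home are made up, and setting DRY_RUN prints the commands instead of running them.

```shell
# Run a command, or just print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: create an HDFS home directory for a user,
# acting as the hdfs superuser account.
make_hdfs_home() {
    local user="$1"
    run sudo -u hdfs hdfs dfs -mkdir -p "/user/$user"
    run sudo -u hdfs hdfs dfs -chown "$user" "/user/$user"
}

# Example (prints the two commands without touching the cluster):
#   DRY_RUN=1 make_hdfs_home admin
```

Drop DRY_RUN to execute the commands on a node that has the HDFS client installed.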

mapred.tasktracker.map.tasks.maximum = The maximum number of map tasks that will be run simultaneously by a task tracker.

mapred.tasktracker.reduce.tasks.maximum = The maximum number of reduce tasks that will be run simultaneously by a task tracker.

mapred.reduce.tasks = The default number of reduce tasks per job.

mapred.map.tasks = The default number of map tasks per job. Ignored when mapred.job.tracker is “local”.
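For reference, Hadoop configuration files such as mapred-site.xml store settings like these as property elements with a name and a value. The sketch below prints one such element. The helper name hadoop_property is made up and the value 4 is only a placeholder, not a tuning recommendation. On an Ambari-managed cluster you would normally change these through the Ambari configuration screens rather than by editing the file.

```shell
# Hypothetical helper: print one Hadoop-style <property> element.
hadoop_property() {
    printf '<property>\n  <name>%s</name>\n  <value>%s</value>\n</property>\n' "$1" "$2"
}

# Example with a placeholder value:
#   hadoop_property mapred.reduce.tasks 4
```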