Either do not install a swap partition, or set vm.swappiness to 0 in /etc/sysctl.conf

Set the noatime flag for the partitions

Use ext3 or ext4 as the filesystem type

Set the space reserved for root on the filesystem to 0 (ext3 and ext4 reserve 5% of the blocks by default)
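
As a rough sketch of the noatime and reserved-space items above: noatime goes in the options column of /etc/fstab, and tune2fs -m 0 sets the reserved-blocks percentage to zero. The helper name add_noatime, the mount point /data and the device /dev/sdb1 are placeholders for illustration, not values from a real system.

```shell
# add_noatime FSTAB MOUNTPOINT
# Appends "noatime" to the options (4th) column of the fstab entry whose
# mount point (2nd column) matches, unless it is already there.
# Note: awk rewrites the edited line with single spaces between columns.
add_noatime() {
    fstab=$1
    mountpoint=$2
    tmp=$(mktemp) || return 1
    awk -v mp="$mountpoint" '
        $1 !~ /^#/ && $2 == mp && index("," $4 ",", ",noatime,") == 0 {
            $4 = $4 ",noatime"
        }
        { print }
    ' "$fstab" > "$tmp" && cat "$tmp" > "$fstab"
    status=$?
    rm -f "$tmp"
    return $status
}

# Example (run as root; /data and /dev/sdb1 are placeholders):
#   add_noatime /etc/fstab /data
#   mount -o remount /data      # pick up the new option without a reboot
#   tune2fs -m 0 /dev/sdb1      # reserved-blocks percentage = 0
```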

The text installer sets up your partitions automatically. It creates a swap partition and a separate partition for the boot loader.

You only get the option to partition the drives yourself in the GUI installer. The text installer auto-formats the disk, uses LVM, and creates an ext4 filesystem for root.

Set vm.swappiness to 0 in /etc/sysctl.conf, and apply it to the running system with sysctl -p. The kernel will then swap only when it is about to run out of memory (OOM).
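
One way to make that change repeatable is a small helper that rewrites the setting in place instead of appending a duplicate line each time. This is only a sketch; set_sysctl is a made-up helper name, and it assumes the file uses the usual "key = value" layout.

```shell
# set_sysctl KEY VALUE [FILE]
# Replaces any existing "KEY = ..." line in FILE (default /etc/sysctl.conf)
# with a single "KEY = VALUE" line, so repeated runs do not add duplicates.
set_sysctl() {
    key=$1
    value=$2
    file=${3:-/etc/sysctl.conf}
    tmp=$(mktemp) || return 1
    # Keep every line that does not assign KEY, then append the new setting.
    grep -v "^[[:space:]]*${key}[[:space:]]*=" "$file" > "$tmp"
    echo "$key = $value" >> "$tmp"
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Example (run as root):
#   set_sysctl vm.swappiness 0
#   sysctl -p                      # load /etc/sysctl.conf into the running kernel
#   cat /proc/sys/vm/swappiness    # show the value now in effect
```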

DHCP request

If you didn't do the netinstall, your server might not get a DHCP address the first time it boots. First, request a DHCP address for the running system, assuming your network device is eth0:

dhclient eth0

Then edit /etc/sysconfig/network-scripts/ifcfg-eth0 so the interface is brought up on boot:

ONBOOT=yes
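
If you are setting up several nodes, the edit can be scripted. The sketch below uses a hypothetical set_kv helper that keeps exactly one KEY=VALUE line in the file; the BOOTPROTO line in the example is an assumption about what your ifcfg file needs for DHCP and may already be set.

```shell
# set_kv FILE KEY VALUE
# Ensures FILE contains exactly one "KEY=VALUE" line, replacing any old one.
set_kv() {
    file=$1
    key=$2
    value=$3
    tmp=$(mktemp) || return 1
    grep -v "^${key}=" "$file" > "$tmp"
    echo "$key=$value" >> "$tmp"
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Example (run as root; eth0 is the device assumed in the text):
#   set_kv /etc/sysconfig/network-scripts/ifcfg-eth0 ONBOOT yes
#   set_kv /etc/sysconfig/network-scripts/ifcfg-eth0 BOOTPROTO dhcp
```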

Disable iptables

Unless you need it, disable iptables, as Hortonworks recommends:

chkconfig iptables off
chkconfig ip6tables off

NTP

It's best to keep every Hadoop node in sync with an NTP server so that the clocks do not drift between servers.

chkconfig ntpd on
chkconfig ntpdate on

Max open files and processes

Set the ulimit values for all users on the system. Hadoop needs this because it opens a lot of files and creates a lot of processes, and the usual default of 1024 will hurt performance.

In /etc/security/limits.conf:

* - nofile 32768
* - nproc 65536
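
The new limits only apply to sessions started after the change, so it is worth checking them from a fresh login. A minimal check might look like this; check_nofile is a made-up helper name. On CentOS 6, a file under /etc/security/limits.d may also override the nproc value, so check there if the process limit does not change.

```shell
# check_nofile MINIMUM
# Prints "ok" if this shell's open-file limit is at least MINIMUM
# (or unlimited), otherwise "low"; the current limit follows either word.
check_nofile() {
    want=$1
    have=$(ulimit -n)
    if [ "$have" = unlimited ] || [ "$have" -ge "$want" ]; then
        echo "ok $have"
    else
        echo "low $have"
        return 1
    fi
}

# Example, after logging in again:
#   check_nofile 32768
#   ulimit -u          # max user processes (bash option)
```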

Hostnames

Again, to improve performance for Hadoop, add entries for the nodes directly to /etc/hosts. This saves the servers from doing DNS lookups for each other.
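
Here is a sketch of adding the entries without creating duplicates when the script is run twice. The add_host helper and the 192.168.1.x addresses are invented for illustration; use your own node addresses and names.

```shell
# add_host IP NAME [FILE]
# Appends "IP<TAB>NAME" to FILE (default /etc/hosts) unless NAME already
# appears as a hostname on an uncommented line.
add_host() {
    ip=$1
    name=$2
    file=${3:-/etc/hosts}
    if awk -v n="$name" '
        $1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit !found }
    ' "$file"; then
        return 0
    fi
    printf '%s\t%s\n' "$ip" "$name" >> "$file"
}

# Example (run as root on every node; addresses are placeholders):
#   add_host 192.168.1.11 hadoop-node1
#   add_host 192.168.1.12 hadoop-node2
#   getent hosts hadoop-node1     # confirm the name resolves
```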

Set the hostname on boot for CentOS. Add this to /etc/sysconfig/network:

HOSTNAME=hadoop-node1

Hadoop also recommends disabling IPv6:

NETWORKING_IPV6=no

Setup SSH pubkeys

For each server, set up an SSH public key without a passphrase for root. Ambari will use it to communicate with the other servers and install packages.

ssh-keygen
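
Run on its own, ssh-keygen prompts for a file name and passphrase. A non-interactive version, as a sketch, could look like the following; make_key is a hypothetical helper, and the RSA key type and default path are assumptions.

```shell
# make_key [KEYFILE]
# Creates an RSA key pair with an empty passphrase unless KEYFILE exists.
make_key() {
    keyfile=${1:-$HOME/.ssh/id_rsa}
    [ -f "$keyfile" ] && return 0
    mkdir -p "$(dirname "$keyfile")" || return 1
    # -N "" sets an empty passphrase; -q suppresses the progress output.
    ssh-keygen -q -t rsa -N "" -f "$keyfile"
}

# Example (run as root on each server):
#   make_key
#   cat /root/.ssh/id_rsa.pub
```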

SELinux

Depending on your install, SELinux may or may not be enabled.

Disable it in the running instance:

setenforce 0

Also disable it at boot in /etc/selinux/config:

SELINUX=disabled

Note that setenforce 0 only switches the running system to permissive mode, so if you then install Ambari and run ambari-server setup, it will still report SELinux as enabled. It is best to reboot after everything else is complete.
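
A quick way to compare what the config file will apply at the next boot with what the running system reports is sketched below; selinux_config_mode is a made-up helper name. After setenforce 0, getenforce normally prints Permissive rather than Disabled until you reboot.

```shell
# selinux_config_mode [FILE]
# Prints the value of the last uncommented SELINUX= line in FILE
# (default /etc/selinux/config). SELINUXTYPE= lines are not matched.
selinux_config_mode() {
    file=${1:-/etc/selinux/config}
    sed -n 's/^SELINUX=//p' "$file" | tail -n 1
}

# Example:
#   selinux_config_mode    # mode the next boot will use
#   getenforce             # mode the running system reports
```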

Disable transparent hugepages

Hortonworks recommends disabling this memory setting since it may cause problems with network lookups.

Disable it in the running system, and also add the same command to /etc/rc.local so it is reapplied on every boot.

echo never > /sys/kernel/mm/transparent_hugepage/enabled
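
Some RHEL/CentOS 6 kernels appear to expose this setting under redhat_transparent_hugepage rather than transparent_hugepage. The sketch below tries both; the second path is an assumption about your kernel, and disable_thp is a made-up helper name.

```shell
# disable_thp [PATH...]
# Writes "never" to the first candidate control file that exists and is
# writable. With no arguments, tries the two paths listed below.
disable_thp() {
    if [ $# -eq 0 ]; then
        set -- /sys/kernel/mm/transparent_hugepage/enabled \
               /sys/kernel/mm/redhat_transparent_hugepage/enabled
    fi
    for f in "$@"; do
        if [ -w "$f" ]; then
            echo never > "$f"
            return 0
        fi
    done
    echo "no transparent hugepage control file found" >&2
    return 1
}

# Example (run as root, and put the same function and call in /etc/rc.local):
#   disable_thp
#   cat /sys/kernel/mm/transparent_hugepage/enabled   # active value is shown in [brackets]
```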

Primary node pubkeys

The primary node that has Ambari installed will need its pubkey installed on all the nodes, including itself.
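
Installing a pubkey means appending it to root's ~/.ssh/authorized_keys on each node. A minimal sketch, assuming the key has already been copied to the node (install_pubkey and the /tmp path are illustrative only); ssh-copy-id does much the same thing over SSH if password login is still available.

```shell
# install_pubkey PUBFILE [AUTHFILE]
# Appends the key in PUBFILE to AUTHFILE (default ~/.ssh/authorized_keys)
# once, creating the directory and file with the permissions sshd expects.
install_pubkey() {
    pub=$1
    auth=${2:-$HOME/.ssh/authorized_keys}
    dir=$(dirname "$auth")
    mkdir -p "$dir" && chmod 700 "$dir" || return 1
    touch "$auth" && chmod 600 "$auth" || return 1
    # -F: fixed string, -x: whole line, so the key is added only once.
    grep -qxF -e "$(cat "$pub")" "$auth" || cat "$pub" >> "$auth"
}

# Example (run as root on each node, including the primary):
#   install_pubkey /tmp/primary_id_rsa.pub
```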