While setting up my first Hadoop cluster, I faced the problem of how to install CentOS 7 on multiple servers at once. If you have 20 data nodes to deploy, anything you can do to automate the installation will greatly reduce the deployment time, but most importantly, it will eliminate the possibility of human error (a typo, for example).

Initially, I started looking in the disk-cloning direction. Since all my data nodes are identical, my plan was to prepare one data node server, dd its system drive, place the image on an NFS share, then boot each new server and re-image its system drive from the dd image on the share. Clonezilla and DRBL seem to be the perfect pair for such a scenario, and although you will spend some time configuring, testing, and tuning them, the approach is still worth looking into.
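As a rough sketch of that clone-and-restore loop, the core of it is just dd piped through gzip. The paths below are stand-in files so the commands can run anywhere; on real hardware DISK would be the system drive (e.g. /dev/sda, with the node booted from rescue media so the disk is not in use) and SHARE would be the NFS mount point:

```shell
#!/bin/sh
# Sketch of the dd clone-and-restore idea. DISK and SHARE are stand-ins:
# on a real node DISK would be the system drive (e.g. /dev/sda) and
# SHARE an NFS mount (mount -t nfs nfs-server:/images /mnt).
DISK=./fake-disk.img
SHARE=./share
mkdir -p "$SHARE"
dd if=/dev/urandom of="$DISK" bs=1M count=4 2>/dev/null  # stand-in "drive"

# Capture: on the prepared golden node, stream the drive through gzip
# onto the share (compressing cuts the transfer size considerably).
dd if="$DISK" bs=4M 2>/dev/null | gzip > "$SHARE/datanode.img.gz"

# Restore: on each new node, write the image back to its drive.
gzip -dc "$SHARE/datanode.img.gz" | dd of=./restored-disk.img bs=4M 2>/dev/null

cmp -s "$DISK" ./restored-disk.img && echo "restored image matches"
```

Tools like Clonezilla essentially wrap this workflow (plus smarter, filesystem-aware imaging) in a menu-driven interface.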

Then I realized that even if I managed to establish the setup above, I would still have to deal with manual post-installation tweaks, like regenerating SSH host keys and probably adjusting MAC addresses. On top of that, transferring a raw dd image (~30GB in my case) might take longer than the initial installation itself. I therefore ended up using the Kickstart method. I'm pretty sure there are more efficient solutions, and if you happen to know one, I'd love to hear your comments.
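For context, Kickstart drives the Anaconda installer from a single answer file, so every node gets a fresh, identical install with its own SSH keys and network identity. A minimal sketch of what such a file can look like (the mirror URL, partitioning, and package set below are illustrative assumptions, not my actual config; the rootpw hash is a placeholder):

```
# ks.cfg -- minimal CentOS 7 Kickstart sketch (all values are examples)
install
url --url="http://mirror.centos.org/centos/7/os/x86_64/"
lang en_US.UTF-8
keyboard us
timezone UTC
network --bootproto=dhcp --activate
rootpw --iscrypted $6$placeholderhash
zerombr
clearpart --all --initlabel
autopart --type=lvm
bootloader --location=mbr
reboot

%packages
@core
%end
```

The file is typically served over HTTP or NFS and passed to the installer via the `inst.ks=` boot parameter, so the same file can drive all 20 installs unattended.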