Generally you'd have the SSH keys added as part of the cluster node provisioning process, and you'd likely manage the machines with something like Puppet anyway. Having node-to-node SSH key authentication is great on a small cluster (easy to manage and work on), but once you hit thousands of nodes it's unlikely you'll be SSH'd into node 990 and need to hop over to node 800 without a password.

Another option, which has security implications, is to copy the entire contents of the /home/mapr/.ssh/ directory (public and private keys, along with the authorized_keys file) to each node. This effectively lets the mapr user on every server SSH anywhere in the cluster. The trouble is that if that shared key is compromised, you must replace it on every single node, rather than just revoking the one compromised key.
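A minimal sketch of what that shared-key setup could look like, assuming the standard /home/mapr/.ssh/ layout; the staging directory, key type, and distribution loop are illustrative only:

```shell
# Generate ONE shared key pair in a staging directory; the same files would
# then be copied verbatim to every node's /home/mapr/.ssh/ (path assumed).
STAGING=$(mktemp -d)
ssh-keygen -t rsa -b 4096 -N '' -f "$STAGING/id_rsa" -q

# The shared public key doubles as the authorized_keys entry on every node.
cp "$STAGING/id_rsa.pub" "$STAGING/authorized_keys"
chmod 600 "$STAGING/id_rsa" "$STAGING/authorized_keys"

# Distribution would look roughly like this (NODES is a hypothetical list;
# commented out so the sketch stays self-contained):
# for node in $NODES; do
#   scp -pr "$STAGING/." "mapr@$node:/home/mapr/.ssh/"
# done
```

Because every node holds the same private key, rotating it means re-running this copy everywhere, which is exactly the revocation problem described above.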

You can do that; however, the same security caveats apply if you're using the same private key on all servers. You could likely script the generation of per-node key pairs and their distribution to all of the nodes, especially if you keep a "master" authorized_keys file in MapR-FS: each node appends its public key to that file and then copies the file into the mapr user's SSH directory.
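The per-node flow might be sketched as follows. It is simulated here with temp directories standing in for the MapR-FS NFS mount (e.g. /mapr/&lt;cluster&gt;/...) and the mapr user's ~/.ssh directory; all paths and names are assumptions:

```shell
# Stand-ins: MAPRFS for the NFS-mounted MapR-FS path holding the master
# file, KEYDIR for this node's /home/mapr/.ssh directory.
MAPRFS=$(mktemp -d)
MASTER="$MAPRFS/authorized_keys"
touch "$MASTER"
KEYDIR=$(mktemp -d)

# Step 1: each node generates its OWN key pair (no shared private key).
ssh-keygen -t rsa -b 4096 -N '' -f "$KEYDIR/id_rsa" -q

# Step 2: append this node's public key to the master file in MapR-FS,
# skipping the append if it is already present (keeps the script re-runnable).
grep -qF "$(cat "$KEYDIR/id_rsa.pub")" "$MASTER" || \
  cat "$KEYDIR/id_rsa.pub" >> "$MASTER"

# Step 3: copy the master file back into this node's SSH directory so it
# accepts logins from every node that has registered a key.
cp "$MASTER" "$KEYDIR/authorized_keys"
chmod 600 "$KEYDIR/authorized_keys"
```

With this scheme a compromised node's key can be revoked by deleting one line from the master file and letting each node re-copy it, rather than touching every node's key material.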

I am confused about Apache Hadoop NameNode federation. As I understand it, MapR doesn't provide federation; instead, it provides high availability in the M5 and M7 editions.

The MapR Converged Data Platform provides high availability for the Hadoop components in the stack. MapR clusters don't use NameNodes, and they provide stateful high availability for the MapReduce JobTracker and for Direct Access NFS. This works out of the box, with no special configuration required.