Troubleshoot NSX Controller cluster status, roles and connectivity

VMware’s recommendation is to deploy NSX Controllers in odd numbers, three or greater. I also heard in my recent NSX ICM class that three controllers should be sufficient for most environments; there isn’t much need for more. I wouldn’t recommend anything less than three, though, because the NSX Controllers are the control plane for your NSX network, so redundancy is important. The reason you need three and not two is that the cluster relies on a majority election. With two controllers, losing one (or losing the link between them) leaves neither side with a majority, so you can run into a split brain scenario.
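To make the majority argument concrete, here is a minimal sketch of generic majority-quorum arithmetic. It is not NSX's actual election code; the function names `quorum_size` and `tolerated_failures` are made up for illustration, and the only assumption is that a cluster stays healthy while more than half of its nodes can still see each other.

```python
def quorum_size(cluster_size: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return cluster_size // 2 + 1


def tolerated_failures(cluster_size: int) -> int:
    """How many nodes can be lost while a majority still remains."""
    return cluster_size - quorum_size(cluster_size)


if __name__ == "__main__":
    # Hypothetical cluster sizes, purely for comparison.
    for n in (1, 2, 3, 4, 5):
        print(
            f"{n} controllers: majority={quorum_size(n)}, "
            f"tolerated failures={tolerated_failures(n)}"
        )
```

Under that assumption, two controllers tolerate no failures at all, which is no better than one, while three tolerate a single failure. Four tolerate the same single failure as three, which is why odd numbers are the usual advice.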

Troubleshooting NSX Controller connectivity isn’t too difficult. If a controller is misbehaving, you can always delete it and deploy a new one, which is probably the easiest and quickest method, or you can attempt to repair the troublesome controller. Either way, first confirm the status in the GUI by navigating to Networking & Security -> Installation -> Management. You should see a Status next to each of the NSX Controller nodes.

As you can see, one of my controllers is showing Disconnected. The first thing to try is to find the master node. The easiest way to do this is to run the following commands.