Tuesday, January 27, 2015

NCC - The Swiss Army Knife of Nutanix Troubleshooting Tools.

The Swiss Army knife is a pocket-size multi-tool that equips you for everyday challenges. NCC equips you with multiple Nutanix troubleshooting tools in one package. NCC provides multiple utilities (plugins) for the Nutanix infrastructure administrator to

if needed, run NCC automatically and email the results at a configurable time interval.

NCC is developed by Nutanix Engineering based on input from support engineers, customers, on-call engineers, and solution architects. NCC helps the Nutanix customer identify a problem and either fix it or report it to Nutanix Support. NCC enables faster problem resolution by reducing the time taken to triage an issue.

When should we run NCC?

After a new install.

Before and after any cluster activity: add node, remove node, reconfiguration, or upgrade.

Any time you are troubleshooting an issue.
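
One way to make the before-and-after runs useful is to save each run to a file and compare the two. The following is a minimal sketch, assuming the output of "ncc health_checks run_all" (the command used in the steps further down) is redirected to a text file for each run. The helper name new_lines and the file names are illustrative and not part of NCC.

```shell
#!/bin/sh
# Hypothetical helper (not part of NCC): print the lines that appear in the
# "after" file but not in the "before" file, i.e. messages that are new
# since the cluster activity.
new_lines() {
    before="$1"
    after="$2"
    # -F fixed strings, -x whole-line match, -v invert, -f read patterns from file.
    # grep exits 1 when nothing is printed; treat that as success.
    grep -Fxv -f "$before" "$after" || [ $? -eq 1 ]
}

# Usage on a CVM (assumes ncc is on the PATH; file names are arbitrary):
#   ncc health_checks run_all > /tmp/ncc_before.txt 2>&1
#   ... perform the upgrade or node change ...
#   ncc health_checks run_all > /tmp/ncc_after.txt 2>&1
#   new_lines /tmp/ncc_before.txt /tmp/ncc_after.txt
```

Because this is a plain line comparison, any line carrying run-specific text such as a timestamp will show up as new even when the check result itself is unchanged.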

As mentioned in the cluster health blog, NCC is the collector agent for cluster health.

2. Execute "ncc health_checks run_all" and monitor for messages other than PASS.

3. List of NCC statuses

4. Results of an NCC check run on a lab cluster

5. Displaying and analyzing the failed tests.
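
To display only the checks that need attention, the saved output of step 2 can be filtered for lines that do not report PASS. The following is a minimal sketch, assuming the output was redirected to a text file and that each result appears on a line together with its status. The helper name non_pass_lines and the file name are illustrative and not part of NCC.

```shell
#!/bin/sh
# Hypothetical helper (not part of NCC): read text on stdin and print every
# non-empty line that does not contain the string PASS.
non_pass_lines() {
    # NF is non-zero only for lines with at least one field (skips blank lines).
    awk 'NF && !/PASS/'
}

# Usage on a CVM (assumes ncc is on the PATH; the file name is arbitrary):
#   ncc health_checks run_all > /tmp/ncc_run.txt 2>&1
#   non_pass_lines < /tmp/ncc_run.txt
```

The filter is deliberately crude: it keeps headers and any other line without PASS, so nothing that might matter is hidden.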

The FAILUREs are due to sub-optimal CVM memory and network errors. To fix the issues:

- Increase CVM memory to 16 GB or more (KB 1513: https://portal.nutanix.com/#/page/kbs/details?targetId=kA0600000008djKCAQ).
- Check the network (rx_missed_errors): check for network port flaps and network driver issues (KB 1679 and KB 1381).

c. Log Collector Feature of NCC (similar to show tech-support of Cisco or vm-support of VMware)
NCC Log collector collects the logs from all the CVMs in parallel.

1. Execute ncc log_collector to find the list of logs that will be collected.

2. To collect all the logs for the last 4 hours, execute ncc log_collector run_all

For example: stargate.INFO will have the time period when it is collected: