This guide explains how to troubleshoot ClusterControl, an operational management and automation tool for database clusters that aims to simplify their deployment, monitoring, management and scaling.

This document covers problems that commonly arise with ClusterControl, in particular those originating from its components: the CMON controller, the CMON database, the ClusterControl UI and the ClusterControl CMONAPI. For each problem it gives troubleshooting steps to identify the cause, along with possible solutions. It also explains what data to collect when creating error reports for Severalnines Support.

Note that this troubleshooting guide covers the latest ClusterControl version. We recommend staying up to date with the latest version of ClusterControl, as it contains the latest bug fixes. To upgrade ClusterControl to the latest version, refer to the section on Upgrading ClusterControl.

ClusterControl consists of several components, each of which writes its own logs. These files reside on the ClusterControl node. By default, CMON and the ClusterControl UI run without the debug option enabled. Refer to the Reporting and Debugging section for how to run them in debug mode.

If you encounter any problems with ClusterControl, it is highly recommended to examine the related log files:

ClusterControl provides an error reporting tool called s9s_error_reporter. It can greatly facilitate troubleshooting, as it collects the necessary information on the entire database cluster setup and archives it in a package. Use this tool to generate an error report, then attach the generated tarball to the support ticket.

A core dump consists of the recorded state of the working memory of a computer program at a specific time, generally when the program has crashed or otherwise terminated abnormally. The ClusterControl Controller (CMON) package installs a cron file under /etc/cron.d/ that automatically restarts the cmon process if it terminates abnormally. Typically, you can tell whether the cmon process has crashed by looking at the dmesg output.

In such cases, generating a core dump is the only way to backtrace the issue. Make sure the debugging components described in the previous section are installed beforehand. On the ClusterControl node, as the root user, raise the core file size limit, adjust the kernel’s core pattern value and run CMON in the foreground:
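The steps above can be sketched as a small shell helper. This is a minimal sketch, not the official procedure: the function name `prepare_core_dumps` is hypothetical, and the core pattern `/tmp/core.%e.%s.%p` (executable name, signal number, PID) is an assumed layout chosen so that core files land in /tmp.

```shell
# Hypothetical helper: prepare the current shell for core dumps.
# $1: core pattern to install (assumed default: /tmp/core.<exe>.<signal>.<pid>)
# $2: kernel core_pattern file (overridable so the function can be tried on a scratch file)
prepare_core_dumps() {
    local pattern="${1:-/tmp/core.%e.%s.%p}"
    local pattern_file="${2:-/proc/sys/kernel/core_pattern}"
    # Lift the core file size limit for this shell and its children.
    ulimit -c unlimited 2>/dev/null || echo "warning: could not raise core size limit" >&2
    # Tell the kernel where to write core files (the real file requires root).
    printf '%s\n' "$pattern" > "$pattern_file"
}

# Typical use, as root on the ClusterControl node:
#   service cmon stop
#   prepare_core_dumps && cmon -d
```

Because `ulimit` only affects the calling shell and its children, cmon has to be started from the same shell that ran the helper.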

When cmon crashes, there will now be a core file in /tmp. Compress the core dump (gzip is recommended) and attach it to a support ticket so we can investigate and prepare the necessary fix. Alternatively, you can send only the backtrace in a support ticket by using the following commands:

gdb /usr/sbin/cmon /tmp/<corefile>
thread apply all bt full

Attach the full output, replacing any sensitive information with “XXXXXXXXX”. Traces may contain passwords.
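Masking by hand is error-prone on long traces, so a filter can help. The following is a minimal sketch under stated assumptions: `redact_trace` is a hypothetical helper, and it only catches values assigned to names containing "pass" or "Pass" (for example `m_password = "..."`). Review the result manually before attaching it.

```shell
# Hypothetical helper: mask password-like values in a backtrace read from stdin.
# Matches <name containing "pass"> = <quoted string or bare token> and replaces
# the value with XXXXXXXXX. Other kinds of secrets are NOT detected.
redact_trace() {
    sed -E 's/([Pp]ass[A-Za-z_]*[[:space:]]*=[[:space:]]*)("[^"]*"|[^[:space:],]+)/\1XXXXXXXXX/g'
}

# Example use (gdb's -batch and -ex options run the command non-interactively):
#   gdb -batch -ex "thread apply all bt full" /usr/sbin/cmon /tmp/<corefile> \
#       | redact_trace > cmon_backtrace.txt
```

The output file name `cmon_backtrace.txt` is only an illustration.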

To run cmon as a foreground process, invoke it with the -d option:

$ service cmon stop
$ CMON_DEBUG=1 cmon -d

CMON will enable LOG_DEBUG messages and print detailed information to the screen (stdout) as well as to /var/log/cmon.log or /var/log/cmon_{clusterID}.log. Press Ctrl+C to terminate the process. In certain cases, the CMON output is needed to gain insight into the problem.

This section covers common issues when dealing with ClusterControl components, with possible troubleshooting steps and solutions. There is also a community forum available with knowledge base sections for public reference.

By default, CMON is configured to perform recovery of failed nodes or clusters. This behavior can be overridden by disabling the automatic recovery feature or by enabling maintenance mode for the node or cluster.

Solution:

Enabling maintenance mode for selected nodes (recommended).

To enable a maintenance window, go to Nodes > select the node > toggle Maintenance Mode ON. You must specify the reason for and the duration of the maintenance window. During this period, any alarms and notifications raised for this node are disabled. You can toggle maintenance mode OFF at any time once the maintenance exercise is completed.

Disabling automatic recovery.

To disable automatic recovery temporarily, click the ‘power’ icon for the node and the cluster. Red means automatic recovery is turned off, while green means it is turned on. This setting does not persist if CMON is restarted.

To make the above change persistent, disable node or cluster automatic recovery by adding the following line to the CMON configuration file of the respective cluster. For example, to disable automatic recovery for cluster ID 1, set the following line inside /etc/cmon.d/cmon_1.cnf:
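As an illustration, the change can be scripted. This sketch assumes the relevant options are named `enable_cluster_autorecovery` and `enable_node_autorecovery` and take `0` to disable; confirm the exact names against the CMON configuration reference for your version. The helper `set_cnf_option` is hypothetical.

```shell
# Hypothetical helper: set key=value in a CMON configuration file,
# replacing an existing line for the key or appending a new one.
set_cnf_option() {
    local file="$1" key="$2" value="$3" tmp
    if grep -q "^${key}=" "$file"; then
        tmp=$(mktemp) || return 1
        # Rewrite via a temp file, then copy back to keep the file's permissions.
        sed "s|^${key}=.*|${key}=${value}|" "$file" > "$tmp" && cat "$tmp" > "$file"
        rm -f "$tmp"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# Example use for cluster ID 1 (option names are assumptions, see above):
#   set_cnf_option /etc/cmon.d/cmon_1.cnf enable_cluster_autorecovery 0
#   set_cnf_option /etc/cmon.d/cmon_1.cnf enable_node_autorecovery 0
#   service cmon restart
```

The restart is included on the assumption that, as with /etc/cmon.cnf, cmon only picks up configuration file changes when it is restarted.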

Mixing public and internal IP addresses is not recommended. For the GRANT, try to use the IP address that your database nodes use to communicate with each other.

If SHOW STATUS returns ERROR 1130 (HY000): Host '[ClusterControl IP address]' is not allowed to connect to this, the database host is missing the cmon user grant. Run the following command to reset the cmon user privileges:
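A possible shape for that reset is sketched below. It is an assumption rather than the exact statement from the product documentation: it grants ALL PRIVILEGES with GRANT OPTION to the cmon user and uses the CREATE USER IF NOT EXISTS form, which requires a reasonably recent MySQL or MariaDB. The helper `cmon_grant_sql` is hypothetical; it only prints the SQL so it can be reviewed first.

```shell
# Hypothetical helper: print statements that restore the cmon user grant.
# $1: ClusterControl IP address as seen by the database host
# $2: cmon password
cmon_grant_sql() {
    local cc_ip="$1" cmon_password="$2"
    cat <<EOF
CREATE USER IF NOT EXISTS 'cmon'@'${cc_ip}' IDENTIFIED BY '${cmon_password}';
GRANT ALL PRIVILEGES ON *.* TO 'cmon'@'${cc_ip}' WITH GRANT OPTION;
EOF
}

# Example use on the database host (address and password are placeholders):
#   cmon_grant_sql 10.0.0.5 'cmon_password' | mysql -uroot -p
```

If the user already exists, CREATE USER IF NOT EXISTS leaves its password unchanged, so only the privileges are reset.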

The ClusterControl UI shows a toaster notification (at the top right of the UI) indicating an authentication problem when connecting to a specific cluster ID.

Troubleshooting steps:

Run the following command to verify that the token is set correctly for the corresponding cluster:

mysql> SELECT cluster_id, token FROM dcps.clusters;

Solution:

In this case, update the token column in the dcps.clusters table for cluster_id={ID} so that it matches the rpc_key in /etc/cmon.d/cmon_{ID}.cnf. These tokens must match. Execute the following update query on the dcps database:
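One way to build that query without copying the token by hand is sketched here. The table and column names come from the SELECT shown earlier; the helpers `read_rpc_key` and `token_update_sql` are hypothetical, and the sketch assumes rpc_key appears as an unquoted `rpc_key=value` line in the configuration file.

```shell
# Hypothetical helper: read rpc_key from a CMON configuration file.
read_rpc_key() {
    sed -n 's/^rpc_key[[:space:]]*=[[:space:]]*//p' "$1" | head -n 1
}

# Hypothetical helper: print the UPDATE that syncs dcps.clusters with rpc_key.
# $1: cluster ID, $2: path to that cluster's CMON configuration file
token_update_sql() {
    local id="$1" key
    key=$(read_rpc_key "$2")
    [ -n "$key" ] || { echo "no rpc_key found in $2" >&2; return 1; }
    printf "UPDATE dcps.clusters SET token='%s' WHERE cluster_id=%s;\n" "$key" "$id"
}

# Example use for cluster ID 1:
#   token_update_sql 1 /etc/cmon.d/cmon_1.cnf | mysql -uroot -p
```

Printing the statement rather than executing it directly leaves room to inspect it before it touches the dcps database.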

The ClusterControl UI shows a toaster notification (at the top right of the UI) indicating an authentication problem when connecting to cluster 0 (0 means the global view of clusters under ClusterControl management).

Verify that the RPC_TOKEN value in /var/www/html/clustercontrol/bootstrap.php and the CMON_TOKEN value in /var/www/html/cmonapi/config/bootstrap.php match the token defined as rpc_key in /etc/cmon.cnf. If you modify /etc/cmon.cnf, you must restart cmon for the change to take effect.
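The comparison can be scripted roughly as follows. This is a sketch that assumes the tokens appear in the PHP files as `define('NAME', 'value');` lines and that rpc_key is an unquoted `rpc_key=value` line; both helper names are hypothetical.

```shell
# Hypothetical helper: extract the value of a PHP define('NAME', 'value') line.
php_define_value() {
    local name="$1" file="$2"
    sed -n "s/^[[:space:]]*define([[:space:]]*['\"]${name}['\"][[:space:]]*,[[:space:]]*['\"]\([^'\"]*\)['\"].*/\1/p" "$file" | head -n 1
}

# Hypothetical helper: succeed only if the PHP constant equals rpc_key.
# $1: constant name, $2: PHP bootstrap file, $3: CMON configuration file
tokens_match() {
    local ui_token rpc_key
    ui_token=$(php_define_value "$1" "$2")
    rpc_key=$(sed -n 's/^rpc_key[[:space:]]*=[[:space:]]*//p' "$3" | head -n 1)
    [ -n "$rpc_key" ] && [ "$ui_token" = "$rpc_key" ]
}

# Example use:
#   tokens_match RPC_TOKEN /var/www/html/clustercontrol/bootstrap.php /etc/cmon.cnf \
#       || echo "RPC_TOKEN differs from rpc_key"
#   tokens_match CMON_TOKEN /var/www/html/cmonapi/config/bootstrap.php /etc/cmon.cnf \
#       || echo "CMON_TOKEN differs from rpc_key"
```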

ClusterControl is not fully tested on OS-level virtualization platforms (containers) such as OpenVZ. This may cause issues in the reporting of host statistics, since these platforms do not use the conventional device naming and mapping.

Known issues in ClusterControl:

Running two simultaneous backups (storage on Controller) on two different clusters: one will most likely fail due to a netcat port conflict.

Running two simultaneous HAProxy installations on two different clusters (different load balancer hosts): one will most likely fail.