
MySQL Cluster 6.3.28b and 7.0.9b contain optimizations that can greatly reduce the time taken for data nodes to restart, whether that means restarting a single node, performing a rolling restart or performing a full system restart.

The headline figure from my results is that, for a 6 Gbyte database with modest traffic, restarts were 2.2x faster. My tests used very simple transactions, so you may see a much bigger improvement with your own workload; some users are seeing 70x! The best news is that the slower your current restarts, the greater the benefit you can expect from the optimizations.

What follows is a reworking of that original post.

MySQL Cluster Data Node restart times

Restarts are required for certain, infrequent maintenance activities. Note that there is no loss of service while a single node restarts.

When a data node restarts, it first attempts to load the data into memory from its local log files and then catches up with any subsequent changes by retrieving them from the surviving node(s) in its node group.

Based on this, you would expect the time taken to restart a data node to be influenced by:

The amount of data that was stored on the data node before the restart

Rate of updates being made to the data during the restart

Network performance (assuming the data is being updated during recovery)
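The restart sequence, and why the first two factors matter, can be illustrated with a deliberately simplified Python model. It is not how ndbd or ndbmtd are implemented: the `Change` and `DataNode` types, the integer `epoch` counter and both functions are hypothetical stand-ins for the node's local files and for the changes held by its surviving peer. The model only shows that a restart has two costs: reloading what is on local disk, then fetching every change committed since that local copy was written.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Change:
    """One committed change, ordered by a hypothetical epoch counter."""
    epoch: int
    key: str
    value: str


@dataclass
class DataNode:
    """In-memory copy held by a (hypothetical) restarting data node."""
    data: dict = field(default_factory=dict)
    last_epoch: int = 0  # newest change already reflected in `data`


def load_local_copy(snapshot: dict, snapshot_epoch: int) -> DataNode:
    """Phase 1: rebuild the in-memory copy from the node's own disk files."""
    return DataNode(data=dict(snapshot), last_epoch=snapshot_epoch)


def catch_up_from_peer(node: DataNode, peer_changes: list[Change]) -> int:
    """Phase 2: apply, in order, every change newer than the local copy.

    Returns the number of changes that had to be fetched from the peer.
    """
    applied = 0
    for change in sorted(peer_changes, key=lambda c: c.epoch):
        if change.epoch > node.last_epoch:
            node.data[change.key] = change.value
            node.last_epoch = change.epoch
            applied += 1
    return applied


peer_changes = [Change(1, "a", "1"), Change(2, "b", "2"),
                Change(3, "a", "10"), Change(4, "c", "3")]
node = load_local_copy({"a": "1", "b": "2"}, snapshot_epoch=2)
print(catch_up_from_peer(node, peer_changes))  # 2
print(node.data)  # {'a': '10', 'b': '2', 'c': '3'}
```

Starting from an empty snapshot at epoch 0 mimics the case where the local files cannot be used: every change then has to come from the peer, which is why such restarts take longer.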

The times will also be influenced by certain configuration parameters, the performance of the host machine and whether the multi-threaded data node (ndbmtd) is being used.

To provide some insight into how these factors impact restart times, tests have been performed where the following factors are varied:

Database size (Each Gbyte is made up of 1,000,000 tuples in each of 5 tables)

Whether traffic is running or not (a single thread using the NDB API to send up to 2K transactions/second, i.e. 10K updates/second)

Whether the 2 data nodes in the node group are on the same host or separated by a Gbit Ethernet network
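The workload figures above can be cross-checked with a little arithmetic. Reading the 5 updates per transaction as one row updated in each of the 5 tables is an assumption; the text only gives the totals (2K tps and 10K updates/second). The constant and function names are illustrative.

```python
# Test database layout described above: per Gbyte, 1,000,000 tuples in each of 5 tables.
TABLES = 5
TUPLES_PER_TABLE_PER_GBYTE = 1_000_000


def total_tuples(gbytes: int) -> int:
    """Rows loaded for a test database of `gbytes` Gbytes."""
    return gbytes * TABLES * TUPLES_PER_TABLE_PER_GBYTE


def updates_per_transaction(tps: float, updates_per_second: float) -> float:
    """Average number of row updates carried by each transaction."""
    return updates_per_second / tps


print(total_tuples(6))                         # 30000000
print(updates_per_transaction(2_000, 10_000))  # 5.0
```

So the 6 Gbyte case corresponds to 30 million rows, and 5 updates per transaction is consistent with one update per table.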

The following factors are kept constant:

Physical hosts: Intel Core 2 Quad Q8200@2.33 GHz; 7.7 GBytes RAM

NoOfFragmentLogFiles=300

MaxNoOfExecutionThreads=4

The optimizations introduced in MySQL Cluster 6.3.28a and 7.0.9a have reduced these times – especially when write transactions are running before, during and after the node restart is triggered.

Here are the observed results:

Improved Data Node Restart Times

For comparison purposes, these are the results before the optimizations were introduced:

Old Data Node restart times

There are a few things to note from these results:

The optimizations greatly improve the results when update traffic is running

Using the multi-threaded data node (ndbmtd) greatly improves the restart time (in this case 4 threads were available; improvements could be even greater on a system with 8 cores or hardware threads)

Results become less predictable when heavy update traffic is being processed (in this case, up to 10,000 updated rows/second on a single node group). In the tests, no attempt was made to regulate this traffic, and the test application ran on the same host as one of the data nodes. Changes to the rate of updates will vary how long it takes the restarting node to catch up, as it is chasing a moving target.
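One way to see the "moving target" effect is a toy model, offered as an assumption rather than a description of the NDB recovery protocol: the restarting node starts some number of changes behind, applies changes at a fixed rate, and new changes keep arriving at the update rate, so the gap closes only at the difference of the two rates. The function name and the numbers in the example are hypothetical.

```python
import math


def catch_up_seconds(backlog: float, apply_rate: float, update_rate: float) -> float:
    """Seconds for a restarting node to close the gap on a moving target.

    Toy model (an assumption, not NDB's actual scheduling): the node starts
    `backlog` changes behind, applies `apply_rate` changes/second, while new
    changes arrive at `update_rate`/second. The gap shrinks at the difference.
    """
    if backlog < 0 or apply_rate <= 0 or update_rate < 0:
        raise ValueError("need backlog >= 0, apply_rate > 0, update_rate >= 0")
    if backlog == 0:
        return 0.0
    net_rate = apply_rate - update_rate
    if net_rate <= 0:
        return math.inf  # updates arrive as fast as the node can apply them
    return backlog / net_rate


# Hypothetical figures: 1,000,000 changes behind, applying 50,000/second.
for rate in (0, 10_000, 40_000):
    print(rate, catch_up_seconds(1_000_000, 50_000, rate))
```

With these made-up numbers the catch-up phase takes 20 s with no traffic, 25 s at 10,000 updates/second and 100 s at 40,000 updates/second; as the update rate approaches the apply rate the time grows without bound. In practice the backlog itself also grows with the update rate, which amplifies the effect and helps explain the less predictable results.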

There is another recovery/restart scenario. The measurements shown above assume that the file system on the data node’s host was intact and could be used to recover the in-memory copy. If that were not the case (or the data nodes were restarted with the “initial” option), then all of the data would have to be recovered from the surviving data node(s) in the same node group. As a comparison, restarting a 6 Gbyte data node with the “initial” option took 20 minutes, compared to 8.5 minutes without it (ndbmtd, over a Gbit network).
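For readers who want to trigger these restarts themselves, here is a sketch that drives the `ndb_mgm` management client from Python. The `N RESTART` command (with `-i` for an initial restart) and `N STATUS` are management-client commands, but the helper names, the polling approach and the assumption that a running node reports a line of the form `Node N: started (...)` are illustrative; check the status output of your own version before relying on it. Restarting nodes one at a time and waiting for each to rejoin is what lets a rolling restart avoid any loss of service.

```python
import subprocess
import time
from typing import List, Optional


def restart_command(node_id: int, initial: bool = False) -> List[str]:
    """Build an ndb_mgm call that restarts one data node.

    With initial=True, "-i" requests an initial restart: the node discards
    its local files and recovers all data from its node-group peer(s).
    """
    mgm_command = f"{node_id} RESTART" + (" -i" if initial else "")
    return ["ndb_mgm", "-e", mgm_command]


def is_started(status_output: str, node_id: int) -> bool:
    """Assumes STATUS prints e.g. 'Node 3: started (...)' once the node is up."""
    return f"Node {node_id}: started" in status_output


def rolling_restart(node_ids: List[int], initial: bool = False,
                    poll_seconds: float = 10.0,
                    timeout_seconds: Optional[float] = None) -> None:
    """Restart data nodes one at a time, waiting for each to rejoin."""
    for node_id in node_ids:
        subprocess.run(restart_command(node_id, initial), check=True)
        deadline = (None if timeout_seconds is None
                    else time.monotonic() + timeout_seconds)
        while True:
            # Sleep first so the node has time to leave the "started" state.
            time.sleep(poll_seconds)
            status = subprocess.run(["ndb_mgm", "-e", f"{node_id} STATUS"],
                                    capture_output=True, text=True)
            if is_started(status.stdout, node_id):
                break
            if deadline is not None and time.monotonic() > deadline:
                raise TimeoutError(f"node {node_id} did not rejoin in time")
```

For example, `rolling_restart([2, 3])` would restart hypothetical data nodes 2 and 3 in turn, and `rolling_restart([2, 3], initial=True)` would do the same with initial restarts, which the measurements above suggest will take considerably longer.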
