18.1.3 MySQL Cluster Hardware, Software, and Networking Requirements

One of the strengths of MySQL Cluster is that it can be run on
commodity hardware and has no unusual requirements in this regard,
other than for large amounts of RAM, due to the fact that all live
data storage is done in memory. (It is possible to reduce this
requirement using Disk Data tables—see
Section 18.5.12, “MySQL Cluster Disk Data Tables”, for more information
about these.) Naturally, multiple and faster CPUs can enhance
performance. Memory requirements for other MySQL Cluster processes
are relatively small.

The software requirements for MySQL Cluster are also modest. Host
operating systems do not require any unusual modules, services,
applications, or configuration to support MySQL Cluster. For
supported operating systems, a standard installation should be
sufficient. The MySQL software requirements are simple: all that
is needed is a production release of MySQL Cluster. It is not
strictly necessary to compile MySQL yourself merely to be able to
use MySQL Cluster. We assume that you are using the binaries
appropriate to your platform, available from the MySQL Cluster
software downloads page at
http://dev.mysql.com/downloads/cluster/.

For communication between nodes, MySQL Cluster supports TCP/IP
networking in any standard topology, and the minimum expected for
each host is a standard 100 Mbps Ethernet card, plus a switch,
hub, or router to provide network connectivity for the cluster as
a whole. We strongly recommend that a MySQL Cluster be run on its
own subnet which is not shared with machines not forming part of
the cluster for the following reasons:

Security.
Communications between MySQL Cluster nodes are not encrypted
or shielded in any way. The only means of protecting
transmissions within a MySQL Cluster is to run your MySQL
Cluster on a protected network. If you intend to use MySQL
Cluster for Web applications, the cluster should definitely
reside behind your firewall and not in your network's
De-Militarized Zone
(DMZ)
or elsewhere.

Efficiency.
Setting up a MySQL Cluster on a private or protected network
enables the cluster to make exclusive use of bandwidth
between cluster hosts. Using a separate switch for your
MySQL Cluster not only helps protect against unauthorized
access to MySQL Cluster data, it also ensures that MySQL
Cluster nodes are shielded from interference caused by
transmissions between other computers on the network. For
enhanced reliability, you can use dual switches and dual
cards to remove the network as a single point of failure;
many device drivers support failover for such communication
links.

Network communication and latency.
MySQL Cluster requires communication between data nodes and API
nodes (including SQL nodes), as well as between data nodes and
other data nodes, to execute queries and updates. Communication
latency between these processes can directly affect the observed
performance and latency of user queries. In addition, to
maintain consistency and service despite the silent failure of
nodes, MySQL Cluster uses heartbeating and timeout mechanisms
which treat an extended loss of communication from a node as
node failure. This can lead to reduced redundancy. Recall that,
to maintain data consistency, a MySQL Cluster shuts down when
the last node in a node group fails. Thus, to avoid increasing
the risk of a forced shutdown, breaks in communication between
nodes should be avoided wherever possible.

The failure of a data or API node results in the abort of all
uncommitted transactions involving the failed node. Data node
recovery requires synchronization of the failed node's data
from a surviving data node, and re-establishment of disk-based
redo and checkpoint logs, before the data node returns to service.
This recovery can take some time, during which the Cluster
operates with reduced redundancy.

Heartbeating relies on timely generation of heartbeat signals by
all nodes. This may not be possible if the node is overloaded, has
insufficient machine CPU due to sharing with other programs, or is
experiencing delays due to swapping. If heartbeat generation is
sufficiently delayed, other nodes treat the node that is slow to
respond as failed.

This treatment of a slow node as a failed one may or may not be
desirable in some circumstances, depending on the impact of the
node's slowed operation on the rest of the cluster. When
setting timeout values such as
HeartbeatIntervalDbDb and
HeartbeatIntervalDbApi for
MySQL Cluster, care must be taken care to achieve quick detection,
failover, and return to service, while avoiding potentially
expensive false positives.

Where communication latencies between data nodes are expected to
be higher than would be expected in a LAN environment (on the
order of 100 µs), timeout parameters must be increased to
ensure that any allowed periods of latency periods are well within
configured timeouts. Increasing timeouts in this way has a
corresponding effect on the worst-case time to detect failure and
therefore time to service recovery.

LAN environments can typically be configured with stable low
latency, and such that they can provide redundancy with fast
failover. Individual link failures can be recovered from with
minimal and controlled latency visible at the TCP level (where
MySQL Cluster normally operates). WAN environments may offer a
range of latencies, as well as redundancy with slower failover
times. Individual link failures may require route changes to
propagate before end-to-end connectivity is restored. At the TCP
level this can appear as large latencies on individual channels.
The worst-case observed TCP latency in these scenarios is related
to the worst-case time for the IP layer to reroute around the
failures.