Which versions of the MySQL software support NDB Cluster? Do I
have to compile from source?

NDB Cluster is not supported in standard MySQL Server
8.0 releases. Instead, MySQL NDB Cluster is
provided as a separate product. Available NDB Cluster release
series include the following:

NDB Cluster 7.2.
This series is no longer maintained or supported for new
deployments. Users of NDB Cluster 7.2 should upgrade to a
newer release series as soon as possible. We recommend
that new deployments use the latest NDB Cluster 8.0
release.

NDB Cluster 7.3.
This series is a previous General Availability (GA)
version of NDB Cluster, still available for production
use, although we recommend that new deployments use the
latest NDB Cluster 8.0 release. The most recent NDB
Cluster 7.3 release can be obtained from
https://dev.mysql.com/downloads/cluster/.

NDB Cluster 7.4.
This series is a previous General Availability (GA)
version of NDB Cluster, still available for production
use, although we recommend that new deployments use the
latest NDB Cluster 8.0 release. The most recent NDB
Cluster 7.4 release can be obtained from
https://dev.mysql.com/downloads/cluster/.

NDB Cluster 7.5.
This series is a previous General Availability (GA)
version of NDB Cluster, still available for production
use, although we recommend that new deployments use the
latest NDB Cluster 8.0 release. The latest NDB Cluster 7.5
releases can be obtained from
https://dev.mysql.com/downloads/cluster/.

NDB Cluster 7.6.
This series is a previous General Availability (GA)
version of NDB Cluster, still available for production
use, although we recommend that new deployments use the
latest NDB Cluster 8.0 release. The latest NDB Cluster 7.6
releases can be obtained from
https://dev.mysql.com/downloads/cluster/.

NDB Cluster 8.0.
This series is the most recent General Availability (GA)
version of NDB Cluster, based on version 8.0 of the
NDB storage engine and MySQL
Server 8.0. NDB Cluster 8.0 is available for production
use; new deployments intended for production should use
the latest GA release in this series, which is currently
NDB Cluster 8.0.21. You can obtain the
most recent NDB Cluster 8.0 release from
https://dev.mysql.com/downloads/cluster/. For
information about new features and other important changes
in this series, see
Section 22.1.4, “What is New in NDB Cluster”.

Installation packages may also be available from your
platform's package management system.

You can determine whether your MySQL Server has
NDB support using one of the
statements SHOW VARIABLES LIKE 'have_%',
SHOW ENGINES, or
SHOW PLUGINS.
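As an illustration, suppose a connector returns the rows of SHOW ENGINES as (engine, support) pairs; the following minimal sketch (the row format is an assumption, not part of this FAQ) checks whether NDB support is available:

```python
def has_ndb_support(show_engines_rows):
    """Return True if the NDB/NDBCLUSTER engine is available.

    show_engines_rows is assumed to be a sequence of
    (engine_name, support) pairs, as a connector might return for
    SHOW ENGINES; support is 'YES', 'NO', 'DEFAULT', or 'DISABLED'.
    """
    for engine, support in show_engines_rows:
        if engine.upper() in ("NDB", "NDBCLUSTER") \
                and support.upper() in ("YES", "DEFAULT"):
            return True
    return False
```

For example, `has_ndb_support([("InnoDB", "DEFAULT"), ("ndbcluster", "YES")])` returns True, while a server without NDB support returns False.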

A.10.2.

What do “NDB” and “NDBCLUSTER” mean?

“NDB” stands for
“Network
Database”.
NDB and NDBCLUSTER are
both names for the storage engine that enables clustering
support with MySQL. NDB is preferred, but
either name is correct.

A.10.3.

What is the difference between using NDB Cluster versus using
MySQL Replication?

In traditional MySQL replication, a master MySQL server updates
one or more slaves. Transactions are committed sequentially, and
a slow transaction can cause the slave to lag behind the master.
This means that if the master fails, it is possible that the
slave might not have recorded the last few transactions. If a
transaction-safe engine such as
InnoDB is being used, a transaction
will either be complete on the slave or not applied at all, but
replication does not guarantee that all data on the master and
the slave will be consistent at all times. In NDB Cluster, all
data nodes are kept in synchrony, and a transaction committed by
any one data node is committed for all data nodes. In the event
of a data node failure, all remaining data nodes remain in a
consistent state.

Asynchronous replication is also available in NDB Cluster.
NDB Cluster Replication
(also sometimes known as “geo-replication”)
includes the capability to replicate both between two NDB
Clusters, and from an NDB Cluster to a non-Cluster MySQL server.
See Section 22.6, “NDB Cluster Replication”.

A.10.4.

Do I need any special networking to run NDB Cluster? How do
computers in a cluster communicate?

NDB Cluster is intended to be used in a high-bandwidth
environment, with computers connecting using TCP/IP. Its
performance depends directly upon the connection speed between
the cluster's computers. The minimum connectivity
requirements for NDB Cluster include a typical 100-megabit
Ethernet network or the equivalent. We recommend you use gigabit
Ethernet whenever available.

A.10.5.

How many computers do I need to run an NDB Cluster, and why?

A minimum of three computers is required to run a viable
cluster. However, the minimum recommended
number of computers in an NDB Cluster is four: one each to run
the management and SQL nodes, and two computers to serve as data
nodes. The purpose of the two data nodes is to provide
redundancy; the management node must run on a separate machine
to guarantee continued arbitration services in the event that
one of the data nodes fails.

To provide increased throughput and high availability, you
should use multiple SQL nodes (MySQL Servers connected to the
cluster). It is also possible (although not strictly necessary)
to run multiple management servers.

A.10.6.

What do the different computers do in an NDB Cluster?

An NDB Cluster has both a physical and logical organization,
with computers being the physical elements. The logical or
functional elements of a cluster are referred to as
nodes, and a computer
housing a cluster node is sometimes referred to as a
cluster host. There are
three types of nodes, each corresponding to a specific role
within the cluster. These are:

SQL node.
This is simply an instance of MySQL Server
(mysqld) that is built with support for
the NDBCLUSTER storage engine
and started with the --ndbcluster option
to enable the engine and the
--ndb-connectstring option to enable it
to connect to an NDB Cluster management server. For more
about these options, see
Section 22.3.3.9.1, “MySQL Server Options for NDB Cluster”.

Note

An API node is any
application that makes direct use of Cluster data nodes
for data storage and retrieval. An SQL node can thus be
considered a type of API node that uses a MySQL Server to
provide an SQL interface to the Cluster. You can write
such applications (that do not depend on a MySQL Server)
using the NDB API, which supplies a direct,
object-oriented transaction and scanning interface to NDB
Cluster data; see NDB Cluster API Overview: The NDB API, for
more information.

A.10.7.

When I run the SHOW command in the NDB
Cluster management client, I see a line of output that looks
like this:

id=2 @10.100.10.32 (Version: 8.0.21-ndb-8.0.21 Nodegroup: 0, *)

What does the * mean? How is this node
different from the others?

The simplest answer is, “It's not something you can
control, and it's nothing that you need to worry about in
any case, unless you're a software engineer writing or
analyzing the NDB Cluster source code”.

If you don't find that answer satisfactory, here's a
longer and more technical version:

A number of mechanisms in NDB Cluster require distributed
coordination among the data nodes. These distributed algorithms
and protocols include global checkpointing, DDL (schema)
changes, and node restart handling. To make this coordination
simpler, the data nodes “elect” one of their number
to act as leader. (This node was once referred to as a
“master”, but this terminology was dropped to avoid
confusion with master server in MySQL Replication.) There is no
user-facing mechanism for influencing this selection, which is
completely automatic; the fact that it is
automatic is a key part of NDB Cluster's internal
architecture.

When a node acts as the “leader” for any of these
mechanisms, it is usually the point of coordination for the
activity, and the other nodes act as “followers”,
carrying out their parts of the activity as directed by the
leader. If the node acting as leader fails, then the remaining
nodes elect a new leader. Tasks in progress that were being
coordinated by the old leader may either fail or be continued by
the new leader, depending on the actual mechanism involved.

It is possible for some of these different mechanisms and
protocols to have different leader nodes, but in general the
same leader is chosen for all of them. The node indicated as the
leader in the output of SHOW in the
management client is known internally as the
DICT manager (see
The DBDICT Block, in the
NDB Cluster API Developer Guide, for more
information), responsible for coordinating DDL and metadata
activity.

NDB Cluster is designed in such a way that the choice of leader
has no discernible effect outside the cluster itself. For
example, the current leader does not have significantly higher
CPU or resource usage than the other data nodes, and failure of
the leader should not have a significantly different impact on
the cluster than the failure of any other data node.

A.10.8.

With which operating systems can I use NDB Cluster?

NDB Cluster is supported on most Unix-like operating systems,
and is also supported in production settings on Microsoft
Windows operating systems.

NDB Cluster should run on any platform for which
NDB-enabled binaries are available.
For data nodes and API nodes, faster CPUs and more memory are
likely to improve performance, and 64-bit CPUs are likely to be
more effective than 32-bit processors. There must be sufficient
memory on machines used for data nodes to hold each node's share
of the database (see How much RAM do I
Need? for more information). For a computer which is
used only for running the NDB Cluster management server, the
requirements are minimal; a common desktop PC (or the
equivalent) is generally sufficient for this task. Nodes can
communicate through the standard TCP/IP network and hardware.
They can also use the high-speed SCI protocol; however, special
networking hardware and software are required to use SCI (see
Section 22.3.4, “Using High-Speed Interconnects with NDB Cluster”).

A.10.10.

How much RAM do I need to use NDB Cluster? Is it possible to use
disk memory at all?

For in-memory NDB tables, you can use the
following formula for obtaining a rough estimate of how much RAM
is needed for each data node in the cluster:

(SizeofDatabase × NumberOfReplicas × 1.1 ) / NumberOfDataNodes
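For example, this small helper (the function and variable names are illustrative only) applies the formula above:

```python
def estimate_ram_per_data_node(database_size_bytes,
                               number_of_replicas,
                               number_of_data_nodes):
    """Rough per-data-node RAM estimate for in-memory NDB tables,
    using (SizeofDatabase * NumberOfReplicas * 1.1) / NumberOfDataNodes."""
    return (database_size_bytes * number_of_replicas * 1.1) / number_of_data_nodes

# A 40 GB database with 2 replicas spread over 4 data nodes
# works out to roughly 22 GB per data node:
gb = 1024 ** 3
needed = estimate_ram_per_data_node(40 * gb, 2, 4)
```

The 1.1 factor in the formula allows roughly 10 percent overhead beyond the raw data size.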

To calculate the memory requirements more exactly requires
determining, for each table in the cluster database, the storage
space required per row (see
Section 11.7, “Data Type Storage Requirements”, for details), and
multiplying this by the number of rows. You must also remember
to account for any column indexes as follows:

Each primary key or hash index created for an
NDBCLUSTER table requires
21 to 25 bytes per record. These indexes use
IndexMemory.

Creating a primary key or unique index also creates an
ordered index, unless this index is created with
USING HASH. In other words:

A primary key or unique index on a Cluster table
normally takes up 31 to 35 bytes per record.

However, if the primary key or unique index is created
with USING HASH, then it requires
only 21 to 25 bytes per record.

Creating NDB Cluster tables with USING HASH
for all primary keys and unique indexes will generally cause
table updates to run more quickly, in some cases as much
as 20 to 30 percent faster than updates on tables where
USING HASH was not used in creating primary
and unique keys. This is due to the fact that less memory is
required (because no ordered indexes are created), and that less
CPU must be utilized (because fewer indexes must be read and
possibly updated). However, it also means that queries that
could otherwise use range scans must be satisfied by other
means, which can result in slower selects.

It is especially important to keep in mind that every
NDB Cluster table must have a primary key. The
NDB storage engine creates a
primary key automatically if none is defined; this primary key
is created without USING HASH.
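The per-record figures above can be combined into a rough index-overhead estimate. The byte ranges come from the text; the function itself is only an illustrative sketch:

```python
def index_bytes_per_record(using_hash, worst_case=True):
    """Rough per-record index memory for a primary key or unique
    index on an NDB table. With USING HASH, only the hash index
    (21 to 25 bytes) is created; otherwise an ordered index is
    added as well (31 to 35 bytes in total)."""
    hash_part = 25 if worst_case else 21
    ordered_part = 0 if using_hash else 10
    return hash_part + ordered_part

print(index_bytes_per_record(using_hash=True))   # 25
print(index_bytes_per_record(using_hash=False))  # 35
```

Multiplying these figures by the expected row count gives a rough idea of the index memory a table's keys will require.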

A.10.11.

What file systems can I use with NDB Cluster? What about network
file systems or network shares?

Generally, any file system that is native to the host operating
system should work well with NDB Cluster. If you find that a
given file system works particularly well (or not so especially
well) with NDB Cluster, we invite you to discuss your findings
in the NDB Cluster
Forums.

For Windows, we recommend that you use NTFS
file systems for NDB Cluster, just as we do for standard MySQL.
We do not test NDB Cluster with FAT or
VFAT file systems. Because of this, we do not
recommend their use with MySQL or NDB Cluster.

NDB Cluster is implemented as a shared-nothing solution; the
idea behind this is that the failure of a single piece of
hardware should not cause the failure of multiple cluster nodes,
or possibly even the failure of the cluster as a whole. For this
reason, the use of network shares or network file systems is not
supported for NDB Cluster. This also applies to shared storage
devices such as SANs.

A.10.12.

Can I run NDB Cluster nodes inside virtual machines (such as
those created by VMWare, VirtualBox, Parallels, or Xen)?

NDB Cluster is supported for use in virtual machines. We
currently support and test using
Oracle
VM.

Some NDB Cluster users have successfully deployed NDB Cluster
using other virtualization products; in such cases, Oracle can
provide NDB Cluster support, but issues specific to the virtual
environment must be referred to that product's vendor.

A.10.13.

I am trying to populate an NDB Cluster database. The loading
process terminates prematurely and I get an error message like
this one:

ERROR 1114: The table 'my_cluster_table' is full

Why is this happening?

The cause is very likely to be that your setup does not provide
sufficient RAM for all table data and all indexes,
including the primary key required by the
NDB storage engine and
automatically created in the event that the table definition
does not include the definition of a primary key.

It is also worth noting that all data nodes should have the same
amount of RAM, since no data node in a cluster can use more
memory than the least amount available to any individual data
node. For example, if there are four computers hosting Cluster
data nodes, and three of these have 3GB of RAM available to
store Cluster data while the remaining data node has only 1GB
RAM, then each data node can devote at most 1GB to NDB Cluster
data and indexes.
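The effect described above can be expressed as a small calculation (a sketch only, with hypothetical node sizes):

```python
def usable_data_memory_per_node(node_ram_bytes):
    """No data node can use more memory for cluster data than the
    node with the least RAM, so the usable amount per node is the
    minimum across all data nodes."""
    return min(node_ram_bytes)

gb = 1024 ** 3
# Three nodes with 3 GB and one with 1 GB: every node is
# effectively limited to 1 GB for cluster data and indexes.
limit = usable_data_memory_per_node([3 * gb, 3 * gb, 3 * gb, 1 * gb])
print(limit // gb)  # 1
```

This is why provisioning all data nodes with equal amounts of RAM is the recommended practice.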

In some cases it is possible to get Table is
full errors in MySQL client applications even when
ndb_mgm -e "ALL REPORT MEMORYUSAGE" shows
significant free
DataMemory. You can
force NDB to create extra
partitions for NDB Cluster tables and thus have more memory
available for hash indexes by using the
MAX_ROWS option for
CREATE TABLE. In general, setting
MAX_ROWS to twice the number of rows that you
expect to store in the table should be sufficient.
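For instance, the following sketch generates a CREATE TABLE statement with MAX_ROWS set to twice the expected row count; the table and column definitions are made up for illustration:

```python
def create_table_with_max_rows(table_name, columns_sql, expected_rows):
    """Build a CREATE TABLE statement for an NDB table, setting
    MAX_ROWS to twice the number of rows we expect to store so
    that NDB creates extra partitions."""
    max_rows = 2 * expected_rows
    return (f"CREATE TABLE {table_name} ({columns_sql}) "
            f"ENGINE=NDB MAX_ROWS={max_rows}")

stmt = create_table_with_max_rows(
    "t1", "id INT PRIMARY KEY, val CHAR(32)", 50_000_000)
print(stmt)
```

The resulting statement asks NDB to size the table for 100 million rows, even though only about 50 million are expected.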

For similar reasons, you can also sometimes encounter problems
with data node restarts on nodes that are heavily loaded with
data. The MinFreePct
parameter can help with this issue by reserving a portion (5% by
default) of DataMemory
and (prior to NDB 7.6)
IndexMemory for use in
restarts. This reserved memory is not available for storing
NDB tables or data.

A.10.14.

NDB Cluster uses TCP/IP. Does this mean that I can run it over
the Internet, with one or more nodes in remote locations?

It is very unlikely that a cluster would
perform reliably under such conditions, as NDB Cluster was
designed and implemented with the assumption that it would be
run under conditions guaranteeing dedicated high-speed
connectivity such as that found in a LAN setting using 100 Mbps
or gigabit Ethernet—preferably the latter. We neither test
nor warrant its performance using anything slower than this.

Also, it is extremely important to keep in mind that
communications between the nodes in an NDB Cluster are not
secure; they are neither encrypted nor safeguarded by any other
protective mechanism. The most secure configuration for a
cluster is in a private network behind a firewall, with no
direct access to any Cluster data or management nodes from
outside. (For SQL nodes, you should take the same precautions as
you would with any other instance of the MySQL server.) For more
information, see Section 22.5.17, “NDB Cluster Security Issues”.

A.10.15.

Do I have to learn a new programming or query language to use
NDB Cluster?

No. Although some specialized commands are
used to manage and configure the cluster itself, only standard
(My)SQL statements are required for working with data stored in
the cluster.

NDB Cluster supports the same programming APIs and languages as
the standard MySQL Server, including ODBC, .Net, the MySQL C
API, and numerous drivers for popular scripting languages such
as PHP, Perl, and Python. NDB Cluster applications written using
these APIs behave similarly to other MySQL applications; they
transmit SQL statements to a MySQL Server (in the case of NDB
Cluster, an SQL node), and receive responses containing rows of
data. For more information about these APIs, see
Chapter 28, Connectors and APIs.

NDB Cluster also supports application programming using the NDB
API, which provides a low-level C++ interface to NDB Cluster
data without needing to go through a MySQL Server. See
The NDB API. In addition, many
NDBCLUSTER management functions are
exposed by the C-language MGM API; see
The MGM API, for more information.

NDB Cluster also supports Java application programming using
ClusterJ, which supports a domain object model of data using
sessions and transactions. See
Java and NDB Cluster, for more information.

In addition, NDB Cluster provides support for
memcached, allowing developers to access data
stored in NDB Cluster using the memcached
interface; for more information, see
ndbmemcache—Memcache API for NDB Cluster.

NDB Cluster also includes adapters supporting NoSQL applications
written against Node.js, with NDB Cluster as
the data store. See MySQL NoSQL Connector for JavaScript, for more
information.

NDB Cluster 7.6 and earlier are also supported by MySQL Cluster Manager, a
separate product providing an advanced command line interface
that can automate many NDB Cluster management tasks such as
rolling restarts and configuration changes. Beginning with
version 1.4.8, MySQL Cluster Manager also provides experimental support for NDB
Cluster 8.0. For more information about MySQL Cluster Manager, see
MySQL™ Cluster Manager 1.4.8 User Manual.

NDB Cluster also provides a graphical, browser-based
Auto-Installer for setting up and deploying NDB Cluster, as part
of the NDB Cluster software distribution. For more information,
see The NDB Cluster Auto-Installer (NDB 7.5).

A.10.18.

How do I find out what an error or warning message means when
using NDB Cluster?

There are two ways in which this can be done:

From within the mysql client, use
SHOW ERRORS or SHOW
WARNINGS immediately upon being notified of the
error or warning condition.

From a system shell, use the ndb_perror
utility to obtain a description of an NDB error code.

A.10.19.

Does NDB Cluster support transactions?

Yes. For tables created with the
NDB storage engine, transactions
are supported. Currently, NDB Cluster supports only the
READ COMMITTED transaction
isolation level.

A.10.20.

What storage engines are supported by NDB Cluster?

NDB Cluster requires the NDB
storage engine. That is, in order for a table to be shared
between nodes in an NDB Cluster, the table must be created using
ENGINE=NDB (or the equivalent option
ENGINE=NDBCLUSTER).

It is possible to create tables using other storage engines
(such as InnoDB or
MyISAM) on a MySQL server being
used with NDB Cluster, but since these tables do not use
NDB, they do not participate in
clustering; each such table is strictly local to the individual
MySQL server instance on which it is created.

In the event of a catastrophic failure—for example, the
whole city loses power and my UPS
fails—would I lose all my data?

All committed transactions are logged. Therefore, although it is
possible that some data could be lost in the event of a
catastrophe, this should be quite limited. Data loss can be
further reduced by minimizing the number of operations per
transaction. (It is not a good idea to perform large numbers of
operations per transaction in any case.)

It is possible but not always advisable. One of the chief
reasons to run a cluster is to provide redundancy. To obtain the
full benefits of this redundancy, each node should reside on a
separate machine. If you place multiple nodes on a single
machine and that machine fails, you lose all of those nodes. For
this reason, if you do run multiple data nodes on a single
machine, it is extremely important that
they be set up in such a way that the failure of this machine
does not cause the loss of all the data nodes in a given node
group.

Given that NDB Cluster can be run on commodity hardware loaded
with a low-cost (or even no-cost) operating system, the expense
of an extra machine or two is well worth it to safeguard
mission-critical data. It is also worth noting that the
requirements for a cluster host running a management node are
minimal. This task can be accomplished with a 300 MHz Pentium or
equivalent CPU and sufficient RAM for the operating system, plus
a small amount of overhead for the ndb_mgmd
and ndb_mgm processes.

It is also possible in some cases to run data nodes and SQL
nodes concurrently on the same machine; how well such an
arrangement performs is dependent on a number of factors such as
number of cores and CPUs as well as the amount of disk and
memory available to the data node and SQL node processes, and
you must take these factors into account when planning such a
configuration.

Support for partial transactions and partial rollbacks is
comparable to that of other transactional storage engines
such as InnoDB that can roll
back individual statements.

The maximum number of attributes allowed per table is 512.
Attribute names cannot be any longer than 31 characters. For
each table, the maximum combined length of the table and
database names is 122 characters.

You can import databases into NDB Cluster much as you would with
any other version of MySQL. Other than the limitations mentioned
elsewhere in this FAQ, the only other special requirement is
that any tables to be included in the cluster must use the
NDB storage engine. This means that
the tables must be created with ENGINE=NDB or
ENGINE=NDBCLUSTER.

Cluster nodes can communicate through any of three different
transport mechanisms: TCP/IP, SHM (shared memory), and SCI
(Scalable Coherent Interface). Where available, SHM is used by
default between nodes residing on the same cluster host;
however, this is considered experimental. SCI is a high-speed (1
gigabit per second and higher), high-availability protocol used
in building scalable multi-processor systems; it requires
special hardware and drivers. See
Section 22.3.4, “Using High-Speed Interconnects with NDB Cluster”, for more about
using SCI as a transport mechanism for NDB Cluster.

A.10.29.

What is an arbitrator?

If one or more data nodes in a cluster fail, it is possible that
not all cluster data nodes will be able to “see”
one another. In fact, it is possible that two sets of data nodes
might become isolated from one another in a network
partitioning, also known as a “split-brain”
scenario. This type of situation is undesirable because each set
of data nodes tries to behave as though it is the entire
cluster. An arbitrator is required to decide between the
competing sets of data nodes.

When all data nodes in at least one node group are alive,
network partitioning is not an issue, because no single subset
of the cluster can form a functional cluster on its own. The
real problem arises when no single node group has all its nodes
alive, in which case network partitioning (the
“split-brain” scenario) becomes possible. Then an
arbitrator is required. All cluster nodes recognize the same
node as the arbitrator, which is normally the management server;
however, it is possible to configure any of the MySQL Servers in
the cluster to act as the arbitrator instead. The arbitrator
accepts the first set of cluster nodes to contact it, and tells
the remaining set to shut down. Arbitrator selection is
controlled by the ArbitrationRank
configuration parameter for MySQL Server and management server
nodes. For more information about this parameter, see
Section 22.3.3.5, “Defining an NDB Cluster Management Server”.
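The condition described above (arbitration is needed only when no node group has all of its nodes alive) can be sketched as follows; the data structures here are illustrative only, not an actual NDB interface:

```python
def arbitration_needed(node_groups, alive_nodes):
    """Return True if a network partition (split brain) is possible
    and the arbitrator must decide which set of nodes survives.

    node_groups maps a group id to the set of data node ids in that
    group; alive_nodes is the set of data nodes still running.
    If at least one node group has all of its nodes alive, no subset
    of the cluster can form a viable cluster on its own, so
    arbitration is not needed.
    """
    alive = set(alive_nodes)
    return not any(set(group) <= alive for group in node_groups.values())

# Two node groups of two nodes each. Node 4 has failed, but group 0
# is still complete, so no arbitration is required:
print(arbitration_needed({0: {1, 2}, 1: {3, 4}}, {1, 2, 3}))  # False
# Nodes 2 and 4 have failed: neither group is complete, so the
# arbitrator must decide:
print(arbitration_needed({0: {1, 2}, 1: {3, 4}}, {1, 3}))  # True
```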

The role of arbitrator does not in and of itself impose any
heavy demands upon the host so designated, and thus the
arbitrator host does not need to be particularly fast or to have
extra memory especially for this purpose.

A.10.30.

What data types are supported by NDB Cluster?

NDB Cluster supports all of the usual MySQL data types,
including those associated with MySQL's spatial extensions;
however, the NDB storage engine
does not support spatial indexes. (Spatial indexes are supported
only by MyISAM; see
Section 11.4, “Spatial Data Types”, for more information.) In
addition, there are some differences with regard to indexes when
used with NDB tables.

Note

NDB Cluster Disk Data tables (that is, tables created with
TABLESPACE ... STORAGE DISK ENGINE=NDB or
TABLESPACE ... STORAGE DISK
ENGINE=NDBCLUSTER) have only fixed-width rows. This
means that (for example) each Disk Data table record
containing a
VARCHAR(255)
column requires space for 255 characters (as required for the
character set and collation being used for the table),
regardless of the actual number of characters stored therein.
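As a rough illustration of this fixed-width behavior (the per-character byte widths here are simplified assumptions):

```python
def disk_data_varchar_bytes(declared_chars, bytes_per_char=1):
    """Space consumed by a VARCHAR column in an NDB Disk Data table.
    Rows are fixed-width, so the full declared width is reserved
    regardless of how many characters are actually stored.
    bytes_per_char depends on the column's character set (for
    example, 1 for latin1, up to 4 for utf8mb4)."""
    return declared_chars * bytes_per_char

# A VARCHAR(255) latin1 column uses 255 bytes per row even if empty:
print(disk_data_varchar_bytes(255))  # 255
# Declared as utf8mb4, the same column reserves up to 4 bytes
# per character:
print(disk_data_varchar_bytes(255, bytes_per_char=4))  # 1020
```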

Each of these commands must be run from a system shell on the
machine housing the affected node. (You do not have to be
physically present at the machine—a remote login shell can
be used for this purpose.) You can verify that the cluster is
running by starting the NDB
management client ndb_mgm on the machine
housing the management node and issuing the
SHOW or ALL STATUS
command.

To shut down a running cluster, issue the command
SHUTDOWN in the management client.
Alternatively, you may enter the following command in a system
shell:

shell> ndb_mgm -e "SHUTDOWN"

(The quotation marks in this example are optional, since there
are no spaces in the command string following the
-e option; in addition, the
SHUTDOWN command, like other management
client commands, is not case-sensitive.)

The data that was held in memory by the cluster's data
nodes is written to disk, and is reloaded into memory the next
time that the cluster is started.

A.10.33.

Is it a good idea to have more than one management node for an
NDB Cluster?

It can be helpful as a fail-safe. Only one management node
controls the cluster at any given time, but it is possible to
configure one management node as primary, and one or more
additional management nodes to take over in the event that the
primary management node fails.

Yes, it is possible to do this. In the case of multiple data
nodes, it is advisable (but not required) for each node to use a
different data directory. If you want to run multiple SQL nodes
on one machine, each instance of mysqld must
use a different TCP/IP port.

Running data nodes and SQL nodes together on the same host is
possible, but you should be aware that the
ndbd or ndbmtd processes
may compete for memory with mysqld.

A.10.36.

Can I use host names with NDB Cluster?

Yes, it is possible to use DNS and DHCP for cluster hosts.
However, if your application requires “five nines”
availability, you should use fixed (numeric) IP addresses, since
making communication between Cluster hosts dependent on services
such as DNS and DHCP introduces additional potential points of
failure.

A.10.37.

Does NDB Cluster support IPv6?

IPv6 is supported for connections between SQL nodes (MySQL
servers), but connections between all other types of NDB Cluster
nodes must use IPv4.

How do I handle MySQL users in an NDB Cluster having multiple
MySQL servers?

MySQL user accounts and privileges are normally not
automatically propagated between different MySQL servers
accessing the same NDB Cluster. MySQL NDB Cluster provides
support for shared and synchronized users and privileges using
the NDB_STORED_USER privilege;
see Section 22.5.12, “Distributed MySQL Privileges with NDB_STORED_USER”, for
more information. You should be aware that this implementation
is new to NDB 8.0 and is not compatible with the shared
privileges mechanism employed in earlier versions of NDB
Cluster, which is no longer supported in NDB 8.0.

A.10.39.

How do I continue to send queries in the event that one of the
SQL nodes fails?

MySQL NDB Cluster does not provide any sort of automatic
failover between SQL nodes. Your application must be prepared to
handle the loss of SQL nodes and to fail over between them.
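A minimal client-side failover loop might look like the following sketch; connect_to is a hypothetical stand-in for whatever connector your application actually uses:

```python
def run_with_failover(sql_nodes, connect_to, run_query):
    """Try each SQL node in turn until one accepts the query.

    sql_nodes is a list of host:port strings; connect_to is a
    hypothetical function that returns a connection or raises
    ConnectionError on failure; run_query executes the work
    against a connection. Raises RuntimeError if every SQL node
    is unavailable.
    """
    last_error = None
    for node in sql_nodes:
        try:
            conn = connect_to(node)
            return run_query(conn)
        except ConnectionError as exc:
            last_error = exc  # this SQL node is down; try the next one
    raise RuntimeError(f"all SQL nodes failed: {last_error}")
```

Production deployments often place a load balancer or connection-pooling proxy in front of the SQL nodes instead of handling this in application code, but the underlying principle is the same.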

This process monitors and, if necessary, attempts to restart the
data node process. If you check the list of active processes on
your system after starting ndbd, you can see
that there are actually two processes running by that name, as
shown here (we omit the output from ndb_mgmd
and ndbd for brevity):

The ndbd process showing
0.0 for both memory and CPU usage is the
angel process (although it actually does use a very small amount
of each). This process merely checks to see if the main
ndbd or ndbmtd process
(the primary data node process which actually handles the data)
is running. If permitted to do so (for example, if the
StopOnError
configuration parameter is set to false), the
angel process tries to restart the primary data node process.