
Replication is an important concept for improving database performance and security. In this blog post, I would like to demonstrate how the consistency between a MySQL master and a slave can be verified. We will create two Docker containers, one for the master and one for the slave.
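The two containers can, for instance, be created from the official MySQL image. The following is only a sketch (image tag and container names are my own choice, and the replication specific settings such as server IDs and the replication user are assumed to be configured as in the actual setup); it also shows how to obtain the container IP addresses used throughout this post.

docker run -d --name mysql-master -e MYSQL_ALLOW_EMPTY_PASSWORD=yes mysql:5.6
docker run -d --name mysql-slave -e MYSQL_ALLOW_EMPTY_PASSWORD=yes mysql:5.6
docker inspect -f '{{ .NetworkSettings.IPAddress }}' mysql-master   # e.g. 172.17.0.2
docker inspect -f '{{ .NetworkSettings.IPAddress }}' mysql-slave    # e.g. 172.17.0.3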

Installing the Percona Toolkit

The Percona Toolkit is a collection of useful utilities, which can be obtained for free from the company’s portal. The following commands install the prerequisites, download the package and eventually install it.
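A minimal sketch of these steps is shown below; the Perl prerequisites and the download link (abbreviated here) may differ depending on your distribution and the toolkit version.

# Prerequisites (Perl libraries used by the toolkit)
sudo apt-get install -y libdbd-mysql-perl libdbi-perl libio-socket-ssl-perl libterm-readkey-perl
# Download the .deb package from the Percona download portal and install it
wget -O percona-toolkit.deb "<percona-toolkit .deb link from percona.com>"
sudo dpkg -i percona-toolkit.deb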

Now log into the slave container and add the connection details for the master:


## Connect to the MySQL Slave instance
$ mysql -u root -h 172.17.0.3

### Setup the slave
mysql> CHANGE MASTER TO
    MASTER_HOST='172.17.0.2',
    MASTER_PORT=3306,
    MASTER_USER='replication',
    MASTER_PASSWORD='SLAVE-SECRET',
    MASTER_LOG_FILE='mysqld-bin.000002',
    MASTER_LOG_POS=346;
Query OK, 0 rows affected, 2 warnings (0,05 sec)

### Start and check
mysql> start slave;
Query OK, 0 rows affected (0,01 sec)

mysql> show slave status\G
*************************** 1. row ***************************
       Slave_IO_State: Waiting for master to send event
          Master_Host: 172.17.0.2
          Master_User: percona
          Master_Port: 3306
        Connect_Retry: 60
      Master_Log_File: mysqld-bin.000002
  Read_Master_Log_Pos: 346
       Relay_Log_File: mysqld-relay-bin.000002
        Relay_Log_Pos: 284
Relay_Master_Log_File: mysqld-bin.000002
     Slave_IO_Running: Yes
    Slave_SQL_Running: Yes

Now our simple slave setup is running.

Get some test data

Let’s download the Sakila test database and import it into the master. It will be replicated to the slave immediately.


wget http://downloads.mysql.com/docs/sakila-db.tar.gz
tar xvfz sakila-db.tar.gz
mysql -u root -h 172.17.0.2 < sakila-db/sakila-schema.sql
mysql -u root -h 172.17.0.2 < sakila-db/sakila-data.sql

Verify that the data is on the slave as well:


mysql -u root -h 172.17.0.3 -e "USE sakila; SHOW TABLES;"
+----------------------------+
| Tables_in_sakila           |
+----------------------------+
| actor                      |
| actor_info                 |
| address                    |
| category                   |
| city                       |
| country                    |
| customer                   |
...
| store                      |
+----------------------------+

After our setup is completed, we can proceed with the Percona pt-table-checksum tool.

Percona pt-table-checksum

The Percona pt-table-checksum tool requires the connection information of the master and the slave in a specific format. This is called the DSN (data source name), which is a comma-separated string. We can store this information in a dedicated database called percona, in a table holding the DSNs. We create this table on the master. Note that the data gets replicated to the slave within the blink of an eye.


CREATE DATABASE percona;
USE percona;
CREATE TABLE `DSN-Table` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `dsn` varchar(255) NOT NULL,
  PRIMARY KEY (`id`)
);

The next step involves creating permissions on the slave and the master!

The percona user is needed to run the script. Note that this time the IP address is that of the (Docker) host, which is 172.17.0.1 by default. In real world scenarios, this script would either be run on the master or on the slave directly.
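The statements below are a sketch of such grants for a percona user connecting from the Docker host; the password and the exact privilege list are assumptions and should be adapted (pt-table-checksum needs at least SELECT, PROCESS, SUPER and REPLICATION SLAVE globally, plus full access to the checksum database).

-- Run on the master and on the slave (hypothetical password)
CREATE USER 'percona'@'172.17.0.1' IDENTIFIED BY 'SECRET';
GRANT SELECT, PROCESS, SUPER, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'percona'@'172.17.0.1';
GRANT ALL PRIVILEGES ON percona.* TO 'percona'@'172.17.0.1';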

Now we need to add the information about the slave to the table we created. The Percona tool could also read this from the process list, but it is more reliable if we add the information ourselves. To do so, we add a record to the table we just created, which describes the slave DSN:
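A sketch of such a record and of a first checksum run is shown below; the password is a placeholder, and the recursion method simply points the tool at the DSN table we just created.

-- Run on the master, gets replicated to the slave
INSERT INTO percona.`DSN-Table` (dsn) VALUES ('h=172.17.0.3,u=percona,p=SECRET,P=3306');

The first run from the Docker host could then look like this:

pt-table-checksum h=172.17.0.2,u=percona,p=SECRET --recursion-method=dsn=h=172.17.0.2,D=percona,t=DSN-Table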

The result shows a difference in the MySQL internal table for permissions. This is obviously not what we are interested in, as permissions are individual to a host. We therefore exclude the MySQL internal database and also the percona database, because neither is relevant here. Also, in order to test whether the tool works, we delete the last five category assignments from the table with
mysql -u root -h 172.17.0.3 -e "DELETE FROM sakila.film_category WHERE film_id > 995;" and update a row in the city table with


mysql -u root -h 172.17.0.3 -e "UPDATE sakila.city SET city='Innsbruck' WHERE city_id=590;"
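With the differences in place, the checksum run below (a sketch, with the same assumed credentials as above) skips the mysql and percona schemata via --ignore-databases:

pt-table-checksum h=172.17.0.2,u=percona,p=SECRET --ignore-databases=mysql,percona --recursion-method=dsn=h=172.17.0.2,D=percona,t=DSN-Table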

You see that there is a difference in the tables sakila.city and in the table sakila.film_category. The tool does not report the actual number of differences, but rather the number of different chunks. To get the actual differences, we need to use a different tool, which utilises the checksum table that the previous step created.

Show the differences with pt-table-sync

The pt-table-sync tool is the counterpart of the pt-table-checksum utility. It can print or even replay the SQL statements that would bring the slave back in sync with the master. We can run a dry run first, as the tool is potentially dangerous.
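A sketch of such a run against the checksum table (called percona.checksums by default) could look like this; credentials are placeholders:

pt-table-sync --dry-run --replicate percona.checksums h=172.17.0.2,u=percona,p=SECRET
pt-table-sync --print --replicate percona.checksums h=172.17.0.2,u=percona,p=SECRET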

The command shows how we can rename Innsbruck back to Yuncheng again and also provides the INSERT statements to restore the deleted records. When we replace --print with --execute, the data gets written to the master and replicated to the slave. To allow this, we need to set the permissions on the master.
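A sketch of the additional privileges on the master, assuming the same percona user as before:

GRANT INSERT, UPDATE, DELETE ON sakila.* TO 'percona'@'172.17.0.1';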

This error indicates that updating the city table has consequences, because it is referenced by foreign keys from child tables. In this example, we are bold and ignore this warning. This is absolutely not recommended for real world scenarios.

Spring Boot is a framework that makes it much easier to develop Spring based applications by following a convention over configuration principle (while in contrast Spring critics claim that the framework’s principle is rather configuration over everything). In this article, I am going to provide an example of how to achieve the following:

Create a simple Web application based on Spring Boot

Persist and access data with Hibernate

Make it searchable with Hibernate Search (Lucene)

I use Eclipse with a Gradle plugin for convenience. MySQL will be our back-end for storing the data. The full example can be obtained from my Github Repository.

Bootstrapping: Create a Simple Spring Boot Webapp

The easiest way to start with Spring Boot is heading over to start.spring.io and create a new project. In this example, I will use Gradle for building the application and handling the dependencies and I add Web and JPA starters.

Download the archive to your local drive and extract it to a folder. I called the project SearchaRoo.

Import the Project with Eclipse

Import it as an existing Gradle Project in Eclipse by using the default settings. You will end up with a nice little project structure as shown below:

We have a central application starter class called SearchaRooApplication.java, package definitions, application properties and even test classes. The great thing with Spring Boot is that it is very simple to start and that you can debug it like any other local Java application. There is no need for remote debugging or complex application server setups.

Modelling Reality

The next step covers modelling the imported data with Java POJOs. Obviously this is not the most natural way, because in general you would create the model first and then add data to it, but as we already had the data we decided to go in this direction. In the application.properties file, point the datasource to the imported employees database and set the Hibernate ddl-auto property to validate. With this setting, we can confirm that we modelled the Java classes in accordance with the database model defined by the MySQL employees database.
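A sketch of the relevant application.properties entries, assuming a local MySQL instance and placeholder credentials:

spring.datasource.url=jdbc:mysql://localhost:3306/employees
spring.datasource.username=spring
spring.datasource.password=SECRET
spring.jpa.hibernate.ddl-auto=validate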

An example of such a class is shown below; the other classes can be found in the Github repository.
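The sketch below illustrates the idea; the Java field names are my own choice, and the full class in the repository contains more columns and the relations to the other entities.

package at.stefanproell.model;

import java.util.Date;

import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.Id;

@Entity(name = "employees")
public class Employee {

    @Id
    @Column(name = "emp_no")
    private int employeeNumber;

    @Column(name = "birth_date")
    private Date birthdate;

    @Column(name = "first_name")
    private String firstname;

    @Column(name = "last_name")
    private String lastname;

    // getters and setters omitted
}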

Now that we have prepared the data model, our schema is fixed and does not change any more. We can deactivate the Hibernate based dynamic generation of the database tables and use the Spring database initialization instead. To see if we modelled the data correctly, we take the MySQL employee data dump we obtained before and import it into our newly created schema, which maps the Java POJOs.

Importing the Initial Data

In the next step, we import the data from the MySQL employee database into our schema spring_employees. This schema contains the tables that Hibernate created for us. The following script copies the data between the two schemata. If you see an error, then there is an issue with your model.


-- The original data is stored in the database called employees
-- Spring created the new schema called spring_employees
USE `spring_employees`;

-- Departments
INSERT INTO `spring_employees`.`departments`
(`dept_no`,
 `dept_name`)
SELECT `departments`.`dept_no`,
       `departments`.`dept_name`
FROM `employees`.`departments`;

-- Employees
INSERT INTO `spring_employees`.`employees`
(`emp_no`,
 `birth_date`,
 `first_name`,
 `gender`,
 `hire_date`,
 `last_name`)
SELECT `employees`.`emp_no`,
       `employees`.`birth_date`,
       `employees`.`first_name`,
       `employees`.`gender`,
       `employees`.`hire_date`,
       `employees`.`last_name`
FROM `employees`.`employees`;

-- Join table
INSERT INTO `spring_employees`.`dept_emp`
(`emp_no`,
 `dept_no`,
 `from_date`,
 `to_date`)
SELECT `dept_emp`.`emp_no`,
       `dept_emp`.`dept_no`,
       `dept_emp`.`from_date`,
       `dept_emp`.`to_date`
FROM `employees`.`dept_emp`;

-- Join table
INSERT INTO `spring_employees`.`dept_manager`
(`emp_no`,
 `dept_no`,
 `from_date`,
 `to_date`)
SELECT `dept_manager`.`emp_no`,
       `dept_manager`.`dept_no`,
       `dept_manager`.`from_date`,
       `dept_manager`.`to_date`
FROM `employees`.`dept_manager`;

-- Titles
INSERT INTO `spring_employees`.`titles`
(`emp_no`,
 `title`,
 `from_date`,
 `to_date`)
SELECT `titles`.`emp_no`,
       `titles`.`title`,
       `titles`.`from_date`,
       `titles`.`to_date`
FROM `employees`.`titles`;

-- Salaries
INSERT INTO `spring_employees`.`salaries`
(`emp_no`,
 `salary`,
 `from_date`,
 `to_date`)
SELECT `salaries`.`emp_no`,
       `salaries`.`salary`,
       `salaries`.`from_date`,
       `salaries`.`to_date`
FROM `employees`.`salaries`;

We have now imported the data into the database schema that we defined for our project. Spring can load the schema and initial data during start-up, so we provide two files: one containing the schema and the other one containing the data. To do that, we create two dumps of the database, one with the schema only and one with the data only.
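A sketch of the two dumps with mysqldump; credentials and target paths are assumptions:

mysqldump -u root -p --no-data spring_employees > src/main/resources/schema.sql
mysqldump -u root -p --no-create-info spring_employees > src/main/resources/data.sql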

By deactivating the Hibernate schema generation and activating the Spring way, the database gets initialized every time the application starts. Add or edit the following lines in the application.properties file:


spring.jpa.hibernate.ddl-auto=none

spring.datasource.initialize=true

spring.datasource.schema=classpath:/schema.sql

spring.datasource.data=classpath:/data.sql

Before importing the data with the scripts, make sure to drop the schema and to disable foreign key checks in the schema file, enabling them again at the end. Spring ignores the actionable (conditional) MySQL comments from the dump, so your schema file should contain this explicitly:


DROP DATABASE IF EXISTS `spring_employees`;
SET foreign_key_checks=0;
-- rest of the schema dump
SET foreign_key_checks=1;

And also insert the two foreign key statements to the data file. Note that the import can take a while. If you are happy with the initialized data, you can deactivate the initialization by setting the variable to false:
spring.datasource.initialize=false

Adding Hibernate Search

Hibernate Search offers full-text search capabilities by using a dedicated index. We need to add the dependencies to the build file.


dependencies {
    compile('org.springframework.boot:spring-boot-starter-data-jpa')
    compile('org.springframework.boot:spring-boot-starter-web')
    testCompile('org.springframework.boot:spring-boot-starter-test')
    compile("mysql:mysql-connector-java")
    compile('org.apache.commons:commons-lang3:3.5')
    compile("org.hibernate:hibernate-search-orm:5.5.6.Final")
    compile('org.springframework.boot:spring-boot-starter-test')
    compile('org.springframework.boot:spring-boot-starter-logging')
    compile('org.springframework.boot:spring-boot-starter-freemarker')
}

Refresh the gradle file after including the search dependencies.

Adding Hibernate Search Dependencies

In this step, we annotate the model POJO classes and introduce the full-text search index. Hibernate Search needs just a few basic settings to get started. Add the following variables to the application properties file.


# ===============================
# = HIBERNATE SEARCH
# ===============================
# Spring Data JPA will take any properties under spring.jpa.properties.* and
# pass them along (with the prefix stripped) once the EntityManagerFactory is created.
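# The two settings below are a minimal sketch of what is needed here; the index
# location is an assumption and should be adapted to your environment.
spring.jpa.properties.hibernate.search.default.directory_provider=filesystem
spring.jpa.properties.hibernate.search.default.indexBase=/tmp/SearchaRoo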

Please note that storing the Lucene index in the tmp directory is not the best idea, but for testing we can use this volatile location. We also use the filesystem to store the index, as this is the simplest approach.

Create a Service

In order to facilitate Hibernate Search on our data, we add a service class, which offers methods for searching. The service uses a configuration, which is injected by Spring during run time. The configuration is very simple.

The @Configuration is loaded when Spring builds the application context. It provides a bean of our service, which can then be injected into the application. The service itself provides methods for creating and searching the index. In this example, the search method is very simple: it only searches on the first and the last name of an employee and it allows users to make one mistake (distance 1).
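The following is a sketch of such a configuration and service, including the initialization method mentioned below; class, bean and method names are assumptions, and the indexed field names match the ones used in the entity sketch above.

package at.stefanproell.service;

import java.util.List;

import javax.persistence.EntityManager;

import org.apache.lucene.search.Query;
import org.hibernate.search.jpa.FullTextEntityManager;
import org.hibernate.search.jpa.Search;
import org.hibernate.search.query.dsl.QueryBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

import at.stefanproell.model.Employee;

@Configuration
class HibernateSearchConfiguration {
    @Bean
    HibernateSearchService hibernateSearchService(EntityManager entityManager) {
        return new HibernateSearchService(entityManager);
    }
}

public class HibernateSearchService {
    private final EntityManager entityManager;

    public HibernateSearchService(EntityManager entityManager) {
        this.entityManager = entityManager;
    }

    // Build the initial Lucene index from the data already present in the database
    public void initializeIndex() throws InterruptedException {
        FullTextEntityManager fullTextEm = Search.getFullTextEntityManager(entityManager);
        fullTextEm.createIndexer().startAndWait();
    }

    // Fuzzy search on first and last name, tolerating one typo (edit distance 1)
    @SuppressWarnings("unchecked")
    public List<Employee> searchEmployees(String term) {
        FullTextEntityManager fullTextEm = Search.getFullTextEntityManager(entityManager);
        QueryBuilder qb = fullTextEm.getSearchFactory()
                .buildQueryBuilder().forEntity(Employee.class).get();
        Query luceneQuery = qb.keyword().fuzzy().withEditDistanceUpTo(1)
                .onFields("firstname", "lastname").matching(term).createQuery();
        return fullTextEm.createFullTextQuery(luceneQuery, Employee.class).getResultList();
    }
}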

The service implementation currently only contains an initialization method, which is used for creating the Lucene index on the filesystem. Before we can test the index, we need to have at least one indexed entity. This can be achieved by simply adding the annotation @Indexed to the POJO.


package at.stefanproell.model;

import java.util.Date;
import java.util.List;

import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.EnumType;
import javax.persistence.Enumerated;
import javax.persistence.Id;
import javax.persistence.OneToMany;

import org.hibernate.search.annotations.Indexed;

@Entity(name = "employees")
@Indexed
public class Employee {
    ...
}

When we start the application now, we can see that Hibernate creates the index and a short check on disk shows that it worked:


$ tree /tmp/SerchaRoo/
/tmp/SerchaRoo/
└── at.stefanproell.model.Employee
    ├── _6.fdt
    ├── _6.fdx
    ├── _6.fnm
    ├── _6_Lucene50_0.doc
    ├── _6_Lucene50_0.pos
    ├── _6_Lucene50_0.tim
    ├── _6_Lucene50_0.tip
    ├── _6.si
    ├── segments_2
    └── write.lock

1 directory, 10 files

So far, we did not tell Hibernate Search which fields we want to add to the index and thus make full-text searchable. The following listing shows the annotated @Fields.
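The sketch below shows the idea for the two name fields; the annotations come from org.hibernate.search.annotations, and the field names are the assumed ones from above.

// Inside the Employee entity (annotations: Field, Index, Analyze, Store
// from org.hibernate.search.annotations)
@Field(index = Index.YES, analyze = Analyze.YES, store = Store.NO)
@Column(name = "first_name")
private String firstname;

@Field(index = Index.YES, analyze = Analyze.YES, store = Store.NO)
@Column(name = "last_name")
private String lastname;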

Starting the application again re-creates the index. Time for some basic searching.

Searching Fulltext

Hibernate Search offers many features which are not offered in a similar quality by native databases. One interesting feature is for instance fuzzy search, which allows finding terms within an edit distance of up to two letters. The method for searching on two fields was already shown above. We can use this method in a small JUnit test:
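A minimal sketch of the described test; class and method names are assumptions and follow the service sketch above.

import static org.junit.Assert.assertTrue;

import java.util.List;

import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.test.context.junit4.SpringRunner;

import at.stefanproell.model.Employee;
import at.stefanproell.service.HibernateSearchService;

@RunWith(SpringRunner.class)
@SpringBootTest
public class HibernateSearchServiceTest {

    @Autowired
    private HibernateSearchService searchService;

    @Test
    public void fuzzySearchToleratesSmallTypos() {
        // "Chrisu" instead of "Chris" - one edit away from the intended name
        List<Employee> result = searchService.searchEmployees("Chrisu");
        assertTrue(result.size() > 0);
    }
}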

The user made a small typo by entering Chrisu instead of Chris. As we allowed for such mistakes, we receive a list of similar names and the test passes. Some possible results are shown below.


Id: 456935 Hiroyasu Christ
Id: 467982 Alselm Christ
Id: 482230 Dinah Christ
Id: 484411 Akeel Christ
Id: 498582 Mihalis Christ
Id: 14298 Chriss Granlund
Id: 14418 Chriss Kadhim
Id: 18643 Christ Bernini
Id: 19919 Chrisa Gornas
Id: 20537 Chrisa Rando

Conclusions

Hibernate Search is a great tool and can be easily integrated into Spring Boot applications. In this post, I gave a minimalistic example of how full-text fuzzy search can be added to existing databases, allowing a flexible and powerful search. A few more advanced thoughts on Hibernate Search are given in this blog post here. The Hibernate Search documentation contains a lot of useful and more elaborate examples. The full example can be obtained on Github.

Installing MySQL via apt-get is a simple task, but the migration between different MySQL versions requires planning and testing. Thus installing one central instance of the database system might not be suitable when the version of MySQL or project specific settings need to be switched quickly without interfering with other applications. Using one central instance can quickly become cumbersome. In this article, I will describe how any number of MySQL instances can be stored and executed from within a user’s home directory.

Adapting MySQL Data and Log File Locations

Some scenarios might require running several MySQL instances at once; other scenarios cover sensitive data, where we do not want MySQL to write any data to non-encrypted partitions. This is especially true for devices which can easily get stolen, for instance laptops. If you use a laptop for developing your applications from time to time, chances are good that you need to store sensitive data in a database and have to make sure that the data is protected when at rest.

This can be solved with full disk encryption, but this technique has several disadvantages. First of all, full disk encryption only utilises one password. This entails that several users who utilise a device need to share one password, which weakens this approach. Also, when the system needs to be rebooted, full disk encryption can become an obstacle, which increases the complexity further.

Much easier to use is transparent home directory encryption, which can be selected out of the box during many modern Linux setup procedures. We will use this encryption type for this article, as it is reasonably secure and easy to set up. Our goal is to store all MySQL related data in the home directory and run MySQL with normal user privileges.

Creating the Directory Structure

The first step is creating a directory structure for storing the data. In this example, the user name is stefan, please adapt to your needs.


$ cd ~
$ mkdir -p MySQL-5.6-Local/MySQL-5.6-Data     # Data storage
$ mkdir -p MySQL-5.6-Local/MySQL-5.6-Conf     # Config file
$ mkdir -p MySQL-5.6-Local/MySQL-5.6-Log      # Log file
$ mkdir -p MySQL-5.6-Local/MySQL-5.6-Tmp      # Tmp
$ mkdir -p MySQL-5.6-Local/MySQL-5.6-Scripts  # Start and stop scripts

Create a Configuration File

Make sure to use absolute paths and utilise the directories we created before. Store this file as MySQL-5.6-Local/MySQL-5.6-Conf/my-5.6.cnf. The configuration is pretty self-explanatory.


[client]
port=3306
socket=/home/stefan/MySQL-5.6-Local/mysqld-5.6.sock

[mysqld_safe]
pid-file=/home/stefan/MySQL-5.6-Local/mysqld-5.6.pid
nice=0
log_error=/home/stefan/MySQL-5.6-Local/MySQL-5.6-Log/mysql_error.log

[mysqld]
user=stefan
pid-file=/home/stefan/MySQL-5.6-Local/mysqld-5.6.pid
socket=/home/stefan/MySQL-5.6-Local/mysqld-5.6.sock
port=3306
basedir=/usr
datadir=/home/stefan/MySQL-5.6-Local/MySQL-5.6-Data
tmpdir=/home/stefan/MySQL-5.6-Local/MySQL-5.6-Tmp
log_error=/home/stefan/MySQL-5.6-Local/MySQL-5.6-Log/mysql_error.log
bind-address=127.0.0.1

Stop the Running MySQL Instance

If you already have a running MySQL instance, make sure to shut it down. You can also disable MySQL from starting automatically.


sudo service mysql stop
echo "manual" | sudo tee /etc/init/mysql.override

Setting up Apparmor

Apparmor protects sensitive applications by defining in which directories they may write. We need to update this configuration to suit our needs and make sure that the global configuration file for the central MySQL instance also includes the additional local configuration. Edit the file /etc/apparmor.d/usr.sbin.mysqld first and make sure that the reference to the local file is not commented out.


# vim:syntax=apparmor
# Last Modified: Fri Feb 28 18:06:30 2014
#include <tunables/global>

/usr/sbin/mysqld {
  #include <abstractions/base>
  #include <abstractions/nameservice>
  #include <abstractions/user-tmp>

  # .... shortened

  # Include local configuration!!
  include <local/usr.sbin.mysqld>
}

Now we need to add the directories in stefan’s home directory to the local file by editing /etc/apparmor.d/local/usr.sbin.mysqld.


# Site-specific additions and overrides for usr.sbin.mysqld.
# For more details, please see /etc/apparmor.d/local/README.

/home/stefan/MySQL-5.6-Local/ r,
/home/stefan/MySQL-5.6-Local/** rwk,

An incorrect Apparmor configuration is often the cause of permission errors, which can be a pain. Make sure to reload the Apparmor service, and if you struggle with it, consider disabling it temporarily and check whether the rest works. Do not forget to turn it on again.


sudo service apparmor reload    # reload
sudo service apparmor teardown  # disable for testing

Initialize the Local MySQL Instance

Now it is time to initialize the MySQL instance. In this step, MySQL creates all the files it needs in the data directory. It is important that the data directory is empty when you run the following commands.
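A sketch of the initialisation and first start, assuming the MySQL 5.6 server binaries are installed under /usr as referenced by basedir above:

# Initialise the data directory with the local configuration
mysql_install_db --defaults-file=/home/stefan/MySQL-5.6-Local/MySQL-5.6-Conf/my-5.6.cnf --datadir=/home/stefan/MySQL-5.6-Local/MySQL-5.6-Data
# Start the instance with user privileges
mysqld_safe --defaults-file=/home/stefan/MySQL-5.6-Local/MySQL-5.6-Conf/my-5.6.cnf &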

Conclusion

The technique described above allows installing and running multiple MySQL instances from within the user’s home directory. The MySQL instances run with user privileges and can utilise dedicated data and log file directories. As all data is stored within the $HOME directory, we can easily apply transparent encryption to protect data at rest.

In a recent article, I described how to set up a basic MySQL Cluster with two data nodes and a combined SQL and management node. In this article, I am going to highlight a few more things and we are going to adapt the cluster a little bit.

Using Hostnames

To make our lives easier, we can use hostnames, which are easier to remember than IP addresses. Hostnames can be specified for each VM in the file /etc/hosts. For each request to a hostname, the operating system will look up the corresponding IP address. We need to change this file on all three nodes to the following example:


root@node1:~# cat /etc/hosts
127.0.0.1       localhost
192.168.0.81    node1
192.168.0.82    node2
192.168.0.83    node3

# The following lines are desirable for IPv6 capable hosts
::1     localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

Now we can lookup IP addresses by the associated hostname.

Adapting Node1

The management and the SQL node are both running in the same VM. Now that we introduced the hostnames for all machines, we need to reconfigure the cluster manager and the MySQL Server. The cluster configuration is stored in the file
/opt/mysql/server-5.7/mysql-cluster/config.ini . Replace all the IP addresses with the corresponding host name.


root@node1:~# cat /opt/mysql/server-5.7/mysql-cluster/config.ini
[ndb_mgmd]
NodeId=1
# Management process options:
hostname=node1                                              # Hostname of the manager
datadir=/opt/mysql/server-5.7/mysql-cluster/cluster-data    # Directory for the log files

[ndbd]
NodeId=2
hostname=node2                  # Hostname of the first data node
datadir=/usr/local/mysql/data   # Remote directory for the data files

[ndbd]
NodeId=3
hostname=node3
datadir=/usr/local/mysql/data   # Remote directory for the data files

[mysqld]
# SQL node options:
NodeId=4
hostname=node1

For reconfiguring the SQL node, we need to adapt the file
/etc/my.cnf . We need to replace all IP addresses in this file too.


root@node1:~# cat /etc/my.cnf
[mysqld_safe]
socket=/tmp/mysql.sock
nice=0

[mysqld]
#
# * Basic Settings
#
user=mysql
pid-file=/tmp/mysql.pid
socket=/tmp/mysql.sock
port=3306
basedir=/opt/mysql/server-5.7
datadir=/opt/mysql/server-5.7/data
tmpdir=/tmp
skip-external-locking
bind-address=node1

key_buffer_size=16M
max_allowed_packet=16M
thread_stack=192K
thread_cache_size=8
query_cache_limit=1M
query_cache_size=16M

log_error=/var/log/mysql/error.log
expire_logs_days=10
max_binlog_size=100M
#
# ssl-ca=/etc/mysql/cacert.pem
# ssl-cert=/etc/mysql/server-cert.pem
# ssl-key=/etc/mysql/server-key.pem
sql_mode=NO_ENGINE_SUBSTITUTION,STRICT_TRANS_TABLES

ndbcluster                # run NDB storage engine
ndb-connectstring=node1

[mysql_cluster]
# Options for MySQL Cluster processes:
ndb-connectstring=node1

If done, stop the cluster and the SQL node like this:


root@node1:~# /opt/mysql/server-5.7/bin/ndb_mgm -e shutdown
Connected to Management Server at: node1:1186
3 NDB Cluster node(s) have shutdown.
Disconnecting to allow management server to shutdown.
root@node1:~# service mysql stop

Adapt the Data Nodes

Replacing the IP addresses with hostnames is pretty straightforward. Change the IP address on each node to the host name in the file /etc/my.cnf:


[mysqld]

# Options for mysqld process:

ndbcluster

[mysql_cluster]

ndb-connectstring=node1

Startup the Cluster

Now that we have exchanged all IP addresses for hostnames, we can restart the cluster as follows:
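A sketch of the restart sequence, using the paths from this post:

# On node1: start the management node with the updated configuration
/opt/mysql/server-5.7/bin/ndb_mgmd -f /opt/mysql/server-5.7/mysql-cluster/config.ini --reload
# On node2 and node3: start the data nodes
/opt/mysql/server-5.7/bin/ndbd
# On node1: start the SQL node
service mysql start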

Encrypt Data in Transit

For improving the security of the cluster, we can encrypt the traffic which is exchanged between the MySQL node and the MySQL client. Please note that this encryption method is not specific for the cluster, but rather encrypts the traffic between the MySQL client and the MySQL server. You can use this method also for standard MySQL server setups. It does not cover encryption between the data nodes. By design, data nodes are intended to be run in a private network, not via the open Interwebs.

Create a CA Authority Key and Certificate

To do so, we work on node 1. In the first step we create a CA authority and a CA certificate.
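A sketch of these two steps with OpenSSL; file names and the certificate subject are my own choice:

openssl genrsa 2048 > ca-key.pem
openssl req -new -x509 -nodes -days 3650 -key ca-key.pem -out ca-cert.pem -subj "/CN=MySQL-Cluster-CA"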

What about the Data Nodes?

So far we encrypted only the traffic between the MySQL client and the MySQL server. This configuration does not differ from a single server or replication setup and does not include the traffic between the data nodes, which is not encrypted at all. MySQL Cluster has been designed to run in a controlled environment within a local high-speed network. The MySQL Cluster FAQ states:

It is very unlikely that a cluster would perform reliably under such conditions, as NDB Cluster was designed and implemented with the assumption that it would be run under conditions guaranteeing dedicated high-speed connectivity such as that found in a LAN setting using 100 Mbps or gigabit Ethernet—preferably the latter. We neither test nor warrant its performance using anything slower than this.

Also, it is extremely important to keep in mind that communications between the nodes in an NDB Cluster are not secure; they are neither encrypted nor safeguarded by any other protective mechanism. The most secure configuration for a cluster is in a private network behind a firewall, with no direct access to any Cluster data or management nodes from outside.

A Cluster Scenario

In this example we create the smallest possible MySQL cluster based on four nodes running on three machines. Node 1 will run the cluster management software, Node 2 and Node 3 will serve as data nodes, and Node 4 is the MySQL API node, which runs on the same VM as Node 1.


+------------------+ +------------------+ +-------------------+
| Node 1 & 4       | | Node 2           | | Node 3            |
| IP: 192.168.0.81 | | IP: 192.168.0.82 | | IP: 192.168.0.83  |
| ndb_mgm; MySQL   | | ndbd             | | ndbd              |
+------------------+ +------------------+ +-------------------+

+------------------------------------------------------------------+
|                          VirtualBox Host                          |
+------------------------------------------------------------------+

What is a Cluster and When to Use it?

When applications grow, there are several possibilities to improve the performance of the database layer. The two technologies used in most scenarios are replication and clustering.

Replication is used for moving the data from a master database to one or more slave databases. Per default, this data exchange is asynchronous. Semi-synchronous data replication can be implemented, but if you need synchronous data replication across several machines, clusters are the way to go.

A MySQL Cluster implements a shared nothing architecture and supports synchronous replication. You can very easily add and remove nodes to and from your cluster and scale out horizontally. This is especially true if you need to scale database writes. Clusters use a special storage engine called NDB. This technology offers high availability and high redundancy. The documentation for a MySQL 5.7 cluster can be found here.

Preconditions

In this example, we will use VirtualBox as the virtualization platform and run three nodes as the smallest possible cluster. In the first step, we create a template virtual machine, which comes with the basic packages, password-less key-based authentication and a manually configured IP address. We will clone this machine and adapt it to our needs. Parts of this tutorial are based on the Digital Ocean howto.

The Template

We create the first VM based on Ubuntu 16.04 LTS and install the basics. MySQL requires the libaio1 package, which we will install in the template like this:


sudo apt-get install libaio1

Then it is time to download the MySQL Cluster software, which can be obtained for free from the MySQL developer page. These are the two steps necessary:
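A sketch of these two steps; the download link (abbreviated here) and the exact tarball version depend on what you select on the developer page.

# Download the generic Linux tarball of MySQL Cluster
wget -O mysql-cluster.tar.gz "<MySQL Cluster 7.5 tarball link from dev.mysql.com>"
# Extract it to the location used throughout this post
sudo mkdir -p /opt/mysql/server-5.7
sudo tar xvfz mysql-cluster.tar.gz -C /opt/mysql/server-5.7 --strip-components=1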

Make sure to install the guest additions, add a MySQL user, copy the MySQL startup script etc. Now the template machine is ready. Shut it down and clone it two times. Make sure to assign a new MAC address and then assign a different, static IP address.

Setting Up the Machines

In the following steps, we are going to set up the virtual machines. We run three machines which provide four services.

Install the Manager on Node 1

The cluster will be controlled via Node 1. First, create a data directory, where we will store the configuration files.


mkdir /opt/mysql/server-5.7/mysql-cluster/cluster-data
chown -R mysql:mysql /opt/mysql/server-5.7/mysql-cluster/cluster-data

Then create the configuration file. This file defines the cluster and will be read during the startup of the cluster. Each node needs to have a unique id. Usually, the ID starts with 0, but in this example we defined the node IDs starting from 1 in order to align them with the IP addresses.


root@node1:~# cat /opt/mysql/server-5.7/mysql-cluster/config.ini
[ndb_mgmd]
NodeId=1
# Management process options:
hostname=192.168.0.81   # node1 - Hostname of the manager
datadir=/opt/mysql/server-5.7/mysql-cluster/cluster-data   # Directory for the log files

[ndbd]
NodeId=2
hostname=192.168.0.82   # node2 - Hostname of the first data node
datadir=/usr/local/mysql/data   # Remote directory for the data files

[ndbd]
NodeId=3
hostname=192.168.0.83
datadir=/usr/local/mysql/data   # Remote directory for the data files

[mysqld]
# SQL node options:
NodeId=4
hostname=192.168.0.81

Note that if you make changes to the config.ini files, they might not be loaded directly, but are read from a cache file. So if you change for instance the data directory, make sure to delete all cached files, e.g. ndb_1_config.bin.1*.

Install the MySQL Server on Node1

The installation of the MySQL service is pretty straightforward and follows a similar pattern as a regular MySQL setup. You just need to ensure that you use the cluster version, which we already installed in the template machine, and create symlinks accordingly. The configuration of the MySQL Server is done in /etc/my.cnf, as usual. In the following you can see an example configuration.


root@node1:~# cat /etc/my.cnf
[mysqld_safe]
socket=/tmp/mysql.sock
nice=0

[mysqld]
#
# * Basic Settings
#
user=mysql
pid-file=/tmp/mysql.pid
socket=/tmp/mysql.sock
port=3306
basedir=/opt/mysql/server-5.7
datadir=/opt/mysql/server-5.7/data
tmpdir=/tmp
skip-external-locking

key_buffer_size=16M
max_allowed_packet=16M
query_cache_limit=1M
query_cache_size=16M

log_error=/var/log/mysql/error.log
server-id=1
expire_logs_days=10
max_binlog_size=100M
sql_mode=NO_ENGINE_SUBSTITUTION,STRICT_TRANS_TABLES

ndbcluster                      # run NDB storage engine
ndb-connectstring=192.168.0.81

[mysql_cluster]
# Options for MySQL Cluster processes:
ndb-connectstring=192.168.0.81  # location of management server

The most important bits are at the bottom of the file. The directive ndbcluster enables the NDB storage engine, and the section [mysql_cluster] points to the cluster management node. In our case, the MySQL server node and the cluster management node are on the same machine. In larger settings it can be beneficial to have a dedicated VM for each task.

Make sure to set the permissions for the data directory correctly and to place the startup script in
/etc/init.d/mysql . Make sure to start the MySQL server on Node 1 and check the error.log file for any strange messages.

Setting up the Data Nodes

In the next step, we need to configure the data nodes. Launch Node 2 and login.


# Create the data directory
sudo mkdir -p /var/mysql/data
# permissions
sudo chown -R mysql:mysql /var/mysql

The basic configuration is very simple. Just edit the /etc/my.cnf file and provide the address of the management node.


stefan@node2:~$ cat /etc/my.cnf
[mysqld]
# Options for mysqld process:
ndbcluster

[mysql_cluster]
ndb-connectstring=192.168.0.81

Then we can start the data node with the following command:


sudo /opt/mysql/server-5.7/bin/ndbd --initial

You should see something like this:


stefan@node3:~$ sudo /opt/mysql/server-5.7/bin/ndbd
2016-11-26 16:39:29 [ndbd] INFO -- Angel connected to '192.168.0.81:1186'
2016-11-26 16:39:29 [ndbd] INFO -- Angel allocated nodeid: 3

If something goes wrong, check the error log file called
ndb_X_out.log , where X is the node id. So in this example the actual file name is
ndb_2_out.log . Proceed the same way with the second data node called Node 3.

Starting the Cluster

Now comes the great moment: we start the cluster. Please note that you cannot start the cluster before you have configured at least two data nodes and a MySQL API node. All components need to be running, otherwise there will be error messages.
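A sketch of starting the management node on Node 1 with the configuration file created above:

sudo /opt/mysql/server-5.7/bin/ndb_mgmd -f /opt/mysql/server-5.7/mysql-cluster/config.ini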

If there are no errors, we can log into the management console for the cluster with the following command.


/opt/mysql/server-5.7/bin/ndb_mgm

You can then get an overview of your cluster with the
show command. In the ideal case, the output reads as follows:


root@node1:~# /opt/mysql/server-5.7/bin/ndb_mgm
-- NDB Cluster -- Management Client --
ndb_mgm> show
Connected to Management Server at: 192.168.0.81:1186
Cluster Configuration
---------------------
[ndbd(NDB)]     2 node(s)
id=2    @192.168.0.82  (mysql-5.7.16 ndb-7.5.4, Nodegroup: 0, *)
id=3    @192.168.0.83  (mysql-5.7.16 ndb-7.5.4, Nodegroup: 0)

[ndb_mgmd(MGM)] 1 node(s)
id=1    @192.168.0.81  (mysql-5.7.16 ndb-7.5.4)

[mysqld(API)]   1 node(s)
id=4    @192.168.0.81  (mysql-5.7.16 ndb-7.5.4)

Connecting to MySQL

Now that our cluster is up and running, we can interact with MySQL by connecting to the MySQL server running on Node 1. In order to distribute the data automatically over the nodes, we need to use the NDB engine when we create our tables. This is a distributed storage engine which differs from InnoDB in several aspects. A comparison is given here. Connect to the MySQL instance on Node 1 as you always would.


root@node1:~# mysql -u root -p

Enter password:

Then we can create a test table to see if the system works properly. In contrast to replication, the nodes do not each store the whole data set; rather, the data is distributed over at least two nodes. This is also the reason why we need to have at least two data nodes.
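A minimal sketch of such a test table; the database and table names are my own choice:

CREATE DATABASE IF NOT EXISTS clustertest;
USE clustertest;
CREATE TABLE cluster_test (
  id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  payload VARCHAR(255)
) ENGINE=NDBCLUSTER;
INSERT INTO cluster_test (payload) VALUES ('hello cluster');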

Sakila in the Cluster

We can use the famous Sakila database that is available from the MySQL web site to test our Cluster briefly. The following series of commands retrieves the database and imports it into our cluster.


# Fetch the data
wget http://downloads.mysql.com/docs/sakila-db.tar.gz
# Extract
tar xvfz sakila-db.tar.gz
cd sakila-db/
# Change the default engine with the cluster engine
sed -i 's/engine=InnoDB/ndbcluster/g' sakila-schema.sql
# Import schema and data
mysql -u root -p < sakila-schema.sql
mysql -u root -p sakila < sakila-data.sql

Now the data resides in our cluster and is distributed via our two nodes. We can run the following sample query and the system transparently gets the data from the two nodes.


USE sakila;
SELECT CONCAT(customer.last_name, ', ', customer.first_name) AS customer,
       address.phone, film.title
FROM rental INNER JOIN customer ON rental.customer_id = customer.customer_id
INNER JOIN address ON customer.address_id = address.address_id
INNER JOIN inventory ON rental.inventory_id = inventory.inventory_id
INNER JOIN film ON inventory.film_id = film.film_id
WHERE rental.return_date IS NULL
AND rental_date + INTERVAL film.rental_duration DAY < CURRENT_DATE();
...
183 rows in set (0.00 sec)

Caveats

If you try to start the cluster before all components are up and ready, finding the error can be a bit tricky. In the following example, we did not start the data nodes. The MySQL API node is running, but it does not show up.


root@node1:~# /opt/mysql/server-5.7/bin/ndb_mgm
-- NDB Cluster -- Management Client --
ndb_mgm> show
Connected to Management Server at: 192.168.0.81:1186
Cluster Configuration
---------------------
[ndbd(NDB)]     2 node(s)
id=2 (not connected, accepting connect from 192.168.0.82)
id=3 (not connected, accepting connect from 192.168.0.83)

[ndb_mgmd(MGM)] 1 node(s)
id=1    @192.168.0.81  (mysql-5.7.16 ndb-7.5.4)

[mysqld(API)]   1 node(s)
id=4 (not connected, accepting connect from 192.168.0.81)

Check the error log of the management cluster in that case. As long as the cluster does not consist of at least two data nodes and a MySQL API node, it will not start.