Monday, January 15, 2018

How to Deploy a MongoDB Sharded Cluster on CentOS 7

Sharding is MongoDB's mechanism for storing a data set across multiple
machines. It scales data horizontally by partitioning it across independent
instances, so you can add more machines to your stack as the data grows.

Sharding and Replication

Let's make it simple. Suppose you have a music collection. 'Sharding'
splits the collection into different folders kept on different instances
or replica sets, while 'Replication' simply syncs the collection to
other instances.

Three Sharding Components

Shard - Used to store all data. In a production environment, each shard
is a replica set, which provides high availability and data consistency.

Config Server - Used to store cluster metadata, including the mapping
between the cluster's data set and the shards. The mongos/query router
uses this data to deliver operations. It's recommended to use at least
3 instances in production.

Mongos/Query Router - These are mongos instances that act as the
interface for applications. The application sends requests to a 'mongos'
instance, and 'mongos' then routes them to the shard replica sets using
the shard key.

Prerequisites

2 CentOS 7 servers as the Config Server Replica Set

10.0.15.31 configsvr1

10.0.15.32 configsvr2

4 CentOS 7 servers as Shard Replica Sets

10.0.15.21 shardsvr1

10.0.15.22 shardsvr2

10.0.15.23 shardsvr3

10.0.15.24 shardsvr4

1 CentOS 7 server as mongos/Query Router

10.0.15.11 mongos

Root privileges

Network connectivity between all servers

Step 1 - Disable SELinux and Configure Hosts

In this tutorial, we will disable SELinux. Connect to all nodes through OpenSSH and change the SELinux configuration from 'enforcing' to 'disabled'.
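
The two tasks named in this step can be sketched as the shell functions below. The sketch assumes the SELinux setting lives in /etc/selinux/config (its usual location on CentOS 7) and reuses the host names and addresses from the prerequisites. The function names `disable_selinux` and `add_cluster_hosts` are made up for this example.

```shell
# Sketch only: helper names are hypothetical. File paths are passed in so the
# functions can be tried on copies before touching the real files.

# Switch SELinux from 'enforcing' to 'disabled' in the given config file.
disable_selinux() {
  sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' "$1"
}

# Append the cluster host names from the prerequisites to the given hosts file.
add_cluster_hosts() {
  cat >> "$1" <<'EOF'
10.0.15.31 configsvr1
10.0.15.32 configsvr2
10.0.15.21 shardsvr1
10.0.15.22 shardsvr2
10.0.15.23 shardsvr3
10.0.15.24 shardsvr4
10.0.15.11 mongos
EOF
}

# Usage on every node, as root:
#   disable_selinux /etc/selinux/config
#   add_cluster_hosts /etc/hosts
#   reboot   # the SELinux change takes effect after a reboot
```

After the reboot, running `getenforce` should report that SELinux is disabled.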

Step 2 - Install MongoDB

Now install MongoDB 3.4 from the MongoDB repository using the following yum command.

sudo yum -y install mongodb-org
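
The install command assumes a yum repository for MongoDB 3.4 is already configured on each node. Below is a sketch of such a repo definition, following the layout used in the MongoDB 3.4 installation guide for Red Hat/CentOS. The helper name `write_mongodb_repo` is hypothetical, and since 3.4 is an old release the URLs should be checked against the current MongoDB documentation.

```shell
# Sketch only: writes a yum repo definition for MongoDB 3.4 to the given path.
# The quoted 'EOF' keeps $releasever literal so yum expands it, not the shell.
write_mongodb_repo() {
  cat > "$1" <<'EOF'
[mongodb-org-3.4]
name=MongoDB Repository
baseurl=https://repo.mongodb.org/yum/redhat/$releasever/mongodb-org/3.4/x86_64/
gpgcheck=1
enabled=1
gpgkey=https://www.mongodb.org/static/pgp/server-3.4.asc
EOF
}

# Usage on every node, as root, before running yum install:
#   write_mongodb_repo /etc/yum.repos.d/mongodb-org-3.4.repo
```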

After MongoDB is installed, the 'mongo' and 'mongod' commands are available.

mongod --version

Step 3 - Create Config Server Replica Set

In the 'Prerequisites' section, we defined two machines, 'configsvr1'
and 'configsvr2', as config servers. In this step, we will configure
them as a replica set.
If a mongod service is already running on the server, stop it using the systemctl command.

systemctl stop mongod

Edit the default mongodb configuration 'mongod.conf' using the Vim editor.

vim /etc/mongod.conf

Change the DB storage path to your own directory. We will use
'/data/db1' for the first server, and '/data/db2' directory for the
second config server.

storage:
  dbPath: /data/db1

Change the value of the 'bindIp' line in the 'net' section to your
internal network address: 10.0.15.31 on 'configsvr1', and 10.0.15.32 on
the second server.

  bindIp: 10.0.15.31

In the replication section, set a replica set name.

replication:
  replSetName: "replconfig01"

Under the sharding section, define the role of the instance. We will use these two instances as 'configsvr'.

sharding:
  clusterRole: configsvr

Save and exit.
Next, we must create a new directory for MongoDB data, and then change the owner of that directory to 'mongod' user.

mkdir -p /data/db1
chown -R mongod:mongod /data/db1

After this, start the mongod service with the command below.

mongod --config /etc/mongod.conf

You can use the netstat command to check whether or not the mongod service is running on port 27017.

netstat -plntu

Configsvr1 and Configsvr2 are ready for the replica set. Connect to the 'configsvr1' server and access the mongo shell.

ssh root@configsvr1
mongo --host configsvr1 --port 27017

Initiate the replica set with all configsvr members using the query below.
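
A minimal sketch of the initiation query, assuming both config servers listen on port 27017 as configured above. It wraps the mongo shell call in a shell function so it can be run non-interactively. Inside an interactive mongo shell, only the `rs.initiate(...)` expression is needed. The function name and the member `_id` values 0 and 1 are choices made for this example.

```shell
# Sketch only: replica set configuration for the two config servers.
# 'configsvr: true' marks the set as a config server replica set.
RS_INIT_JS='printjson(rs.initiate({
  _id: "replconfig01",
  configsvr: true,
  members: [
    { _id: 0, host: "configsvr1:27017" },
    { _id: 1, host: "configsvr2:27017" }
  ]
}))'

# Hypothetical helper: sends the expression above to configsvr1.
initiate_config_replset() {
  mongo --host configsvr1 --port 27017 --eval "$RS_INIT_JS"
}

# Usage, from a host that can reach configsvr1:
#   initiate_config_replset
```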

If there is no error, you will see results as below.
Results from shardsvr3 and shardsvr4 with replica set name 'shardreplica02'.

Redo this step for the shardsvr3 and shardsvr4 servers with a different replica set name, 'shardreplica02'.
Now we've created 2 replica sets, 'shardreplica01' and 'shardreplica02', as the shards.

Step 5 - Configure mongos/Query Router

The 'Query Router' is simply an instance that runs 'mongos'. You can
run mongos with a configuration file, or directly from the command
line.
Log in to the mongos server and stop the MongoDB service.

ssh root@mongos
systemctl stop mongod

Run mongos with the command line as shown below.

mongos --configdb "replconfig01/configsvr1:27017,configsvr2:27017"

Use the '--configdb' option to define the config servers. In production, use at least 3 config servers.
You should see results similar to the following.
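
Before the sharding status can list any shards, both shard replica sets have to be registered through mongos. Below is a minimal sketch of that step, assuming every shard member listens on port 27017 and that 'shardreplica01' consists of shardsvr1 and shardsvr2. `sh.addShard()` accepts a seed string of the form `replSetName/host:port,host:port`. The function name is hypothetical.

```shell
# Sketch only: registers both shard replica sets through mongos, then prints
# the sharding status. Ports and set membership are assumptions noted above.
ADD_SHARDS_JS='
printjson(sh.addShard("shardreplica01/shardsvr1:27017,shardsvr2:27017"));
printjson(sh.addShard("shardreplica02/shardsvr3:27017,shardsvr4:27017"));
sh.status();
'

# Hypothetical helper: sends the expressions above to the mongos instance.
add_shards_to_mongos() {
  mongo --host mongos --port 27017 --eval "$ADD_SHARDS_JS"
}

# Usage, from a host that can reach mongos:
#   add_shards_to_mongos
```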

You will see a sharding status similar to what the following screenshot shows.

We now have 2 shard replica sets and 1 mongos instance running in our stack.

Step 7 - Testing

To test the setup, open the mongo shell on the mongos server.

ssh root@mongos
mongo --host mongos --port 27017

Enable Sharding for a Database
Create a new database and enable sharding for the new database.

use lemp
sh.enableSharding("lemp")
sh.status()

Now check the status of the database; it has been partitioned to the replica set 'shardreplica01'.

Enable Sharding for Collections
Next, add a new collection to the database with sharding support. We
will add a new collection named 'stack' with the shard key 'name', and
then check the database and collection status.

sh.shardCollection("lemp.stack", {"name":1})
sh.status()

The new collection 'stack' with the shard key 'name' has been added.

Add Documents to the Collection 'stack'
Now insert documents into the collection. When we add documents to
a collection on a sharded cluster, we must include the 'shard key'.
In the example below, we use the shard key 'name', which we set when enabling sharding for the collection.
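
A minimal sketch of such an insert, run through mongos. The three documents and the function name are made up for illustration; the only requirement taken from the text is that each document carries the 'name' field. `getShardDistribution()` then reports how the collection's data is spread across the shards.

```shell
# Sketch only: inserts three made-up documents through mongos. Every document
# carries the shard key field 'name' so mongos can route it to a shard.
INSERT_DOCS_JS='
var lemp = db.getSiblingDB("lemp");
printjson(lemp.stack.insertMany([
  { name: "alpha", detail: "first sample document" },
  { name: "bravo", detail: "second sample document" },
  { name: "charlie", detail: "third sample document" }
]));
lemp.stack.getShardDistribution();
'

# Hypothetical helper: sends the expressions above to the mongos instance.
insert_sample_docs() {
  mongo --host mongos --port 27017 --eval "$INSERT_DOCS_JS"
}

# Usage, from a host that can reach mongos:
#   insert_sample_docs
```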