Mongo Mailbag: Master/Slave Configuration

Trying something new: each week, I’ll take an interesting question from the MongoDB mailing list and answer it in more depth. Some of the replies on the list are a bit short, given that the developers are trying to, you know, develop (as well as answer over a thousand questions a month). So, I’m going to grab some interesting ones and flesh things out a bit more.

Hi all,

Assume I have a Mongo master and 2 mongo slaves. Using PHP, how do I do it so that writes go to the master while reads are spread across the slaves (+maybe the master)?

1) Connect to all 3 nodes in one go; PHP/Mongo handles all the rest.
2) One connection to the master for writes, and another connection to the slave nodes to read from them.

Thanks all and sorry for the noobiness!

-Mr. Google

Basics first: what is master/slave?

One database server (the “master”) is in charge and can do anything. A bunch of other database servers keep copies of all the data that’s been written to the master and can optionally be queried (these are the “slaves”). Slaves cannot be written to directly; they are just copies of the master database. Setting up a master and slaves allows you to scale reads nicely because you can just keep adding slaves to increase your read capacity. Slaves also make great backup machines: if your master explodes, you’ll have a copy of your data safe and sound on a slave.

So, how do you set up Mongo in a master/slave configuration? Assuming you’ve downloaded MongoDB from mongodb.org, you can start a master and slave by cutting and pasting the following lines into your shell:

(I’m assuming you’re running *NIX. The commands for Windows are similar, but I don’t want to encourage that sort of thing).
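The exact invocations depend on where your binaries live, but with the assumptions spelled out in the walkthrough below (data under ~/dbs, master on the default port 27017, slave on 27018), they look roughly like this:

```shell
# make directories for each server's data files
mkdir -p ~/dbs/master ~/dbs/slave

# start the master, with its files in ~/dbs/master and its log in ~/dbs/master.log
./mongod --master --dbpath ~/dbs/master --fork --logpath ~/dbs/master.log

# start the slave on port 27018, pointing it at the master
./mongod --slave --source localhost:27017 --port 27018 --dbpath ~/dbs/slave --fork --logpath ~/dbs/slave.log
```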

What are these lines doing?

First, we’re making directories to keep the database in (~/dbs/master and ~/dbs/slave).

Now we start the master, specifying that it should put its files in the ~/dbs/master directory and its log in the ~/dbs/master.log file. So, now we have a master running on localhost:27017.

Next, we start the slave. It needs to listen on a different port than the master since they’re on the same machine, so we’ll choose 27018. It will store its files in ~/dbs/slave and its logs in ~/dbs/slave.log. The most important part is letting it know who’s boss: the --source localhost:27017 option tells it that the master it should be reading from is at localhost:27017.

There are tons of possible master/slave configurations. Some examples:

1. You could have a dozen slave boxes where you want to distribute the reads evenly across them all.

2. You might have one wimpy little slave machine that you don’t want any reads to go to; you just use it for backup.

3. You might have the most powerful server in the world as your master machine and you want it to handle both reads and writes… unless you’re getting more than 1,000 requests per second, in which case you want some of them to spill over to your slaves.

In short, Mongo can’t automatically configure your application to take advantage of your master-slave setup. Sorry. You’ll have to do this yourself. (Edit: the Python driver actually does handle case 1 for you, see Mike’s comment.)

However, it’s not too complicated, especially for what MG wants to do. MG is using 3 servers: a master and two slaves, so we need three connections: one to the master and one to each slave. Assuming he’s got the master at master.example.com and the slaves at slave1.example.com and slave2.example.com, he can create the connections with:
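With the old Mongo class from the PHP driver, that’s one constructor call per server (I’m assuming everything is on the default port):

```php
// one connection per server; the hostnames are from the example above
$master = new Mongo("master.example.com:27017");
$slave1 = new Mongo("slave1.example.com:27017");
$slave2 = new Mongo("slave2.example.com:27017");
```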

This next bit is a little nasty and it would be cool if someone made a framework to do it (hint hint). What we want to do is abstract the master-slave logic into a separate layer, so the application talks to the master slave logic which talks to the driver. I’m lazy, though, so I’ll just extend the MongoCollection class and add some master-slave logic. Then, if a person creates a MongoMSCollection from their $master connection, they can add their slaves and use the collection as though it were a normal MongoCollection. Meanwhile, MongoMSCollection will evenly distribute reads amongst the slaves.
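A minimal sketch of what that subclass could look like — the MongoMSCollection name is from the text above, but the addSlaves helper and its internals are my own invention, and error handling is omitted:

```php
class MongoMSCollection extends MongoCollection {
    public $slaves = array();
    public $currentSlave = -1;

    // $slaves is an array of Mongo connections; turn each one into a
    // handle on the same db.collection this object points at
    public function addSlaves($slaves) {
        foreach ($slaves as $slave) {
            $this->slaves[] = $slave->selectCollection((string)$this->db, $this->getName());
        }
    }

    // round-robin reads across the slaves; writes still go through the
    // inherited MongoCollection methods, i.e., to the master
    public function find($query = array(), $fields = array()) {
        $this->currentSlave = ($this->currentSlave + 1) % count($this->slaves);
        return $this->slaves[$this->currentSlave]->find($query, $fields);
    }
}

// usage: create the collection from the master connection, then register the slaves
$c = new MongoMSCollection($master->selectDB("foo"), "bar");
$c->addSlaves(array($slave1, $slave2));
```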

Now we can use $c like a normal MongoCollection. MongoMSCollection::find will alternate between the two slaves and all of the other operations (inserts, updates, and removes) will be done on the master. If MG wants to have the master handle reads, too, he can just add it to the $slaves array (which might be better named the $reader array, now):
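Assuming the slaves are registered through some addSlaves-style helper that takes an array of connections, that’s a one-line change:

```php
// let the master serve reads alongside the slaves
$slaves = array($slave1, $slave2, $master);
$c->addSlaves($slaves);
```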

Alternatively, he could change the logic in the MongoMSCollection::find method.

Edit: as of version 1.4.0, slaveOkay is not necessary for reading from slaves. slaveOkay should be used if you are using replica sets, not --master and --slave. Thus, the next section doesn’t really apply anymore to normal master/slave.

The only tricky thing about Mongo’s implementation of master/slave is that, by default, a slave isn’t even readable, it’s just a way of doing backup for the master database. If you actually want to read off of a slave, you have to set a flag on your query, called “slaveOkay”. Instead of saying:

$cursor = $slave->foo->bar->find();

we have:

$cursor = $slave->foo->bar->find()->slaveOkay();

Or, because this is a pain in the ass to set for every query (and almost impossible to do for findOnes unless you know the internals) you can set a static variable on MongoCursor that will hold for all of your queries:

MongoCursor::$slaveOkay = true;

And now you will be allowed to query your slave normally, without calling slaveOkay() on each cursor.

We use wrapper libraries on all of our data connections which allow us to handle all of the connection management outside of the regular code. For instance in my MySQL setup we have configuration files (Dumpers) of hashes of info broken up by site/role with the settings for each server (ip, user, database name, master, slave, etc).

Then, when a piece of application code needs to do something, we examine the query and direct it either to a slave or to the master, but we never make a connection until we’re ready to actually do something.

This also allows us to have different “connect” methods on our wrappers, for instance “ConnectMaster”, “ConnectSlave”, or “ConnectMultiple”. The wrapper also takes care of things like automatically fetching the number of rows affected or the last id generated when you do updates or inserts.

Basically, under high loads of millions of requests, it’s too costly to constantly open connections to probe your network and figure out what it’s doing. This does mean that we have to be more on top of our configurations, though. So: minor management overhead traded for much less runtime overhead. 🙂
