A lot of people have been asking questions about configuring replica sets and sharding, so here’s how to do it in nitty-gritty detail.

The Architecture

Prerequisites: if you aren’t too familiar with replica sets, see my blog post on them. The rest of this post won’t make much sense unless you know what an arbiter is. Also, you should know the basics of sharding.

Each shard should be a replica set, so we’ll need two replica sets (we’ll call them foo and bar). We want our cluster to be okay if one machine goes down or gets separated from the herd (network partition), so we’ll spread out each set among the available machines. Replica sets are color-coded and machines are imaginatively named server1-4.

Each replica set has two hosts and an arbiter. This way, if a server goes down, no functionality is lost (and there won’t be two masters on a single server).
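The startup commands for a layout like this might look like the following. Everything here is a placeholder sketch: the ports (10000 for foo, 10001 for bar), the ~/dbs/* data paths, and the particular machine assignments (foo's data members on server1 and server2 with its arbiter on server3, bar's on server3 and server4 with its arbiter on server1) are my assumptions, not gospel, so adjust them to your own diagram.

```shell
# on server1
./mongod --replSet foo --port 10000 --dbpath ~/dbs/foo
./mongod --replSet bar --port 10001 --dbpath ~/dbs/bar-arbiter
# on server2
./mongod --replSet foo --port 10000 --dbpath ~/dbs/foo
# on server3
./mongod --replSet foo --port 10000 --dbpath ~/dbs/foo-arbiter
./mongod --replSet bar --port 10001 --dbpath ~/dbs/bar
# on server4
./mongod --replSet bar --port 10001 --dbpath ~/dbs/bar

# then initialize each set from one of its data members, e.g. for foo:
./mongo server1:10000/admin
> rs.initiate({_id : "foo", members : [
    {_id : 0, host : "server1:10000"},
    {_id : 1, host : "server2:10000"},
    {_id : 2, host : "server3:10000", arbiterOnly : true}]})
```

…and similarly for bar. Note that the arbiter is just another mongod; it only becomes an arbiter because of the arbiterOnly flag in the set's configuration.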

Since we’re trying to set up a system with no single points of failure, we’ll use three configuration servers. We can have as many mongos processes as we want (one on each appserver is recommended), but we’ll start with one.
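Starting the config servers and the mongos might look something like this (the config servers' port 20000 and the ~/dbs/config path are placeholders of mine; the mongos on server4:30000 matches the address used below):

```shell
# one config server apiece on server1, server2, and server3:
./mongod --dbpath ~/dbs/config --port 20000

# one mongos on server4, pointed at all three config servers
./mongos --configdb server1:20000,server2:20000,server3:20000 --port 30000
```

The mongos has no data directory of its own; everything it knows lives on the config servers.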

Edit: you must list all of the non-arbiter hosts in the set for now. This is very lame, because given one host, mongos really should be able to figure out everyone in the set, but for now you have to list them.
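Adding the two sets as shards, from a shell connected to the mongos, might look like this (hostnames and ports are placeholders from my sketch above the fold, so substitute your own). Note that both data-bearing members of each set are listed after the set name, but the arbiters are not:

```shell
./mongo server4:30000/admin
> db.runCommand({addshard : "foo/server1:10000,server2:10000"})
> db.runCommand({addshard : "bar/server3:10001,server4:10001"})
```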

Tada! As you can see, you end up with one “foo” shard and one “bar” shard. (I actually added that functionality on Friday, so you’ll have to download a nightly to get the nice names. If you’re using an older version, your shards will have the thrilling names “shard0000” and “shard0001”.)

Now you can connect to “server4:30000” in your application and use it just like a “normal” mongod. If you want to add more mongos processes, just start them up with the same configdb parameter used above.
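For example, something like this (the database name "mydb", the collection, and the shard key are made up for illustration; remember that data won't actually be distributed until you enable sharding on a database and shard a collection):

```shell
# connect to mongos and use it like any mongod
./mongo server4:30000/mydb
> db.users.insert({name : "joe"})
> db.users.findOne({name : "joe"})

# to distribute a collection across the shards:
> use admin
> db.runCommand({enablesharding : "mydb"})
> db.runCommand({shardcollection : "mydb.users", key : {name : 1}})

# a second mongos, on any machine, just needs the same configdb string:
./mongos --configdb server1:20000,server2:20000,server3:20000 --port 30000
```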


Join the Conversation

I am amazed that, with the install base of Windows, you guys don’t deem it fit to do these tutorials on Windows. MongoDB is blazing the trail; I will be surprised if they do not learn from the mistake of MySQL, who are only now waking up to pursue Windows seriously. You feel everything is Linux. Please have a change of mind.

This is my personal blog and I hate using Windows, so I don’t generally do examples with it. However, it’s not exactly hard to translate: use C:\dbs\… instead of ~/dbs/… and use mongod instead of ./mongod. That’s it, now it works on Windows!

I was just wondering how do config servers discover each other, so that they can keep in sync?

Does mongos tell them all, when it is passed a list of config servers? What happens if I launch another mongos with only one config server specified? And what if I launch yet another mongos and add a freshly launched config server to the list?

Once they know about each other (i.e., you start one mongos listing all three) they can pass messages among themselves. If you start up a mongos with one config specified, it will contact that one and find out about the other two.

At this point, you can’t really add a “freshly launched config server” to the mix. You’re stuck with the 3 servers you originally specified, so use dynamic hostnames! (This will be configurable in later versions.)

I was under the impression that when adding a replica set as a shard, you had to specify the whole string of servers in the replica set, i.e.,
mongos.runCommand({addshard : "foo/server1:27017,foo/server2:27017"})
Has the syntax changed? And if it is the full list, do you need to add the arbiter in there as well?

I’m having a hard time understanding the real purpose of listing ALL hostnames (for a given replica set) in the shard list, i.e., replSetName/host1,host2,…,hostN. Isn’t it enough to just specify one of the hostnames in the replica set, since the data is replicated across all hosts in the replica set?

Thanks for the quick reply! Just to be absolutely clear, your answer applies to the most recent release of MongoDB v1.6.5, right? We’re still using v1.6… that means we need to specify the entire replica set in the seed list, right? Thanks again.

Got it, thanks! One more basic question about v1.6. Conversely, if the primary host in the replica set is missing from the shard seed list, does that mean mongos will never give write operations to that shard? For example, replica setA consists of A1, A2, A3, where A1 is currently the primary. However, the shard seed only specified 2 of the 3, as in: setA/A2,A3. Please let me know if this is all documented somewhere. I just couldn’t find it… Thanks in advance.

--shardsvr is not necessary and drives me bananas. It does absolutely nothing other than change the port to 27018. I generally don’t use it (and tell people not to use it) because it’s hard enough for people to wrap their heads around sharding without making them remember extra options that don’t really do anything.
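In other words, these two invocations are equivalent (the dbpath is a placeholder):

```shell
./mongod --shardsvr --dbpath ~/dbs/shard
./mongod --port 27018 --dbpath ~/dbs/shard
```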

In my setup, when I talk to the mongos server, it seems like it always routes the request to the master. How can I make use of the slave to scale read performance of the shard? db.getMongo().setSlaveOk() on the mongos process doesn’t seem to have an effect on where the request is routed. Connecting directly to the slave in the shard works, but requires knowing which shard and seems to defeat the purpose of using mongos.

Hi,
I am getting an “ERROR: config servers 10.179.77.89:2000 and 10.179.79.62:2000 differ” … “configServer startup check failed” error when starting a mongos server. There is no mention of a problem with the third config server, 10.179.77.195:2000. I have tried to stop 10.179.77.89:2000 and 10.179.79.62:2000, repair, and restart, but I still get the problem. I find that if I stop 10.179.77.89:2000 and try to start my mongos server, everything works fine, but then I am without that config server. Any idea of how to solve this problem?

Don’t delete all your config files! It sounds like one of the config servers somehow ended up with different data than the other two. (Was anything connecting directly to that config server? Could there have been anything writing to it?) Shut down one of your other config servers, make a copy of its data directory, and put that on your messed up config server.
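Something along these lines, assuming a ~/dbs/config data directory and a reachable hostname for the bad machine (both placeholders of mine):

```shell
# on a known-good config server: shut down its mongod first, then copy
cp -R ~/dbs/config ~/config-backup
scp -r ~/config-backup badserver:~/dbs/config

# restart the good config server and the repaired one, then restart mongos
```

The important part is that the copy happens while the good config server is down (or otherwise not being written to), so you get a consistent snapshot of its files.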