Sitecore MongoDB Sharded vs. Replica Set

By: Antonios Giannopoulos, Grant Killian

Posted on: May 25, 2017

MongoDB is a technology that’s easy to overlook in a Sitecore implementation. Many people come to Sitecore already knowing SQL Server, and C# applications running on IIS are as familiar as well broken-in blue jeans. As Sitecore systems evolve, however, they move into interesting areas that are somewhat off the beaten path for the typical ASP.NET developer. One gets into areas like Solr, Redis, or (the focus of this write-up) MongoDB.

There can be a lot of moving parts to MongoDB, and it can play by a different set of rules. This is the first piece in a series we’re calling the "ObjectRocket MongoDB Deep Dive for Sitecore." We’ll explore the rules MongoDB plays by when it comes to Sitecore and share some of the patterns we’ve found successful at ObjectRocket and Rackspace. Let’s start with a look at the two major methods of running MongoDB: a MongoDB “replica set” and a MongoDB “sharded cluster.”

Replica Set

A MongoDB “replica set” is a straightforward approach to running MongoDB. One achieves redundancy, high availability, and read scaling by having multiple mongod processes maintain the same dataset. This is canonically shown as a three-node “replica set,” with one primary and two secondaries, in a diagram like this:

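For Sitecore, the ConnectionStrings.config file can point to the primary MongoDB server, or it can list every member of the “replica set” so the driver can find the new primary after a failover. These and other options are spelled out in the MongoDB Connection String URI Format documentation.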
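As a rough illustration of the second option, here is a minimal sketch that assembles a connection string URI listing all three members. The host names, the "analytics" database name, and the "rs0" replica set name are placeholders we made up for the example; substitute the values from your own environment.

```python
from urllib.parse import urlencode


def replica_set_uri(hosts, database, replica_set, **options):
    """Build a mongodb:// URI that lists every replica set member.

    Listing all members (plus the replicaSet option) lets the driver
    discover the current primary instead of depending on one fixed host.
    """
    params = {"replicaSet": replica_set, **options}
    return "mongodb://{}/{}?{}".format(",".join(hosts), database, urlencode(params))


# Hypothetical hosts and names, for illustration only.
uri = replica_set_uri(
    ["mongo1.example.com:27017", "mongo2.example.com:27017", "mongo3.example.com:27017"],
    database="analytics",
    replica_set="rs0",
)
print(uri)
```

The resulting string is what would go into the connectionString attribute of the relevant entry in ConnectionStrings.config.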

Each node in a “replica set” takes one of three roles: Primary, Secondary, or Arbiter. Secondaries pull data from the Primary asynchronously and are kept consistent with it after a short delay (eventually consistent, for the CAP theorem folks out there). An Arbiter holds no data and only votes in elections. Failover is automatic: if the primary becomes unavailable, the remaining members elect a secondary to take its place.

A production “replica set” should have at least 3 nodes and can have up to 50 nodes in MongoDB version 3.0 and higher, of which at most 7 can vote in elections (prior to MongoDB version 3.0, the cap was 12 nodes).

Common best practices for a “replica set” include keeping an odd number of nodes, consistent server specifications for each node, and reliable network connectivity. It’s also key to monitor a “replica set” handling a production workload. When configured well, a “replica set” can be a great fit for small-to-medium-sized Sitecore solutions. You could be outgrowing the “replica set” model if monitoring shows that the working set consistently occupies a large share of server memory (50% or more is a reasonable rule of thumb), or that performance problems are developing because writes to the single primary node are limited by IO.
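The odd-number advice follows from how elections work: a primary can only be elected by a majority of voting members. A small sketch of that arithmetic (the function names are ours, chosen for the example) shows why a fourth node adds cost without adding fault tolerance:

```python
def majority(voting_members):
    """Votes needed to elect a primary: more than half of the voting members."""
    return voting_members // 2 + 1


def fault_tolerance(voting_members):
    """How many voting members can be lost while a majority still remains."""
    return voting_members - majority(voting_members)


for n in range(3, 8):
    print(f"{n} voting members: majority={majority(n)}, can lose {fault_tolerance(n)}")
```

Three and four voting members both tolerate the loss of a single node, and five and six both tolerate two, so the even-sized sets buy nothing extra in terms of availability.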

Sharded Cluster

Heavier MongoDB workloads run in a “sharded cluster” implementation instead of a single “replica set.” A “sharded cluster” has three kinds of components. The mongos processes (pronounced “mongo s”) act as query routers and are the connection interface to the data stored in MongoDB; they make sharding transparent to the application. The config servers store metadata for the cluster. The shard servers contain the actual data, with each shard usually holding a subset of all the data.

Here’s a visual showing how these pieces fit together:

For Sitecore, the ConnectionStrings.config file can point to an endpoint where the mongos processes are listening, and then your work is complete in terms of Sitecore-specific configuration and setup.
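To make that concrete, here is a hedged sketch that renders a ConnectionStrings.config entry pointing at two mongos routers. The router host names and the "analytics" entry and database names are assumptions for illustration, and the helper functions are ours rather than anything Sitecore ships.

```python
from xml.sax.saxutils import quoteattr


def mongos_uri(mongos_hosts, database):
    """Build a URI that targets the mongos routers.

    There is no replicaSet option here: the application talks to mongos,
    and mongos takes care of reaching the shards behind it.
    """
    return "mongodb://{}/{}".format(",".join(mongos_hosts), database)


def connection_string_entry(name, uri):
    """Render an <add .../> element, escaping the attribute values for XML."""
    return "<add name={} connectionString={} />".format(quoteattr(name), quoteattr(uri))


# Hypothetical router endpoints, for illustration only.
routers = ["mongos1.example.com:27017", "mongos2.example.com:27017"]
entry = connection_string_entry("analytics", mongos_uri(routers, "analytics"))
print(entry)
```

Escaping matters once the URI carries more than one option, because a literal & in an XML attribute must be written as &amp;.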

It is crucial for a “sharded cluster” to pick a shard key that helps MongoDB organize the data most effectively. Altering shard keys can require downtime, so it’s important to get it right from the start. Our work at ObjectRocket with numerous large Rackspace customers has shown us that a good shard key for Sitecore is _id:1 (ranged) or _id:hashed. We’ll save the full rationale for a subsequent write-up since it can be a long explanation, but for now we’ll summarize by saying that the _id field in Sitecore is an application-generated GUID that satisfies a lot of the criteria for being a good MongoDB shard key.
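For reference, the two shard key choices correspond to the following admin command documents. This is only a sketch: the "analytics.Interactions" namespace is an assumed example of a Sitecore collection, and the code builds the documents without sending them anywhere.

```python
def shard_collection_command(namespace, hashed=True):
    """Build the shardCollection command document for an _id shard key.

    hashed=True  -> {"_id": "hashed"}  (hashed sharding)
    hashed=False -> {"_id": 1}         (ranged sharding)
    """
    key = {"_id": "hashed"} if hashed else {"_id": 1}
    return {"shardCollection": namespace, "key": key}


# Sharding must be enabled on the database before a collection is sharded.
enable = {"enableSharding": "analytics"}

# Assumed namespace, for illustration only.
hashed_cmd = shard_collection_command("analytics.Interactions")
ranged_cmd = shard_collection_command("analytics.Interactions", hashed=False)

print(enable)
print(hashed_cmd)
print(ranged_cmd)

# With a driver such as pymongo, each document would be run against the
# admin database through a mongos, e.g. client.admin.command(hashed_cmd).
# That call is left as a comment because it needs a live cluster.
```

The same commands are available in the mongo shell as sh.enableSharding() and sh.shardCollection().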

In closing, if one has a sophisticated Sitecore implementation, and managing the various MongoDB aspects is intimidating, talk with ObjectRocket about our hosted MongoDB as a service. Many high-profile projects put the deep MongoDB expertise at ObjectRocket to use for their implementation and rely on it as a winning partnership for managing the persistence of Sitecore xDB and beyond.