Compose At Scale: MongoDB

You've finally hit the big time and have started getting serious traffic; now it's time to scale up. Not to worry: Compose has you covered. In this article, we'll take a look at what happens when Compose MongoDB needs to scale up. We'll also look at some design decisions you can make that will make the process smooth and easy.

Compose's mission is to conquer the data layer, and that means being able to scale up when you need your data layer to scale up. Compose has a number of features, including an auto-scaling algorithm, that make this process pretty seamless. However, even the best auto-scaling process in the world can't work around inefficient algorithms and designs. We'll look at how Compose MongoDB scales, and cover a few of the pitfalls that can make scaling a challenge.

How Does Compose Scale Up?

Compose is designed to automatically scale up as your data storage needs grow. This is done by checking the amount of space being occupied by your database every hour. If you're running low, Compose's auto-scaling system will automatically allocate an additional 1GB of storage and 102MB of RAM. This means that you never have to worry about running out of storage space as your application scales up.
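The auto-scaling increments can be sketched as a quick calculation. This is an illustrative model, not Compose's actual implementation; the hourly check and threshold logic aren't shown, only the step sizes from the article (1GB of storage and 102MB of RAM per step):

```python
# Per-step allocation from the article; the function itself is a sketch.
STORAGE_STEP_GB = 1
RAM_STEP_MB = 102

def scale_up(storage_gb, ram_mb, steps=1):
    """Return the new (storage, RAM) allocation after `steps` scale-ups."""
    return storage_gb + steps * STORAGE_STEP_GB, ram_mb + steps * RAM_STEP_MB

# A deployment starting at 1 GB / 102 MB that auto-scales twice:
storage_gb, ram_mb = scale_up(1, 102, steps=2)  # 3 GB, 306 MB
```

The fixed storage-to-RAM ratio implied here matters later: RAM only grows when storage grows, which is the root of the first pitfall below.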

However, despite this auto-scaling, there are some pitfalls to be aware of and some design considerations we can take into account to avoid these pitfalls. Let's take a look at some of these.

Pitfall 1: RAM, Aggregations, and Indices

Compose allocates a specific amount of RAM and storage space for each compute unit. The ratio of RAM to storage space is fixed and not adjustable. Under normal conditions, the auto-scaling increases are sufficient to handle the extra processing required by larger data sets.

One place where this can become an issue is with inefficiently-designed aggregations. The aggregation system in MongoDB performs queries and computations on data in a multi-stage pipeline. In some cases, a stage in the pipeline may perform computations on every document in the data set, and its output may then flow into another stage that performs a different computation on every document. Each stage uses RAM to process its input and compute results to send on to the next stage. If we have a 3-stage pipeline, then for every new document we add, we'll be adding 3 extra computations to our aggregation.
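Here's a minimal illustration of such a pipeline, expressed as the plain documents you'd pass to MongoDB's `aggregate` command. The collection and field names (`status`, `total`) are hypothetical; the point is that without an early filter, every stage touches every document:

```python
# A hypothetical 3-stage pipeline with no leading $match: each stage
# below operates on (and holds working memory for) the full data set.
pipeline = [
    {"$project": {"status": 1, "total": 1}},            # stage 1: every document
    {"$group": {"_id": "$status",                        # stage 2: every document
                "sum": {"$sum": "$total"}}},
    {"$sort": {"sum": -1}},                              # stage 3: every group
]
```

As the collection grows, the working set of each stage grows with it, which is how an aggregation can outrun the RAM that auto-scaling has allocated.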

This potential for running through every document in the database multiple times is the crux of this first pitfall: you can accidentally design aggregations that quickly occupy more RAM than you have allocated. Since Compose's auto-scaling algorithm scales RAM proportionally to data storage, a situation like this can leave MongoDB without enough memory for complex aggregations.

Another issue can arise through the over-use of indices. In MongoDB, each index is stored in RAM to ensure the fastest possible access to data. If you have too many fields indexed, the indices will no longer fit completely in RAM and will be swapped out to disk using a Least Recently Used (LRU) replacement algorithm. This not only slows down the indices themselves, but can be deadly for performance when combined with complex aggregations.

Avoiding the Pitfall

If you're noticing a large number of memory faults in your application, you may be falling into this pitfall. There are two ways to avoid it: either update your data design to limit the need for aggregations, or update the aggregations to limit the amount of data processed in each stage of the pipeline. The method you choose depends on how tolerant your application is to a change in the data schema.

If changing your data schema isn't an option, you can modify your aggregations to handle as little data as possible. One easy way to accomplish this is by placing all of your $match stages at the beginning of the pipeline. This ensures that your aggregation only processes data that absolutely needs to be processed.
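A naive helper for this reordering might look like the following sketch (collection and field names are hypothetical). In practice you should simply write the $match first yourself, and note that moving a $match ahead of a stage that renames or drops the fields it filters on would change results; here $match filters on `status`, which the $project stage keeps, so the reordering is safe:

```python
def match_first(pipeline):
    """Move $match stages to the front so later stages see fewer documents."""
    matches = [stage for stage in pipeline if "$match" in stage]
    others = [stage for stage in pipeline if "$match" not in stage]
    return matches + others

# Hypothetical pipeline with the filter buried in the middle:
pipeline = [
    {"$project": {"status": 1, "total": 1}},
    {"$match": {"status": "complete"}},
    {"$group": {"_id": "$status", "sum": {"$sum": "$total"}}},
]
optimized = match_first(pipeline)  # $match now runs first
```

With the filter first, every subsequent stage only processes matching documents, keeping each stage's working set (and RAM use) small.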

Finally, try to keep your index within the amount of RAM you have allocated. Compose deployments have a 10:1 ratio of disk space to RAM, so an index size that's 10% of the total data storage size is a good rule of thumb.
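That rule of thumb can be checked with a couple of lines. This is a back-of-the-envelope sketch based only on the 10:1 ratio stated above, not a Compose API:

```python
def index_budget_mb(storage_gb):
    """Approximate index budget in MB: 10:1 disk-to-RAM ratio means
    RAM (and therefore the index budget) is ~10% of storage."""
    return storage_gb * 1024 / 10

def index_fits(index_size_mb, storage_gb):
    """True if the total index size stays within the RAM-sized budget."""
    return index_size_mb <= index_budget_mb(storage_gb)
```

For example, a 10GB deployment would have roughly a 1GB index budget; you can compare that against the actual index size reported by MongoDB's `db.stats()`.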

Pitfall 2: Increased Connections

When a connection is made to Compose MongoDB, it goes to one of two mongos routers, which can act as failovers for each other. Each is actually a combination of a mongos router and an HAProxy portal; the latter terminates the SSL connection and manages the IP whitelist before passing traffic on to the router.

If you have a large number of independent connections, you may encounter slower connection times and occasional timeouts as your application scales to very large sizes. This can be especially apparent in distributed microservices applications, where you can't share a connection pool across services.
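Within a single service, though, you can at least make sure connections are pooled rather than opened per request. One way is via standard MongoDB connection string options, as in this sketch (the host, database name, and pool sizes below are placeholders):

```python
def pooled_uri(host, db, max_pool=20, min_pool=2):
    """Build a MongoDB URI with connection pooling options, so a driver
    reuses a bounded pool of connections instead of opening new ones."""
    return (f"mongodb://{host}/{db}"
            f"?maxPoolSize={max_pool}&minPoolSize={min_pool}")

uri = pooled_uri("example-host:27017", "appdb")
```

Most drivers (pymongo, the Node.js driver, etc.) honor `maxPoolSize` and `minPoolSize` from the URI, which caps the number of simultaneous connections each service opens against the routers.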

Avoiding the Pitfall

Routers won't automatically scale, but you can scale them manually. Open the resources tab on your Compose MongoDB deployment. Under the routers section of the page, click on the scale button:

Drag the slider to add more RAM to your mongos routers, which will allow them to handle more connections and bigger queries. The more RAM a router has, the more it costs. You can also add more mongos routers if you want to spread the load.

Pitfall 3: Scaling Down Doesn't Automatically Free Storage

When Compose MongoDB users scale up their data needs, they rarely scale back down. For this reason, auto-scaling does not work in the opposite direction: while auto-scaling will scale up when necessary, it will not scale down after data is deleted. This is because scaling down reduces the amount of RAM as well as the amount of storage space available to a deployment, which can sometimes cause unexpected side-effects. So even though Compose auto-scales up, you have to scale down manually.

Avoiding the Pitfall

To scale down your storage usage in Compose, you'll first need to perform a deployment resync, which you'll find in the settings panel of your deployment.

Then, you can go to the resources panel and scale down the amount of storage space and RAM you're using by clicking on the scale button in the database shards section.

Scaling down your deployment won't remove the data that's there, but it will free up unused space until either the scaled-down size is reached or there is no more unused space to free.

Summary

When your business starts to scale up, there are a lot of concerns that start to arise in every layer of your application. Compose's auto-scaling features are designed to handle most of the issues that can arise in the data layer, and by avoiding the pitfalls in this article you can experience seamless scaling of your data storage and management no matter how big you get.

If you have any feedback about this or any other Compose article, drop the Compose Articles team a line at articles@compose.com. We're happy to hear from you.

John O'Connor is a code junky, educator, and amateur dad that loves letting the smoke out of gadgets, turning caffeine into code, and writing about it all. Love this article? Head over to John O'Connor’s author page to keep reading.