The project I’m working on at the moment has a lot of analytics data. This means there’s a lot of inserts and updates in the database and queries have to be fast! At the moment all of this is hosted in a single MS SQL Server which does a pretty decent job. Still, this seems like a perfect scenario to introduce a noSQL database, especially as we are migrating to the cloud to improve performance of the application as a whole.

After a having reviewed a couple of noSQL databases, our weapon of choice became MongoDB. It’s popular, fast and there are a couple of providers which offer it as a hosted solution. While reviewing MongoDB I discovered you not only need to change your ‘scheme’, but also the way you query the database. I have spent quite some hours on finding out why the performance was so incredibly slow, therefore it seems like a good idea to share my findings.

To make life easier, the most important stuff to set up a connection to a MongoDB store is thread safe. This is great as you don’t have to setup a connection to MongoDB every time you want to use it. In my testing code I’ve set up some static properties to keep the connection alive.

Keep in mind, all the code shown in this post is used for testing. Don’t use it in a real production scenario as it still needs a lot of tweaking and tuning.

Coming from a traditional SQL background, I figured it would be a good idea to search for a Collection and do some query on it. With a little help from the MongoDB documentation I figured the method should look a bit like this:

Based on the information I could find in the documentation. Seems like the correct way to query a collection, right?

Wrong!

Using this code will give you terrible performance results when you execute it (like when you do with .ToList()). I have written multiple tests using this code and it was much slower as the queries we had defined for SQL. This is strange, as the SQL queries were quite complex and the noSQL query was rather simple.

Some figures, the SQL test took about 14302 milliseconds to execute (ran the query 1000 times). Using the above code to query MongoDB took about 144604 milliseconds. That’s 10 times slower! It’s possible my document structure isn’t optimal, but that wouldn’t explain such a big difference in results. Something had to be off.

Having spent several hours to discover my error I finally found a different way to query MongoDB. Creating a DataContext-wannabe class appears to be the solution to the performance problems I was facing. Creating such a class is easy, just create a bunch of properties which look like this:

Every property corresponds to a collection which is and you want to use in the code. Because every property is an IQueryble<T>, you can do LINQ queries on these properties, so I have changed my testing code to use these properties. The Query<T> method could now be implemented to something like this:

mongoDal.SomeData.Where(s => someIdCollection.Contains(s.SomeId)

Keep in mind you can only use LINQ queries which are supported by the MongoDB driver. I discovered the .GroupBy() method doesn’t work for this piece of code.

After having implemented this all over my testing code, I ran the MongoDB test again. The results were staggering! The test now only took 5916 milliseconds. That’s about 50% faster compared to the MS SQL test.

Keep in mind, I haven’t changed anything to the MongoDB store. I just changed the way I’m searching through the collections. Apparently it’s not really efficient to query though a MongoCursor. Using a MongoQueryable is probably the best way to do queries in a collection. I have stepped through the MongoDB C# driver code a bit and discovered when returning the results of a MongoCursor, it’s waiting for server responses most of the time. I haven’t stepped through the MongoQueryable code (yet), but it’s probably handling data retrieval in a different way.