This article is about the performance argument for MongoDB over SQL Server (or MySQL, or Oracle). In the first article, I threw out a potentially controversial graph showing MongoDB performing 100 *times* better than SQL Server for inserts.

We’ll see source code and downloadable, executable examples, and you can verify all of this for yourself. But first, here’s a new twist on an old proverb:

“Data is money”

If your application is data intensive and stores lots of data, queries lots of data, and generally lives and breathes by its data, then you’d better do that efficiently or have resources (i.e. money) to burn.

Let’s imagine you’re creating a website that is for-pay and data intensive. If you were to attempt to plan out your operating costs per user to help guide the pricing of your product then the cost of storing, querying, and managing your data will likely be a significant part of that calculation.

If there is a database that is 100 times faster than SQL Server, is free, is easy to administer, and can be programmed with LINQ just as SQL Server can, then that is a very compelling choice.

When you have such a database, it means you can run your system on commodity hardware rather than high-end servers. It means you can have fewer servers to maintain and purchase or lease. It means you can charge a lot less per user of your application and get the same revenue. Think about it.

One more story before we see the statistics. Kristina Chodorow from 10Gen gave a talk a few weeks ago at San Francisco’s MySQL Meetup entitled “Dropping ACID with MongoDB”. You can watch the recording here:

[The audio and video aren’t great, but the content is. Skip the first minute, which has no audio.]

During this talk, Kristina describes SourceForge’s experience moving from MySQL to MongoDB. On MySQL, SourceForge was reaching its performance limits at its current user load. Using some of MongoDB’s easy scale-out options, they fully replaced MySQL and found MongoDB could handle the current user load easily. In fact, after some testing, they found their site can now handle 100 times the number of users it currently supports.

Not convinced of this NoSQL thing yet? Fair enough. Here are some graphs, some stats, and some code.

The scenario:

Model a data-intensive web application aiming to support as many concurrent users as possible. There will be users from the web application itself, but also from an API and external applications. Users will interact with the data with nearly as many inserts as queries. Their inserts are all small pieces of data and are all independent of each other.

Let me just get this out of the way and I mean the following in the nicest of ways: I don’t care about your scenario or use-case. The scenario above is what I’m trying to model. I’m not trying to do bulk-inserts or loading large files into databases or anything like that. MongoDB may be great for these. SQL Server may have specialized features around your use-case, etc. They don’t apply in my scenario. So please don’t wonder why I’m not using bulk inserts or anything like that in the examples below.

Insert Speed Comparison

It’s the inserts where the differences are most obvious between MongoDB and SQL Server.

The tests inserted 50,000 independent objects using NoRM for MongoDB and LINQ to SQL for SQL Server 2008. Here are the data models:

MongoDB basic class

SQL Server basic class
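The original C# classes aren’t reproduced here, but a minimal sketch of the kind of "basic class" being inserted might look like the following. The field names are my assumption, not the original schema, and the original tests used C# (NoRM and LINQ to SQL); this Python sketch only shows the shape of the data each database stores:

```python
import uuid
from dataclasses import dataclass, asdict

# Hypothetical "basic class" for the insert tests (field names assumed).
# MongoDB stores this shape directly as a BSON document; SQL Server
# stores it as one row in a table with a uniqueidentifier primary key.
@dataclass
class Person:
    id: uuid.UUID
    first_name: str
    last_name: str

doc = asdict(Person(id=uuid.uuid4(), first_name="Ada", last_name="Lovelace"))
# `doc` is the dictionary/document form a MongoDB driver would serialize.
```

The point is that the same flat object maps to a single document on one side and a single row on the other, so the insert paths are directly comparable.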

I ran five concurrent clients hammering the databases with inserts. Here are the screenshots for running against MongoDB and against SQL Server. Let’s zoom in on the most important result, the output from one of the five concurrent clients:

MongoDB:

SQL Server:

That’s right. It’s 2 seconds versus 3 1/2 minutes!

Now, to be fair, this was using LINQ to SQL on the SQL side, which is slow on inserts. After discussing these results with some friends, I re-ran the tests using raw ADO.NET-style programming and saw a 1.5x-3x performance improvement for SQL. That still leaves MongoDB 30x-50x faster than SQL.

Query Speed Comparison

Now let’s look at getting the data out, querying the same objects above on each database’s indexed Id field.

Here MongoDB still kicks some SQL butt with almost 3x performance. If we were to leverage the mad scale-out options that MongoDB affords, then we could kick that up many times more.


Complex Data and the Real World

Feel like that was an overly simplified example? Here’s some real-world data with foreign keys and joins. Below is the complex data model.

MongoDB:

SQL Server:

It shouldn’t surprise you that MongoDB does even better here, since it doesn’t need joins.

The Hardware

All of these tests were run on a Lenovo T61 on Windows 7 64-bit with a dual-core 2.8 GHz processor using the 64-bit versions of both SQL Server 2008 Standard and MongoDB 1.4.1. You can even see a picture of the computer here: http://twitpic.com/hywa8

Your Turn

If you want to see the entire set of data above as an Excel spreadsheet, you can download that here:

You can also download the sample code. Before you do, realize I haven’t done a bunch of work to make it super easy to run. But you should be able to figure it out. Just turn the knobs on the PerfConstants class for the number of inserts and queries. Then comment or uncomment sections of the code in the clients for your scenarios.

The expected use is that you’ll start the launcher application then use it to launch five concurrent clients at exactly the same time.

Comments

That is a good point and something I overlooked when putting the sample together. The most important take-away from the sample is this:

SQL requires joins, joins are slow. MongoDB is fast in large part because it doesn’t use joins (most of the time).
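The join-versus-embedding trade-off above can be sketched in a few lines. This is a pure-Python illustration with hypothetical Customer/Order data, not the sample’s actual schema: the relational shape must match rows across two tables to answer "give me this customer’s orders," while the document shape answers it with a single key lookup.

```python
# Relational shape: two tables, orders matched to customers by foreign key.
customers = [{"id": 1, "name": "Acme"}]
orders = [{"id": 10, "customer_id": 1, "total": 25.0},
          {"id": 11, "customer_id": 1, "total": 40.0}]

def orders_for(customer_id):
    # The "join": scan or seek the orders table for matching foreign keys.
    return [o for o in orders if o["customer_id"] == customer_id]

# Document shape: the orders are embedded in the customer document.
customer_doc = {"_id": 1, "name": "Acme",
                "orders": [{"id": 10, "total": 25.0},
                           {"id": 11, "total": 40.0}]}

joined = orders_for(1)             # requires touching a second table
embedded = customer_doc["orders"]  # already present in the one document
assert len(joined) == len(embedded) == 2
```

The same information comes back either way; the difference is how many index lookups and how much cross-table matching the database must do per read.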

That it doesn’t correctly match Customers/Orders isn’t that important.

I guess in some sense you could say it’s not a fair comparison. However, we’ve certainly run other tests with different data models, and the performance of SQL still falls well short of MongoDB’s, so I haven’t rewritten it.

I’m considering putting together another post using EC2 machines so we can all test on the exact same platform. I’ll update the sample if I do so.

SQL JOINs are not slow if you use an appropriate key. GUIDs will work. However, as implemented in your sample code, they are not monotonically increasing. Therefore inserts alone will be slow due to page splits and fragmentation. Then, because they are somewhat random, the index lookup becomes more of a scan than a seek. Therefore it’s not an apples-to-apples comparison.
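The commenter’s point about key ordering can be illustrated. Random GUIDs (like `uuid4`) land all over the key space, so each insert may split an existing index page, while a COMB-style key (a monotonically increasing prefix plus random bits, similar in spirit to SQL Server’s `NEWSEQUENTIALID()`) always appends at the end of the index. The exact bit layout below is illustrative, not SQL Server’s format:

```python
import random
import time
import uuid

def comb_guid(counter):
    # COMB-style id: a monotonic millisecond timestamp plus a per-run
    # counter as the high bits, random bits as the low bits. Because the
    # prefix only ever grows, each new key sorts after the previous one.
    prefix = (int(time.monotonic() * 1000) << 20) | counter
    return (prefix << 64) | random.getrandbits(64)

sequential = [comb_guid(i) for i in range(1000)]
random_ids = [uuid.uuid4().int for _ in range(1000)]

# Sequential keys arrive already sorted: inserts append at the index's end.
assert sequential == sorted(sequential)
# Random GUIDs do not: mid-page inserts cause page splits and fragmentation.
assert random_ids != sorted(random_ids)
```

This is why the commenter argues the benchmark penalizes SQL Server’s clustered index more than it penalizes MongoDB.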

Hi,
I am using both MySQL and MongoDB in my application for retrieving data, and I am using Java for the development of my application.
I have a collection named testing_mongodb in MongoDB containing 200,000 documents, and I have a table in MySQL named testing_sql containing the same records; that is, testing_sql contains 200,000 rows as well, and both contain exactly the same data.

I created two different pages to retrieve records, one connecting to MongoDB and one to MySQL.
I retrieved all the data from the MongoDB collection testing_mongodb on one page, and all the data from the MySQL table testing_sql on the other.
The response time to fetch results from the MongoDB collection is around or more than 1 minute, and
the response time from MySQL is around 3 to 4 seconds.
Now, I heard that MongoDB is very fast compared with MySQL, but I found this statement to be wrong. Can anyone tell me the reason?
I am using the latest version of MongoDB with a single server.
I am using Windows XP Service Pack 2, my RAM is just 2 GB,
and I am running MySQL on this same system.

Check that you have a 64-bit machine running MongoDB.
Have at least 4-8 GB of RAM.
Check that your where clause is indexed.
That’s pretty much it… MongoDB will blow MySQL out of the water.
We have achieved 7k decent shopping-cart objects, inserted up to 1 million records, queried random objects, and were able to retrieve a shopping cart in < 2 milliseconds.

That is a good point. MongoDB is pretty hungry for memory. However, you can get an m1.medium instance on EC2 for around $90/month or so, and that has 3.7 GB. Probably a pretty good host. If you host MongoDB on Linux (still accessible from a Windows instance) you could cut that cost in half or double the RAM.

Michael, are these results using a local MongoDB (client and server on the same machine) or a remote one (client and DB server on separate machines)? I ask because my results with a remote MongoDB were quite different: SQL Server took 27 seconds to insert 25,000 records, and MongoDB took 40 seconds for the same.

All queries are against primary- or foreign-key relationships in both SQL and MongoDB. I ran the tests a few times to “warm up” the servers before doing the timing. I recorded just one run (not the average) for the numbers, but running them repeatedly generated roughly the same numbers.

MongoDB relies on the system pager to write the memory-mapped file to disk and will return as soon as it has updated the memory, before the OS has finished syncing the memory-mapped file. SQL Server returns only after the data has at least been reliably logged. Your faster MongoDB is exposing you to the possibility of data loss.

That is partly why MongoDB is fast (the other part is more coherent data structures). However, you’re not looking at the whole picture when making this comparison.

One instance of MongoDB != One instance of SQL Server

MongoDB is built to be used in replica sets: http://www.mongodb.org/display/DOCS/Replica+Sets. The recommendation is to have at least one master and one slave. This configuration potentially provides more data safety than a single SQL Server.

Here’s an example connection string that will not return from the write until replication has succeeded on at least two machines out of three in a replica set:
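The original connection string isn’t shown, but a MongoDB URI expressing “don’t acknowledge the write until two of the three replica-set members have it” would look roughly like this (the host names and replica-set name are hypothetical; `w=2` is the write-concern option):

```
mongodb://db1.example.com,db2.example.com,db3.example.com/?replicaSet=myset&w=2
```

With `w=2`, a write that has reached only the primary when a failover happens would block or error rather than silently acknowledge, trading some latency for durability.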

Now that I have an Eiffel driver for MongoDB, I will try my own comparison at the end of next week. In the middle of the week, we are getting new PCI-e SSD cards, which are much faster than their SSD SATA counterparts and certainly faster than a typical SATA HDD. With the high-end PCI-e SSD cards, I am expecting very big performance increases and will want to see how SQL Server stacks up in this arena.

We started as a SQL Server shop, but will someday make the cutover to MongoDB if I can convince the powers-that-be about the real-world business use case and cost differences between the two. For instance, our DBA team has spent almost a month making changes to a SQL Server DB that could literally have been swallowed in two days by coding it directly and letting the DB just have it! At an average of $75/hour for DBA + AppDev labor for 3 people, we have spent about $20K+! It is time for that kind of expenditure to bite the dust!

Thanks for your hard work. I will remember this site and your conclusions as a ready-reference.

Is it about lots of small writes and lots of small reads?
I work with financial data; my queries retrieve 1/2 GB at a time or compare two columns, each of which is 1/2 GB of data. Does NoSQL perform better at doing comparisons or at generating calculated fields from existing data? And is NoSQL better for transactions similar to amazon.com purchases or searches, which are lots and lots of “small” searches with no calculated fields generated?

Hi, I’ve read this article, and I want to give my own opinion.
In fact, I’ve found that the performance difference between MongoDB and SQL Server largely represents the difference between MongoDB’s client driver and SQL Server’s client driver (ADO.NET).
I once ran a test where I emitted the ADO.NET code myself and skipped some unnecessary IL code; running that query against SQL Server, the test finished about 80,000 query operations in one second (platform: i7, 12 GB RAM, Windows 7).

MongoDB *is* somewhat transactional as long as you are only working with one document. In many cases, this is the same as with standard RDBMSes: I update a customer, their address, their orders, etc. You could do that transactionally/atomically without trouble using MongoDB, provided your schema is set up correctly.

You cannot update multiple documents atomically, true. But for many apps, there are ways around this (e.g. compensation, or optimistic concurrency a la Entity Framework).
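The optimistic-concurrency workaround mentioned above can be sketched in a few lines. This is pure Python standing in for the database: each document carries a version number, and an update only applies if the version still matches what the writer originally read (in MongoDB terms, an update whose query filters on both `_id` and `version`):

```python
# In-memory stand-in for a document store; "cart1" is a hypothetical id.
store = {"cart1": {"items": [], "version": 0}}

def update_if_unchanged(doc_id, expected_version, new_items):
    # Apply the write only if nobody else has written since we read.
    doc = store[doc_id]
    if doc["version"] != expected_version:
        return False  # lost the race; the caller must re-read and retry
    doc["items"] = new_items
    doc["version"] += 1
    return True

ok = update_if_unchanged("cart1", 0, ["book"])    # first writer wins
stale = update_if_unchanged("cart1", 0, ["pen"])  # second writer is rejected
assert ok and not stale
assert store["cart1"]["items"] == ["book"]
```

Since each attempt touches a single document, it stays within MongoDB’s single-document atomicity guarantee while still protecting against concurrent writers.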

Wordnik stores its entire text corpus in MongoDB: 3.5 TB of data in 20 billion records. Query times over the corpus were cut to a quarter of what they were prior to migrating to MongoDB. More about MongoDB at Wordnik…

The database setup you used isn’t ideal for SQL Server. Having GUIDs as clustered indexes is probably causing a lot of IO on your system, because the inserts happen basically randomly across the database pages and cause page splits.

With that in mind, I modified your database script to include a RowId column for each table. I made it a bigint identity column and the clustering index. The primary key was kept on your GUID ID columns.

Running 10000 inserts with the ADO.NET style client on my Lenovo W530 with SQL Server 2012 ran between 2.104 seconds and 2.174 seconds. Quite different from the 204 seconds you initially reported and very close to the MongoDB results.

I am pretty sure it will be similar to what the guy above described, who needed a team of DBAs to work over an entire DB. Imagine having to spend $20k in a month just to get something to perform decently… it sounds like the wrong tool is being used, if you ask me.

I think the big draw is that it’s easier to get good results without being an MS SQL DBA or expert. And on top of that, it’s nice that your data store can exist on more than just a Windows box.

I am not sure if everyone is aware, but a SQL Server license is not cheap, and neither are Windows VMs or Windows Server licenses. Then multiply that cost by 3 because you want some redundancy or load balancing… Linux + Mongo could provide equivalent or better performance at a greatly reduced cost; that is what I am taking away from this.

You’re telling everyone that it took over 3 minutes to insert 10,000 rows of data containing 3 columns? If that were the case, SQL Server wouldn’t still be around. It seems like MongoDB is for developers who couldn’t care less about understanding a relational model, stored procedures, and security, and who don’t want to have to go through DBAs to push their non-performing code.