Articles on Java, JavaScript and Web technologies

Mongo DB compared to MySQL

I started exploring NoSQL databases recently recently. MongoDB is the first database that I look at. As I went through its features, I was wondering how it performed as compared to another popular opensource database.

So I set to do a few comparisons between various NoSQL databases against MySQL. My first comparison point is Mongo DB.

The Test Data structure

For all my tests, I have a similar structure of the data record. It models a sale transaction, where there is a unique id of the transaction, a product that is sold through a particular store on a particular date for certain quantity.

This table has an index on the TX_DATE field because I will be querying on this field later as a part of my test.

For MongoDB, the format of the JSON document was as follows:{
“txid”:, “txdate”:, “prodid”:,”qty”:,”storied”:
}
The txdate field is indexed for the MongoDB collection.

About the test measurements

All the measurement numbers are averages of at least three and many time more than three readings.

The Insert tests

In order to populate both the sources, I have a program that generates random timestamps with a range of one year (Jan to Dec),random product ids with a range of pre-selected 30 unique ids, random integer between 1 and 3 for the qty and a random storeid from a pre-selected range of 34 unique storeids.

MySql and the mongod are running on the same Linux VM. Running my test generation program in a loop, I inserted 100,000 records (for MySQL) and 100,000 documents (for MongoDB). Here are the results:

Database

Time to insert 100,000 records

MySQL

198 seconds

MongoDB with default WriteConcern

240 seconds

MongoDB with WriteConcern.UNACKNOWLEDGED

7 seconds

Clearly, the unacknowledged writes in Mongo DB are blazing fast.

The Query Tests

I did two types of queries on both the databases. The first is a fetch of records with the txdate restricted to the date range of 10th January to 15th February (My data spanned the whole year for 1st Jan to 31st Dec)
The MySQL query program was like this:

Test Results

Here the MongoDB seemed to do better for the select, but the MySQL did better for the Aggregation.

Tests with bigger data ( 1 million records/documents)

Now I wanted to increase the scale of the test. I wanted the same tests to run for 1 million records/documents. For MongoDB, I made a two shard setup where each shard was replicated. I used the prodid as the key for sharding, hoping that the records would more or less split into two halves across the two shards.

However this did not happen. It turned out that the VMs that I was using for the shards had a skew on the clocks. Once I got the clocks straightened out and I spilt the shard manually, the data got distributed evenly.

Now connecting to the “mongos” service that frontended this two shard setup, here are the comparative results:

MySQL

MongoDB (mongos)

Select Query

1393 ms

2028 ms

Aggregate Query

1085 ms

2305 ms

Here the MySQL seemed to have an edge on both fronts.

What is in Mongo that is not in MySQL (or other RDBMS)?

Other than the schema flexibility and difference in the structure of data, I see there are two key differences: Replication, Distributed database

Replication: The Mongo DB software has the capability of making replicas of the data. If a database does not have this capability, then replication has to be built on top of this. There are software and hardware products that provide this capability, but they are add ons. In MongoDB, the capability is built right into the database architecture.

Distributed database: The MongoDB architechture supports the concept of shards. Based on a pre-decided key(s)/indexes in your data, you can have the data spread across various shards which in turn can be replicated. This way, the queries can be directed to specific shards and provide scalability as the data grows to “humongous” size.