An introduction to MongoDB

Ten years ago there was no social network service on the scale we see today. Facebook, for instance, has more than 600 million active users who modify their personal profiles, exchange messages, play games and the like every single day. Consider the total number of committed transactions per second that these interactions generate. Clearly, things have changed, and we need to deal with those changes. We need something that can accommodate such huge volumes of data.

Do relational databases handle such use cases in a graceful manner?

The answer is yes, they do. However, the problem is actually scaling!

When a web application becomes popular and usage goes up, requests start queuing up because of the locks that have been placed on different portions of the database. This means that user B has to wait for user A’s request to be processed. Under very heavy load, this becomes a bottleneck.

Such problems are usually addressed by “vertical scaling”, that is, adding more memory to the hardware, replacing the current CPU with a faster one, and so forth. This might accommodate the current growth for a while, but you’ll be faced with the same conditions all over again.

Soon enough, you’ll notice that your hardware doesn’t support more than a given number of CPUs or amount of RAM, and that you are already using the best hard disk options available. So, naturally, you’ll set up the next server in a database cluster. This, however, introduces a new problem: dealing with data replication and consistency.

Now you need to make sure that data is consistent during the application operations, both under normal and failover conditions.

At the same time, you will probably modify the database design, de-normalize some tables, tune some indexes, and so on. This, however, could introduce new problems and unwanted changes in the application layer.

Don’t get me wrong!

I am not saying that such large-scale web scenarios cannot be handled by traditional database models (in particular, the relational model) that have been around for decades.

However, the point is to understand that the relational model is simply a model. That is, it’s useful for handling certain problem domains. This certainly doesn’t close the case on other ways of representing data.

What is MongoDB?

MongoDB uses a lightweight binary format called BSON (Binary JSON) to read and write documents. A document is an ordered set of keys with associated values and is roughly equivalent to a row in an RDBMS. Documents are grouped into a schema-free set called a collection. By convention, collections can be further organized into namespaces, which are nothing more than collection names separated by a period.

MongoDB uses memory-mapped files to store and retrieve data. Once a file has been memory-mapped, its content can be accessed by dereferencing a pointer. This way, MongoDB pushes most of its memory management work to the OS, which results in much cleaner (and safer) code in MongoDB (there are several other benefits, but they are out of the scope of this article). The philosophy behind it is that, whenever possible, the engine should offload processing and logic to the client side.

Each database consists of several files with the suffixes .0, .1, and so forth. The first one is pre-allocated at 64MB (prefilled with zero bytes); the second one doubles this size to 128MB, and so on, up to the maximum file size of 2GB. Once this limit is reached, each successive file will also be 2GB. This prevents file system fragmentation and provides a non-blocking mechanism to handle requests when the need for extra space arises.
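The growth rule above can be sketched in a few lines of JavaScript. This is only an illustration of the arithmetic described in the text, not MongoDB’s own allocation code; the function name dataFileSize and the constants are made up for this example.

```javascript
// Sketch of the data file sizing described above: the first file is 64 MB,
// each following file doubles in size, and files are capped at 2 GB.
const MB = 1024 * 1024;
const FIRST_FILE_BYTES = 64 * MB;
const MAX_FILE_BYTES = 2048 * MB;

// Returns the size in bytes of the data file with the given numeric suffix (0, 1, 2, ...).
function dataFileSize(index) {
  // Multiply instead of bit-shifting to stay clear of 32-bit integer overflow.
  const size = FIRST_FILE_BYTES * Math.pow(2, index);
  return Math.min(size, MAX_FILE_BYTES);
}

// Sizes in MB for the first seven files.
const sizesInMb = [0, 1, 2, 3, 4, 5, 6].map((i) => dataFileSize(i) / MB);
console.log(sizesInMb.join(", ")); // 64, 128, 256, 512, 1024, 2048, 2048
```

The sixth file (suffix .5) is the first to reach the 2GB cap; every file after it stays at that size.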

MongoDB uses a lightweight TCP/IP wire protocol to expose its functionality to client-side programs (including drivers). By default, it listens for connections on TCP port 27017 and exposes some of its administrative features on port 28017.

Getting Started

MongoDB can be installed on Solaris, OS X, Linux and Windows (32bit/64bit). This paper, however, will focus on installing MongoDB on Microsoft Windows. You can consult this page for additional information on how to get MongoDB up and running on other operating systems.

Installing MongoDB on Microsoft Windows

Follow this URL and download the latest “Production Release” that’s suitable for your version of Windows (32bit/64bit). At the time of this writing, the latest production release of MongoDB is 1.6.5, which can be downloaded using one of the following links:

Once the file is downloaded, unzip it to a folder of your choice. It contains a bin directory, which holds several executables. By default, MongoDB stores its data files in the directory “c:\data\db”; you can change this if you want. Please note, however, that this directory is not created automatically, so launching the MongoDB engine without it fails with the error “dbpath (/data/db) does not exist” and the process terminates.

Therefore, make sure that you create the folder before going on. Assuming you’ve created the “d:\mongodb\db” directory, the next step is to launch the MongoDB engine using the following command line:

mongod --dbpath "d:\mongodb\db"

This starts the MongoDB engine and prints the process id, db path, port number and some other information to the console window. You can stop the running engine by pressing CTRL-C on your keyboard.

MongoDB can also be installed as an NT-Service. To do so, you can enter the following command line:

Now, open up the Services panel (from the Windows Control Panel), find the MongoDB service, right-click on it, and choose Start. If the service starts successfully, this is logged in the dblog.log text file. (Starting the service might fail for a couple of reasons, one of which is that port 27017 or 28017 is in use by some other application on your system or is blocked by your firewall.)

However, the default port can easily be changed using the --port switch. (You can get extensive help on mongod.exe by launching it with mongodb/? from the command line.)

MongoDB Shell

The standard MongoDB distribution comes with a JavaScript interpreter (obviously, called MongoDB Shell) that is used for development/prototyping, administrative operations, lightweight scripting and testing purposes. It’s a standalone MongoDB client with built-in support for MongoDB connectivity.

Open up the console window and change the directory to the bin directory where you unzipped the MongoDB distribution. Among the executables there is a program named mongo.exe. To start the shell, run mongo.exe:

The shell automatically connects to the MongoDB instance on the default port (make sure that your firewall software doesn’t prevent that). If you’ve set MongoDB up on a different port, you need to launch mongo.exe using the following command-line syntax:

mongo 192.168.1.100:1234/test

This connects to the test database on host 192.168.1.100, port 1234.

Understanding the basics

When the MongoDB shell starts, it automatically connects to the “test” database and the connection is then assigned to a JavaScript variable named db. This global variable is used throughout the shell, providing us with the ability to perform CRUD operations (we will examine this soon).

When you type a JavaScript variable at the MongoDB shell, it is converted to its string representation. For example, type db at the shell prompt and press the Enter key to get the default database name (in this case, “test”). You can also write your own JavaScript code. For example, enter the following code block in the MongoDB shell:

var sum = 0;
for (var i = 1; i <= 100; i++) {
  sum += i;
}

Please note that the shell is smart enough to detect whether a JavaScript statement is complete. If the statement is not complete yet (and you press the Enter key), you’ll get a ... prompt. The above code simply calculates the sum of the integers from 1 to 100, which is 5050. Amazingly, you’ll get 5050 at the prompt when you’re done with the code.

You can also load a given JavaScript file using the load command. To examine this, create a text file named “my_script.js” in the bin directory containing the following lines:
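A minimal version of such a file could look like the sketch below. It assumes that getSum(n) simply adds the integers from 1 to n, which matches the result of 5050 quoted for getSum(100); the exact contents are a guess for illustration.

```javascript
// my_script.js (hypothetical contents)
// Adds the integers from 1 to n, mirroring the loop typed into the shell earlier.
function getSum(n) {
  var sum = 0;
  for (var i = 1; i <= n; i++) {
    sum += i;
  }
  return sum;
}
```

Because the file only declares a function, loading it prints nothing by itself; the function becomes available at the prompt once the file has been loaded.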

Now, from the MongoDB shell, enter the following commands to load the JavaScript file and run the getSum function respectively:

load("my_script.js");
getSum(100);

(Please note that there are other means of loading JavaScript files too, but let’s not get into those details, as they are beyond the purpose of this article. You can always get built-in help by entering help at the shell prompt.)

Well, the above code produces the output 5050 as expected. Now, enter getSum (without parentheses) at the MongoDB prompt and hit Enter. Amazingly, you will see the source code of the function.

With this information in hand, let’s see how to perform CRUD operations against MongoDB.

Creating documents

A document is an ordered set of keys with their associated values, represented as JSON:

Well, you can see that a new property named _id, which was not part of the original document, has been created automatically. Every document has a unique identity within its collection so that it can be uniquely identified. By default, the id is an ObjectId: a 12-byte value consisting of a 4-byte timestamp (seconds since the epoch), a 3-byte machine id, a 2-byte process id, and a 3-byte counter. For more information, please consult the BSON ObjectID specification found here.
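To make the layout concrete, here is a small JavaScript sketch that splits the 24-character hexadecimal form of an ObjectId into the four parts listed above. The helper parseObjectId and the sample id are made up for this illustration; it follows the byte layout described in this article and is not part of the MongoDB shell.

```javascript
// Splits a 24-character hexadecimal ObjectId string into the four parts
// described above. The sample id below is made up for illustration.
function parseObjectId(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected 24 hexadecimal characters");
  }
  return {
    timestamp: parseInt(hex.slice(0, 8), 16),   // 4 bytes: seconds since the epoch
    machineId: parseInt(hex.slice(8, 14), 16),  // 3 bytes
    processId: parseInt(hex.slice(14, 18), 16), // 2 bytes
    counter: parseInt(hex.slice(18, 24), 16)    // 3 bytes
  };
}

const parts = parseObjectId("4d2f3a1c0a1b2c3d4e5f6a7b");
console.log(parts);
// The timestamp part doubles as a rough creation time for the document.
console.log(new Date(parts.timestamp * 1000).toISOString());
```

Since the timestamp comes first, ids generated later compare greater than ids generated earlier, at one-second granularity.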

Please note that the latter two documents don’t have exactly the same properties that Natalie’s document has. This means that document shapes don’t matter when working with MongoDB.

Reading documents

You have already learned how to read a document using the find method. The find method returns a database cursor, which can then be used to limit, sort and skip the results. I believe this is best learned through some examples.
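Before the query examples, here is a rough in-memory picture of what sort, skip and limit do to a result set. It runs on a plain JavaScript array with made-up data, and the helper name page is hypothetical; in the shell, the same steps would be chained on the cursor, e.g. db.actresses.find().sort({ name: 1 }).skip(2).limit(2).

```javascript
// In-memory stand-in for a cursor: results are sorted first, then skipped, then limited.
function page(docs, sortKey, skipCount, limitCount) {
  return docs
    .slice() // copy so the caller's array is not reordered
    .sort((a, b) => (a[sortKey] < b[sortKey] ? -1 : a[sortKey] > b[sortKey] ? 1 : 0))
    .slice(skipCount, skipCount + limitCount);
}

// Hypothetical sample data.
const people = [{ name: "Dana" }, { name: "Alice" }, { name: "Carol" }, { name: "Bob" }];
const secondPage = page(people, "name", 2, 2).map((d) => d.name);
console.log(secondPage); // [ 'Carol', 'Dana' ]
```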

Example 1 – Find Natalie Hershlag’s document

db.actresses.find({ "name": "Natalie Hershlag" });

Example 2 – Find those actresses who have their height missing

db.actresses.find({ "height": { "$exists": false} });

Example 3 – Find those actresses who are at least 27 years old (at the time of this writing, that is!)

db.actresses.find({ "birthday": { "$lt": new Date("1/1/1984")} });

Example 4 – Find the actresses who won an award for the film Black Swan

db.actresses.find({ "awards": { "$in": ["Black Swan"]} });

The most important thing to notice about the above queries is that they all return entire documents. Suppose that we want to retrieve just the actresses’ names. This is where the second parameter comes into play. For example, let’s find the name of the actress who won an award for Black Swan:

Here, we ask specifically for the name field to be returned. Please note that the _id field is returned as well, even though we did not ask for it.

You have probably noticed the $in, $lt and $exists operators in the examples above. They are special tokens used to express different criteria. The $in operator tests whether a value is in an array of possible matches, the $lt operator tests whether a value is less than a given value, and $exists checks for the existence of a field. There are many more operators, all listed here.
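The semantics of these three operators can be mimicked with a small in-memory matcher. The sketch below is only an approximation written for this article: it runs on plain JavaScript objects, supports nothing beyond exact equality, $in, $lt and $exists, and the function names and sample documents are made up.

```javascript
// Tiny in-memory matcher covering only $in, $lt and $exists plus exact equality.
function matchesCondition(value, condition) {
  const isOperatorDoc = condition !== null && typeof condition === "object" &&
    !(condition instanceof Date) && !Array.isArray(condition);
  if (!isOperatorDoc) {
    return value === condition; // plain equality, e.g. { name: "..." }
  }
  return Object.keys(condition).every((op) => {
    const operand = condition[op];
    switch (op) {
      case "$exists":
        return (value !== undefined) === operand;
      case "$lt":
        return value !== undefined && value < operand;
      case "$in": {
        // For array fields, one element appearing in the list is enough.
        const candidates = Array.isArray(value) ? value : [value];
        return candidates.some((v) => operand.includes(v));
      }
      default:
        throw new Error("unsupported operator: " + op);
    }
  });
}

// Returns the documents for which every field condition in the query holds.
function find(docs, query) {
  return docs.filter((doc) =>
    Object.keys(query).every((field) => matchesCondition(doc[field], query[field])));
}

// Hypothetical documents.
const sample = [
  { name: "A", height: 1.6, awards: ["Black Swan"], birthday: new Date(1981, 5, 9) },
  { name: "B", birthday: new Date(1990, 0, 1) }
];

console.log(find(sample, { height: { $exists: false } }).map((d) => d.name));
console.log(find(sample, { birthday: { $lt: new Date(1984, 0, 1) } }).map((d) => d.name));
console.log(find(sample, { awards: { $in: ["Black Swan"] } }).map((d) => d.name));
```

Note how the $in case treats an array field such as awards: the document matches as soon as one of its elements appears in the list of possible matches.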

Updating documents

In general, there are three common ways to manage concurrency in a database:

Pessimistic concurrency control: A record is unavailable to users from the time it’s fetched until it’s updated.

Optimistic concurrency control: A record is unavailable to users only while it’s actually being updated.

Last in wins: A record is simply written out, potentially overwriting any changes made by other users.

MongoDB follows the “Last in wins” model. This means that if two updates happen at the same time, the one that reaches the server first writes its data, and the next update then overwrites the previously updated document.
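The effect is easy to reproduce without a database. The following JavaScript sketch uses a Map as a stand-in for the server and assumes that both clients write whole documents back; all names and values are made up for the illustration.

```javascript
// Two clients read the same document, change different fields, then write
// the whole document back. No locking or version check takes place.
const store = new Map([["doc1", { title: "draft", views: 0 }]]);
const read = (id) => ({ ...store.get(id) });   // each client gets a private copy
const write = (id, doc) => store.set(id, doc); // unconditional overwrite

const copyA = read("doc1");
const copyB = read("doc1");
copyA.title = "final"; // client A edits the title
copyB.views = 10;      // client B edits the view count

write("doc1", copyA); // reaches the "server" first
write("doc1", copyB); // arrives last and wins

console.log(store.get("doc1")); // { title: 'draft', views: 10 }
```

Client A’s change to the title is silently lost, because client B’s copy still carried the old value when it was written.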

To update a document, you could simply use the update method. The signature of this method follows:

db.collection.update(criteria, objNew, upsert, multi);

Here, criteria is a query document that locates the record to update; objNew is either a new document that describes the changes or a set of $ operators that manipulate the object; upsert is a flag that says whether the document should be inserted if it doesn’t exist; and multi indicates whether all documents matching criteria should be updated (rather than just the first one).

The $push operator pushes a given value into an array. There are several other operators, all listed here. Please consult the mentioned URL to learn more.
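The difference between passing a new document and passing $ operators as objNew can be sketched in memory as follows. This is a simplified model for illustration only: the helper applyUpdate is hypothetical, it handles just $set and $push, and the sample document is made up.

```javascript
// Simplified model of objNew: a plain document replaces the stored one
// (keeping its _id), while $ operators modify the stored document in place.
function applyUpdate(doc, objNew) {
  const keys = Object.keys(objNew);
  const usesOperators = keys.length > 0 && keys.every((k) => k.startsWith("$"));
  if (!usesOperators) {
    return Object.assign({ _id: doc._id }, objNew); // whole-document replacement
  }
  const result = JSON.parse(JSON.stringify(doc)); // deep copy; fine for plain JSON data
  for (const [field, value] of Object.entries(objNew.$set || {})) {
    result[field] = value; // $set assigns a single field
  }
  for (const [field, value] of Object.entries(objNew.$push || {})) {
    result[field] = result[field] || []; // a missing field becomes a new array
    result[field].push(value);           // $push appends to the array
  }
  return result;
}

// Hypothetical document.
const before = { _id: 1, name: "A", awards: ["X"] };
const pushed = applyUpdate(before, { $push: { awards: "Y" } });
const replaced = applyUpdate(before, { name: "B" });
console.log(pushed);   // { _id: 1, name: 'A', awards: [ 'X', 'Y' ] }
console.log(replaced); // { _id: 1, name: 'B' }
```

With operators, untouched fields survive the update; with a replacement document, everything except _id is discarded.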

Removing documents

You can use the remove method to remove a document from a collection. It takes a query document as its first parameter (just like the find method). If the query document is empty, all documents are removed from the given collection; e.g. db.actresses.remove({}) removes all actresses from the collection. However, if this is what you really want, it’s better to drop the entire collection:

db.actresses.drop();

Well, I just tried to show you what the heck MongoDB is and how it works, while trying not to get lost in the nifty little details. There are tons of information out there about how to use all the features of MongoDB (see the references section of this article). It’s now up to you to learn those features and give NoSQL databases (and in particular, MongoDB) a try. The rest of the article, however, is dedicated to using MongoDB in a .NET application.

Using MongoDB in a .NET application (C#)

To start, let’s download the CSharp Driver (989KB). Get the MSI version and install it to the folder of your choice. Create a console application in C#, and add both assemblies found in the installation directory to the project references, namely MongoDB.Bson.dll and MongoDB.Driver.dll. (You also need to add at least two namespaces to the .cs files: MongoDB.Bson and MongoDB.Driver.)

Now let’s connect to the MongoDB server and retrieve all of the actresses’ documents. To do so, we need to create a connection string as follows:

The MongoServer class is a thread-safe class that handles the database connectivity. It also uses a pool of connections to increase efficiency. The next step is to get the required database (in this example, we need to get the “test” database).

MongoDatabase testDb = server.GetDatabase("test");

The next step is to get the actresses collection. This is done using the GetCollection method of MongoDatabase class:

The above code says it all. However, it’s not what we intend to do in real-world applications. We would usually like to deal with type-safe classes that bundle the document properties together. So let’s modify the above code.

All we need to do now is to insert the above object in the actresses collection:

actresses.Insert<Actress>(charlizeTheron);

And we are done. Now, there are 4 actresses in the test database and hopefully you know how to verify that.

Wrapping up

NoSQL databases are a set of databases that do not use SQL. Designing a system based on such databases requires you to forget the relational model and focus on your objects. However, remember that it’s simply another database model! No more, no less.

An important thing I haven’t mentioned yet (on purpose) is that MongoDB can operate in two modes: safe and unsafe. In safe mode (which is the default behavior of the MongoDB Shell), the client waits for the database response to make sure that the operation at hand has either completed or failed. This is done by calling the getLastError command right after executing a given operation. It is your responsibility to decide whether an operation can be performed in unsafe mode. It’s all about performance.