Introduction

This article highlights the latest developments in both the MongoDB open-source document database and the official open-source C# driver, and supplements the previous
reviews on CodeProject in the light of these improvements.

Overview of Document Databases.

Document databases store information relating to a record in a contiguous blob of data known as a document. A document’s structure usually follows the
JSON format and consists of a series of key-value pairs. Unlike the schema of a relational database, the document’s structure does not reference empty fields. This flexible arrangement allows fields to be added and removed with ease. What’s more, there is no need to rummage about in various tables when trying to assemble the data; it’s all there in one solid block.
The downside of all this is that document databases tend to be meaty. But, now that disk drives are in the bargain basement, the trade-off between speed of access and storage costs has shifted in favour of speed, and that has given rise to the increased use of document databases. The Large Hadron Collider at CERN uses a document database
but that's not why it keeps breaking down.

Hosted Web Server for MongoDB.

There is free web hosting of MongoDB at
MongoHQ. The sandbox database
plan provides 512MB of storage and is a good way to test drive the database.
There is no need to download the MongoDB binaries, and the web site’s user
interface allows administrative tasks to be carried out. Just sign up,
download the driver and you’re cooking with gas. I’ve used this service; it
seems to be genuinely free and there is no badgering to upgrade.

Desktop Installation of MongoDB and the C# Driver

All you need to get started is on the MongoDB
website. Installation instructions are well documented,
although you might have to wade through some detritus to get the correct set for
your system. You need to download the C# .NET driver as well as the MongoDB
binaries. The C# driver consists of two libraries: the BSON library, MongoDB.Bson.dll, and the C# driver itself, MongoDB.Driver.dll. There is a basic user interface situated 1000 ports above the port the database listens on.
For the default installation, this is at
http://localhost:28017/. You need to have the line 'rest=true' in the mongod.cfg file to enable
this interface. There are also more sophisticated open-source applications available for carrying out administration tasks on the database.

The Database Structure

The basic structure for storing data fields is the BsonElement. It’s a simple
key-value pair. The Key contains a field name and the Value its value. The Value can itself be a BsonElement, so they can be nested, Russian doll style. Records are stored as documents; a Document is a collection of BsonElements.
Here is an example document.
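The original listing is not reproduced here, but from the description below it would look something like this sketch. The member's field values and the _id are purely illustrative; the Cars field is shown in its BsonDocument form, where the keys are the array index numbers.

```js
{
  "_id" : ObjectId("..."),
  "Lastname" : "Jones",
  "Forename" : "David",
  "Age" : 65,
  "MembershipDate" : ISODate("2010-06-15T00:00:00Z"),
  "Cars" : { "0" : "Bentley Speed Six", "1" : "Alvis 12/50" }
}
```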

Not every record needs to contain every field. The only required field is the _id, and fields can be added at a future date without having to change the existing records. In this example, the Cars field is an array. Its
Value field contains a nested Document. The elements in the nested Document are
key-value pairs: the key is the array index number and the value is the name of the car.

The C# driver.

The driver is used to interface your code to a Mongo database. The driver can serialize data classes to the database without
the need for special attributes. All that's required is a unique Id. This is usually of type ObjectId (from the MongoDB.Bson namespace), a 12-byte time-stamped value which is automatically assigned by MongoDB. You can use a GUID instead, but it needs to be mapped to a string.
The reason for this is that a GUID is usually stored as binary data and the driver’s Aggregation Framework has problems digesting binary data. I get the same sort of trouble with cucumber sandwiches.

Connecting to the database.

The first requirement is to have a connection string. If you plan to use the hosted version, you need to sign up to MongoHQ with a username and password, create a database and register yourself as a new user for the database. Make a note of the login string provided; it will look something like this:
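The connection code is not reproduced at this point in the article; a minimal sketch, assuming the 1.x driver's MongoClient class and a database named 'test', might look like this. The hosted login-string shape shown in the comment uses placeholders, not real values.

```csharp
using MongoDB.Driver;

//A hosted MongoHQ login string takes this general shape (placeholders only):
//mongodb://<username>:<password>@<host>.mongohq.com:<port>/test
//For a local desktop installation, the default connection string is used
string connectionString = "mongodb://localhost:27017";
MongoClient client = new MongoClient(connectionString);
MongoServer server = client.GetServer();
//Opens the 'test' database
MongoDatabase database = server.GetDatabase("test");
```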

These calls will fail on the hosted site if the test database does not exist or you are not a registered user of the database. This is because new databases must be created in admin mode on the hosted web site. These constraints do not apply to the desktop server: it will go ahead and create a new database called ‘test’ if it does not already exist.

Accessing collections.

Documents with a similar structure are arranged as named collections of data in the database. The driver has a Collection object that acts as a proxy for a database’s collection. The following code shows how to access and enumerate a collection,
named 'entities', of type ClubMember.

//Builds new Collection if 'entities' is not found
MongoCollection<ClubMember> collection = database.GetCollection<ClubMember>("entities");
Console.WriteLine("List of ClubMembers in collection ...");
MongoCursor<ClubMember> members = collection.FindAll();
foreach (ClubMember clubMember in members)
{
    clubMember.PrintDetailsToScreen();
}

It’s recommended that the foreach method is used wherever possible as it cleans up after itself. Boring housekeeping duties such as calling Connect() and Disconnect() are a thing of the past.
You should avoid calling Disconnect() as it closes down the database's
connection pool.

Indexes.

MongoDB indexes use a B-tree
data structure. A query uses only one index, and a query optimiser chooses the most appropriate index for the task. The following code builds an index that sorts data first by the Lastname property, then by the Forename, sorted A-Z, and finally by the Age property, oldest to youngest.

//Build an index if it is not already built
IndexKeysBuilder keys = IndexKeys.Ascending("Lastname", "Forename").Descending("Age");
//Add an optional name - useful for admin
IndexOptionsBuilder options = IndexOptions.SetName("myIndex");
//This locks the database while the index is being built
collection.EnsureIndex(keys, options);

This index is great for searching on Lastname or Lastname, Forename or Lastname, Forename, Age. It is not useful for sorting on Forename or Age or any combination of the two. The default behaviour is for indexes to be updated when the data is saved as this helps to prevent concurrency problems. But there is still a potential problem if newly written data is immediately read back. The way round this is to ensure that the write and read operations are performed on the same thread by enclosing the operations within the following.

using (server.RequestStart(database))
{
    //writes and subsequent reads here all use the same connection
}

Querying Data Using Linq.

This is done by referencing the Collection’s AsQueryable method before writing the Linq statements. All the usual methods are available. Here are a few examples:
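The examples themselves are not listed in the text; a few sketches, assuming the 1.x driver's LINQ support and the ClubMember properties used elsewhere in the article, might look like this:

```csharp
using System.Linq;
using MongoDB.Driver.Linq;

//Members over 50, youngest first
var seniors = collection.AsQueryable<ClubMember>()
                        .Where(m => m.Age > 50)
                        .OrderBy(m => m.Age);

//The first member named Jones, if any
var jones = collection.AsQueryable<ClubMember>()
                      .FirstOrDefault(m => m.Lastname == "Jones");

//A count of all the members in the collection
int memberCount = collection.AsQueryable<ClubMember>().Count();
```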

Querying Data Using The QueryBuilder Class.

Using the query builder classes is not as exciting as writing Linq, you don’t get the opportunity to put lots of arrows in your code, but there are still some methods that are worth highlighting.

DateTime membershipDate = DateTime.Now.AddYears(-5);
//DateTime is stored in the BsonElement as a UTC value, so it needs to be converted
DateTime membershipDateUTC = membershipDate.ToUniversalTime();
//Query.GT implements a 'greater than' query.
//The parameters are a field name and its Value
MongoCursor<ClubMember> recentMembers =
    collection.Find(Query.GT("MembershipDate", membershipDateUTC));
Console.WriteLine("Members who have joined in the last 5 years ...");
foreach (ClubMember clubMember in recentMembers)
{
    clubMember.PrintDetailsToScreen();
}

There are methods to carry out most of the common sorts of comparisons. The Query.And
method does a logical AND on successive Query objects. The next bit of code
illustrates this by finding all members called David Jones and then updating the
Forename to Dai. The Update.Set() method sets the Forename field
to its new value on all documents selected. Finally, Collection.Update performs the update on the server side.
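The listing is not reproduced in the text; the code described above might look something like this sketch, using the 1.x driver's Query and Update builders:

```csharp
using MongoDB.Driver;
using MongoDB.Driver.Builders;

//Select all members called David Jones
IMongoQuery query = Query.And(
    Query.EQ("Lastname", "Jones"),
    Query.EQ("Forename", "David"));
//Set the Forename field to its new value
IMongoUpdate update = Update.Set("Forename", "Dai");
//UpdateFlags.Multi applies the update to every matching document
collection.Update(query, update, UpdateFlags.Multi);
```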

Querying Data Using Map Reduce.

MapReduce is a heavy-duty method used for batch processing large amounts of data. There are two main parts to it: a map function that associates a field with a value, and a reduce function that reduces the input values to a single output. There is an example using MapReduce in the sample code as it may come in handy, but for most users the Aggregation Framework is a better way of collating data.
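As a flavour of what such a call looks like (this is a sketch, not the sample code's example, and assumes the 1.x driver's MapReduce method with inline output), here is a map/reduce pair that counts how many owners each car type has:

```csharp
using System;
using MongoDB.Bson;
using MongoDB.Driver;
using MongoDB.Driver.Builders;

//Emit a 1 for every car owned by every member
BsonJavaScript map = @"
    function() {
        this.Cars.forEach(function(car) { emit(car, 1); });
    }";
//Sum the emitted values to give an owner count per car type
BsonJavaScript reduce = @"
    function(key, values) {
        return Array.sum(values);
    }";
var result = collection.MapReduce(map, reduce,
    MapReduceOptions.SetOutput(MapReduceOutput.Inline));
foreach (BsonDocument doc in result.GetResults())
{
    Console.WriteLine(doc.ToJson());
}
```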

Querying Data Using The Aggregation Framework.

The Aggregation Framework is used to collect and collate data from various documents in the database. It’s new in version 2.2 and is an attempt to bring the functionality of SQL to a document database.
The aggregation is achieved by passing a collection along a pipeline where various pipeline operations are performed consecutively to produce a result. It’s an oven-ready chicken type of production line: there is less product at the end but it is more fit for purpose.
Aggregation is performed by calling the Collection’s Aggregate method with an array of documents that detail the various pipeline operations.

Aggregation Example.

In this example there is a document database collection consisting of the members of a vintage car club. Each document is a serialized version of the following ClubMember Class

The ClubMember Class has an array named Cars that holds the names of the vintage cars owned by the member. The aim of the aggregation is to produce a list of owners
who have joined in the last five years for each type of car in the collection.
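The class itself is not reproduced in the text, but from the fields used throughout the article it would look something like this sketch:

```csharp
using System;
using MongoDB.Bson;

public class ClubMember
{
    //Unique identifier, assigned automatically by MongoDB
    public ObjectId Id { get; set; }
    public string Lastname { get; set; }
    public string Forename { get; set; }
    public int Age { get; set; }
    //Stored in the database as a UTC value
    public DateTime MembershipDate { get; set; }
    //The names of the vintage cars owned by the member
    public string[] Cars { get; set; }
}
```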

Step 1 Match Operation.

The match operation selects only the members that have joined in the last five years.
Here's the code.
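The listing is not reproduced here; a sketch of the match stage, using the MembershipDate field and the $gte keyword discussed below, might look like this:

```csharp
using System;
using MongoDB.Bson;

DateTime cutOffDate = DateTime.Now.AddYears(-5).ToUniversalTime();
//Select only documents whose MembershipDate is within the last five years
var matchOperation = new BsonDocument
{
    {
        "$match",
        new BsonDocument
        {
            { "MembershipDate", new BsonDocument { { "$gte", cutOffDate } } }
        }
    }
};
```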

As you can see, the code ends up with more braces than an orthodontist, but at least IntelliSense assists when you are writing it. The keyword $gte
indicates a 'greater than or equal' query.

Step 2 Unwind Operation.

Unwind operations modify documents that contain a specified array. For each element within the array, a document identical to the original is created, and the value of the array field in each copy is set to that single element. So a document whose Cars array holds three cars is unwound into three documents, each with a Cars field containing just one car.
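The unwind stage itself is a one-liner; a sketch, unwinding the Cars array used in this example, would be:

```csharp
using MongoDB.Bson;

//Create one document per element of the Cars array
var unwindOperation = new BsonDocument { { "$unwind", "$Cars" } };
```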

Step 3 Group Operation.

Define an operation to group the documents by car type. Each consecutive operation does not act on the original documents but on the documents produced by the previous operation. The only fields available are those present as a result of the previous
pipeline operation. You cannot go back and pinch a field from the original documents.
The $ sign is used in two ways: firstly, to indicate a keyword and, secondly, to differentiate field
names from field values. For example, Age is a field name; $Age is the value of the Age
field.

var groupByCarTypeOperation = new BsonDocument
{
    {
        //Sort the documents into groups
        "$group",
        new BsonDocument
        {
            //Make the unique identifier for the group a BSON element consisting
            //of a field named Car.
            //Set its value to that of the Cars field.
            //The Cars field is no longer an array because it has now been unwound
            { "_id", new BsonDocument { { "Car", "$Cars" } } },
            {
                //Add a field named Owners
                "Owners",
                new BsonDocument
                {
                    {
                        //Add a value to the Owners field if it does not
                        //already contain an identical value.
                        //This makes the field Value an array
                        "$addToSet",
                        //The value to add is a BsonDocument with an identical structure to
                        //a serialized ClubMember class.
                        new BsonDocument
                        {
                            { "_id", "$_id" },
                            { "Lastname", "$Lastname" },
                            { "Forename", "$Forename" },
                            { "Age", "$Age" },
                            { "MembershipDate", "$MembershipDate" }
                        }
                    }
                }
            }
        }
    }
};

Step 4 Project Operation.

The _id field resulting from the previous operation is a BsonElement consisting of both the field name and its Value. It would be better to drop the field name and just use the Value. The following Project operation does that.
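The listing is not reproduced in the text; a sketch of such a project stage, which suppresses the compound _id produced by the group stage and promotes its Car value to a plain field, might look like this:

```csharp
using MongoDB.Bson;

//Replace the compound _id with a plain Car field and keep the Owners array
var projectOperation = new BsonDocument
{
    {
        "$project",
        new BsonDocument
        {
            { "_id", 0 },
            { "Car", "$_id.Car" },
            { "Owners", 1 }
        }
    }
};
```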

Step 5 Run the Aggregation and output the result.

The AggregateResult class returned has a bool field named Ok. It is set to true if there were no errors. The resulting documents are returned in the AggregateResult.ResultDocuments collection. The easiest way to deserialize the collection is to call its Select method passing in the Deserialize method of the BsonSerializer as follows.
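A sketch of running the pipeline and deserializing the result follows. The operation variable names are assumed from the earlier steps, and CarOwners is a hypothetical class with Car and Owners properties matching the shape of the projected documents:

```csharp
using System;
using System.Linq;
using MongoDB.Bson.Serialization;
using MongoDB.Driver;

//Run the pipeline operations in order
AggregateResult result = collection.Aggregate(
    matchOperation, unwindOperation, groupByCarTypeOperation, projectOperation);
if (result.Ok)
{
    //Deserialize each resulting BsonDocument into a CarOwners instance
    var carOwners = result.ResultDocuments
                          .Select(BsonSerializer.Deserialize<CarOwners>);
    foreach (var entry in carOwners)
    {
        Console.WriteLine(entry.Car);
    }
}
```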

The sample application has an aggregation example that performs various calculations on the data set
such as Count, Min, Max and Total.

GridFS.

GridFS is a means of storing and retrieving files that
exceed the BsonDocument size limit of 16MB. Instead of storing a file in a
single document, GridFS divides the file into chunks and stores each chunk as a separate document. GridFS uses
two collections to store files: one collection stores the file chunks and the
other stores the file’s metadata. The chunk size is about 256KB. The idea here is that smaller chunks of data can be
stored more efficiently, and consume less memory when being processed, than large files. It’s generally not a good idea to store binary data in the
main document as it takes up space that is best used by more meaningful data.
Uploading data into GridFS is straightforward. Here are a couple of examples.

const string fullyQualifiedUpLoadName = @"C:\temp\mars.png";
//Here the uploaded file is given the name 'C:\temp\mars.png'
MongoGridFSFileInfo gridFsInfo = database.GridFS.Upload(fullyQualifiedUpLoadName);
//Here the uploaded file is given the name 'mars.png'
using (var fs = new FileStream(fullyQualifiedUpLoadName, FileMode.Open))
{
    gridFsInfo = database.GridFS.Upload(fs, "mars.png");
}

The GridFS.Upload method returns an object of type MongoGridFSFileInfo. This contains the file’s metadata. Only basic details such as
the file’s name and length are included by default but the metadata can be customised to facilitate searching. Here's how.
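The listing is not reproduced in the text; a sketch, using the driver's MongoGridFSCreateOptions class with purely illustrative metadata field names, might look like this:

```csharp
using System.IO;
using MongoDB.Bson;
using MongoDB.Driver.GridFS;

//The metadata fields here are purely illustrative
var createOptions = new MongoGridFSCreateOptions
{
    Metadata = new BsonDocument
    {
        { "Category", "Planets" },
        { "Description", "An image of Mars" }
    }
};
using (var fs = new FileStream(fullyQualifiedUpLoadName, FileMode.Open))
{
    MongoGridFSFileInfo info = database.GridFS.Upload(fs, "mars.png", createOptions);
}
```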

MongoDB Replica Sets.

A replica set is a cluster of MongoDB instances that replicate amongst one
another so that they all store the same data. One server is the primary and
receives all the writes from clients. The others are secondary members and
replicate from the primary asynchronously. The clever bit is that, when a
primary goes down, one of the secondary members takes over and becomes the new
primary. This takes place totally transparently to the users and ensures
continuity of service. Replica sets have other advantages: it is easy to
back up the data, and databases with a lot of read requests can reduce the load on
the primary by reading from a secondary. You cannot rely on any one instance
being the primary, as the primary is determined by the members of the replica set at run time.

Installing A Replica set as a Windows Service.

This example installs a replica set consisting of one primary and two secondary instances. The instances will be named MongoDB0, MongoDB1 and MongoDB2.
They will use the IP address localhost and listen on ports 27017, 27018 and 27019
respectively. The replica set name is myReplSet.

Step 1 Housekeeping tasks.

In the mongodb folder add three new folders named rsDataDb0, rsDataDb1, rsDataDb2. These are the data folders.
Remove any instance of mongod that may already be running. In this example, the service name to be removed is MongoDB.
Open a command prompt in administrator mode, navigate to where mongod.exe is installed and enter:

mongod.exe --serviceName MongoDB --remove

Step 2 Install three new service instances.

The best way to do this is to have three configuration files, one for each instance. The format of these files is very similar. Here is the config file for MongoDB0. The hash sign comments out a line.

#Use this to direct output to a log file instead of the console
#*******************************
logpath=C:\mongodb\log\rsDb0.log
#********************************
logappend = true
journal = true
quiet = true
#Enable this if you wish to use the user interface situated at 1000 ports above the server port
rest=true
#
# The port number the mongod server will listen on
# change port for each server instance
#**************************************
port=27017
#****************************************
# Listen on a specific ip address
# This is needed if running multiple servers. Comment out to access mongod remotely.
bind_ip=127.0.0.1
# This sets the database path, change the database path for each server instance
#**********************************************
dbpath=C:/mongodb/rsDataDb0
#*****************************************
# Keep same replica set for all servers in the set
replSet=myReplSet

The config files are included in the sample code bundle, but, basically, you change the port, dbpath and logpath for each instance. Store the config files in the bin directory and enter the following commands:
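The commands themselves are not listed in the text; assuming config files named mongod0.cfg, mongod1.cfg and mongod2.cfg, installing and starting the three services would look something like this:

```shell
mongod --config mongod0.cfg --serviceName MongoDB0 --serviceDisplayName MongoDB0 --install
mongod --config mongod1.cfg --serviceName MongoDB1 --serviceDisplayName MongoDB1 --install
mongod --config mongod2.cfg --serviceName MongoDB2 --serviceDisplayName MongoDB2 --install
net start MongoDB0
net start MongoDB1
net start MongoDB2
```

Next, open the mongo shell against the first instance and define a configuration variable describing the set; a sketch of such a variable, built from the ports and replica set name used in this example, is:

```
mongo --port 27017
> config = { _id: "myReplSet", members: [
    { _id: 0, host: "localhost:27017" },
    { _id: 1, host: "localhost:27018" },
    { _id: 2, host: "localhost:27019" } ] }
```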

Pass this variable to the rs.initiate() method by entering the command

rs.initiate(config)

You now have time to put the kettle on while Mongo takes your hard drive for a spin. When the method returns you are ready to go.
You can find out the status of your replica set by entering rs.status() in the mongo shell. To connect to the Replica set with the C# driver use this connection string.
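The connection string itself is not shown in the text; for the servers configured above it would take the standard URI form, listing the members and naming the set (the database name 'test' is assumed):

```
mongodb://localhost:27017,localhost:27018,localhost:27019/test?replicaSet=myReplSet
```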

Conclusion

There is much more to MongoDB than is detailed in this article but the hope is that there is enough information here for you to be able to begin exploring
the capabilities of this open source software. Finally, I’d like to express my gratitude to the many developers who have worked tirelessly on the
MongoDB project with little prospect of reward other than the satisfaction of having helped others. Thanks very much; I take my hat off to you.

Comments.

Thanks for the question. You can find out the status of your replica set by entering rs.status() in the mongo shell, and you can connect to each instance by using the port number defined in the configuration file. But you cannot rely on one particular server being the primary server, as that is determined by the replica set. The system seems to be rock solid. I’ve not had any problems in over two years’ use.

I find this rationale very hard to digest: why should I need to design entity classes to store a flexible schema?
It doesn't make any sense at all. I think the BsonDocument way of dealing with it would be better, but there is no light thrown on it in this article, and even the official docs make no mention of it.

Thank you for your comments. There is no need to use an entity model, you can access a document at the BsonElement level if you choose. It’s just that employing a model facilitates data retrieval in many cases.

It seems to me that a great advantage of the flat database structure is that additional fields can be added to the model without the need to modify existing documents. In this respect, the flat database structure is both more robust and flexible than that of the conventional relational database.

Thanks for a great article. MongoDB really looks interesting, but, coming from SQL Server, I wonder how it compares with traditional tools with regard to backup, concurrency, user tools for making requests to the database, etc.

Mongo is widely deployed commercially, Soren. But I don’t think it is a suitable replacement for a SQL database in every case. It would seem to me that the greatest advantage is the clean, simple structure and the ability to add additional fields with ease. As with most open source software, you may have to hunt around to find the tools you need. They are out there but they are not well collated. Thanks and best wishes.