
Getting Started with MongoDB

Write data applications with MongoDB and you'll quickly understand the excitement around NoSQL.

In the relational schema, each game must have a category. Each time a game ends for a player, a service inserts the score into the PlayerScore table. Each player must have a gender (male or female), defined in the Gender table.

Now, suppose that hundreds of thousands of players submit scores to the database every day. If the sites that offer the games become too popular, the database might grow to a few petabytes in a short period of time (for example, imagine a sudden surge of interest in retrogaming because one of the American Idol contestants is a big fan of retrogames). With that much data, scaling even this simple database and running queries with inner joins and aggregates can easily become a serious problem.

A document-oriented schema that stores players and scores requires you to make a choice. MongoDB provides you with the following two options:

Embed: Create a players collection and embed the scores in their respective player documents.

Reference: Create two collections, one for players and the other for scores. Each score will include an ObjectId reference to the player to which it belongs.
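To make the two options concrete, here is a minimal sketch in plain JavaScript (runnable with Node.js, no MongoDB server involved) of the same data under each option. The field names (`scores`, `playerId`, `gameId`, `score`, `date`) and the short strings standing in for ObjectId values are assumptions for illustration, not the schema used later in this article.

```javascript
// Hypothetical documents; strings such as "player-1" stand in for ObjectId values.

// Option 1 (embed): the scores live inside the player document.
const embeddedPlayer = {
  _id: "player-1",
  name: "PUZZLEGAMESMASTER",
  scores: [
    { gameId: "game-1", score: 10500, date: new Date("2013-03-01T10:00:00Z") },
    { gameId: "game-1", score: 12000, date: new Date("2013-03-02T10:00:00Z") }
  ]
};

// Option 2 (reference): two collections; each score stores its player's _id.
const playersCollection = [{ _id: "player-1", name: "PUZZLEGAMESMASTER" }];
const scoresCollection = [
  { _id: "score-1", playerId: "player-1", gameId: "game-1", score: 10500, date: new Date("2013-03-01T10:00:00Z") },
  { _id: "score-2", playerId: "player-1", gameId: "game-1", score: 12000, date: new Date("2013-03-02T10:00:00Z") }
];

// Embedded: reading the player document already yields all of its scores.
function scoresForPlayerEmbedded(player) {
  return player.scores;
}

// Referenced: a second lookup against the scores collection is required.
function scoresForPlayerReferenced(playerId, scores) {
  return scores.filter((s) => s.playerId === playerId);
}

console.log(scoresForPlayerEmbedded(embeddedPlayer).length);                 // 2
console.log(scoresForPlayerReferenced("player-1", scoresCollection).length); // 2
```

With embedding, one read returns the player and all of its scores; with references, the scores come from a second query against a separate collection.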

Because the child objects (the scores) always appear in the context of their parent (the players), I will embed in this case. I should warn you up front that this won't turn out to be the best decision: As I dive deeper and make updates to this schema, I'll explain some problems with embedding the scores. I'm using the embedded scores deliberately, as an example of a common problem with embedded documents when you're new to MongoDB.

The scores will always appear within the player information in the application, so embedding them seems like an excellent idea. However, if the application had to display the most recent scores regardless of which players generated them, then creating two collections linked by a reference would be the more convenient option. And don't forget about MongoDB's 16 MB maximum document size: It will become a problem for this structure when a single player accumulates a huge number of scores.
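The "most recent scores regardless of player" case can be sketched the same way. This is plain JavaScript rather than MongoDB code, and the sample players (including the hypothetical RETROFAN), dates, and field names are assumptions. With separate collections, the query is a sort plus a limit (in the shell, something shaped like `db.scores.find().sort({ date: -1 }).limit(10)`); with embedded scores, every player's array has to be flattened first (in MongoDB, typically with `$unwind` in the aggregation framework).

```javascript
// Hypothetical sample data (strings stand in for ObjectId values).
const players = [
  { _id: "p1", name: "PUZZLEGAMESMASTER",
    scores: [{ gameId: "g1", score: 10500, date: new Date("2013-03-01T10:00:00Z") }] },
  { _id: "p2", name: "RETROFAN",
    scores: [{ gameId: "g1", score: 9000, date: new Date("2013-03-03T10:00:00Z") }] }
];
const scores = [
  { playerId: "p1", gameId: "g1", score: 10500, date: new Date("2013-03-01T10:00:00Z") },
  { playerId: "p2", gameId: "g1", score: 9000, date: new Date("2013-03-03T10:00:00Z") }
];

// Referenced model: one flat collection, so "latest n" is a sort plus a limit.
function latestScoresReferenced(allScores, n) {
  return [...allScores].sort((a, b) => b.date - a.date).slice(0, n);
}

// Embedded model: every player document must be read and its array flattened first.
function latestScoresEmbedded(allPlayers, n) {
  const flattened = allPlayers.flatMap((p) =>
    p.scores.map((s) => ({ ...s, playerId: p._id }))
  );
  return flattened.sort((a, b) => b.date - a.date).slice(0, n);
}

console.log(latestScoresReferenced(scores, 1)[0].playerId); // p2
console.log(latestScoresEmbedded(players, 1)[0].playerId);  // p2
```

Both functions return the same answer, but the embedded version has to touch every player document to produce it, which is why the reference model suits a cross-player view better.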

In the relational schema, each Game was related to a single GameCategory. The category is simply required to filter games in the application, so you can think of a category as a sort of tag. In fact, each game might belong to many categories. I'll take advantage of MongoDB's ability to store an array of strings as the value of a field. This way, it is only necessary to create a Games collection, where each game includes a categories field with an array of strings. As I explained earlier, the design depends on the kind of application: In other cases, it would be convenient to create a GameCategories collection and reference it within Games.
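MongoDB matches a query value against each element of an array field, so a filter such as `db.games.find({ categories: "arcade" })` returns every game whose categories array contains that string. The following plain-JavaScript sketch mimics that matching rule in memory; the category names and the second game, "Block Drop", are hypothetical.

```javascript
// Hypothetical games; category names are made up for illustration.
const games = [
  { _id: "g1", name: "Invaders 2013", categories: ["space", "shooter", "arcade"] },
  { _id: "g2", name: "Block Drop", categories: ["puzzle", "arcade"] }
];

// Mirrors the array-matching rule of db.games.find({ categories: value }):
// a document matches when any element of its categories array equals the value.
function findByCategory(allGames, category) {
  return allGames.filter((g) => g.categories.includes(category));
}

console.log(findByCategory(games, "arcade").map((g) => g.name)); // [ 'Invaders 2013', 'Block Drop' ]
console.log(findByCategory(games, "puzzle").map((g) => g.name)); // [ 'Block Drop' ]
```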

Each score belongs to a game and will include an ObjectId reference to the game to which the score belongs. Because the application must display the game's name for each score entry, I will cache the game's name within the score details.
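Caching the name is a deliberate denormalization: Reads get cheaper, but renaming a game has to touch every cached copy. A minimal plain-JavaScript sketch of that trade-off follows; the `gameName` field and the rename to "Invaders 2014" are hypothetical.

```javascript
// Hypothetical documents; gameName is the cached copy of games[].name.
const games = [{ _id: "g1", name: "Invaders 2013" }];
const players = [
  { _id: "p1", name: "PUZZLEGAMESMASTER",
    scores: [{ gameId: "g1", gameName: "Invaders 2013", score: 10500 }] }
];

// Displaying a score needs no lookup in games: the name travels with the score.
function describeScore(s) {
  return `${s.gameName}: ${s.score}`;
}

// The cost: a rename must update the game and every cached copy of its name.
function renameGame(allGames, allPlayers, gameId, newName) {
  const game = allGames.find((g) => g._id === gameId);
  if (game) game.name = newName;
  let updated = 0;
  for (const p of allPlayers) {
    for (const s of p.scores) {
      if (s.gameId === gameId) {
        s.gameName = newName;
        updated += 1;
      }
    }
  }
  return updated; // number of cached copies that had to change
}

console.log(describeScore(players[0].scores[0]));              // Invaders 2013: 10500
console.log(renameGame(games, players, "g1", "Invaders 2014")); // 1
console.log(describeScore(players[0].scores[0]));              // Invaders 2014: 10500
```

Because game names rarely change but scores are displayed constantly, trading occasional multi-document updates for cheaper reads is usually a reasonable bet.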

The first version of the document-oriented schema will use just two collections:

Games

Players

If you execute the following lines in the MongoDB shell, MongoDB will create the retrogames database, add the games collection to this new database, and insert a new document with an automatically generated ObjectId value for its _id field:
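The key behavior here is that MongoDB adds an `_id` field holding a generated ObjectId when the inserted document doesn't supply one. The following plain-JavaScript sketch (Node.js, no server) mimics that behavior. Its generator only imitates the 12-byte layout used by current MongoDB versions (a 4-byte seconds timestamp, a 5-byte random value, and a 3-byte counter) and is not a driver implementation; the `categories` and `played` fields in the hypothetical game1 document are assumptions.

```javascript
const crypto = require("crypto");

// Simplified ObjectId stand-in: 4-byte timestamp (seconds), 5 random bytes,
// 3-byte incrementing counter, rendered as 24 hex characters.
let counter = crypto.randomBytes(3).readUIntBE(0, 3);

function newObjectIdHex() {
  const buf = Buffer.alloc(12);
  buf.writeUInt32BE(Math.floor(Date.now() / 1000), 0);
  crypto.randomBytes(5).copy(buf, 4);
  counter = (counter + 1) % 0x1000000;
  buf.writeUIntBE(counter, 9, 3);
  return buf.toString("hex");
}

// Mimics an insert: an _id is generated only when the document lacks one.
function insertOne(collection, doc) {
  const stored = "_id" in doc ? { ...doc } : { _id: newObjectIdHex(), ...doc };
  collection.push(stored);
  return stored._id;
}

const games = [];
// Hypothetical game1 document; "categories" and "played" are assumed fields.
const game1Id = insertOne(games, {
  name: "Invaders 2013",
  categories: ["space", "shooter"],
  played: false
});

console.log(game1Id.length);           // 24
console.log(games[0]._id === game1Id); // true
```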

The following line retrieves all the documents for the games collection and is the equivalent of SELECT * FROM games:

db.games.find()

The results appear in Figure 7.

Figure 7: Retrieving all the games for the games collection in the MongoDB shell.

In this case, the games collection has just one document. However, as a general practice, it is a good idea to limit the number of results. The following line retrieves the first 100 documents for the games collection and is the equivalent of the SQL Server SELECT TOP 100 * FROM games statement (see Figure 8):

db.games.find().limit(100)

The following line retrieves the first document that includes a name field with the value equal to "Invaders 2013":

db.games.findOne({ name: "Invaders 2013"})

The results are shown in Figure 8.

Figure 8: Retrieving the first document within the games collection that has the specified value for the name field in the MongoDB shell.
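The semantics of `limit()` and `findOne()` can be mimicked in plain JavaScript. This is an in-memory sketch, not the shell API, and the games "Block Drop" and "Galaxy Run" are hypothetical. Note that without a `sort()`, MongoDB doesn't guarantee which matching document counts as "first"; the sketch simply uses array order.

```javascript
// Hypothetical in-memory games collection.
const games = [
  { _id: 1, name: "Invaders 2013" },
  { _id: 2, name: "Block Drop" },
  { _id: 3, name: "Galaxy Run" }
];

// Mimics find(criteria).limit(n): matching documents, at most n of them.
// As in MongoDB, a limit of 0 means "no limit".
function find(collection, criteria = {}, limit = 0) {
  const matches = collection.filter((d) =>
    Object.entries(criteria).every(([k, v]) => d[k] === v)
  );
  return limit > 0 ? matches.slice(0, limit) : matches;
}

// Mimics findOne(criteria): the first match, or null when nothing matches.
function findOne(collection, criteria = {}) {
  const [first] = find(collection, criteria, 1);
  return first === undefined ? null : first;
}

console.log(find(games, {}, 2).length);                     // 2
console.log(find(games).length);                            // 3
console.log(findOne(games, { name: "Invaders 2013" })._id); // 1
console.log(findOne(games, { name: "Missing" }));           // null
```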

If you execute the following lines in the MongoDB shell, MongoDB will add the players collection to the previously created retrogames database and insert a new document with an automatically generated ObjectId value for its _id field (see Figure 9). You just need to replace "513a90ec507f318c7d15c744" with the ObjectId that MongoDB generates when you insert game1.
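Your ObjectId values won't match the ones shown here because an ObjectId is generated at insert time: Its first 4 bytes encode the creation time in seconds since the Unix epoch, and the remaining bytes make it unique. The MongoDB shell exposes that timestamp through `ObjectId(...).getTimestamp()`; the following plain-JavaScript sketch decodes it by hand from the hex string, using the sample values that appear in this article.

```javascript
// Decodes the creation time stored in the first 4 bytes (8 hex characters) of an ObjectId.
function objectIdTimestamp(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected a 24-character hex ObjectId string");
  }
  const seconds = parseInt(hex.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

console.log(objectIdTimestamp("513a90ec507f318c7d15c744").toISOString());
// 2013-03-09T01:31:24.000Z
console.log(objectIdTimestamp("513b66a8880d01e7242a7e70").toISOString());
// 2013-03-09T16:43:20.000Z
```

Because the timestamp comes first, ObjectIds generated later sort after earlier ones (at one-second resolution), which is also why two inserts on different machines or at different times never produce the article's exact values.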

Because there is a player that has a score for the "Invaders 2013" game, the following command will update the value of the played field to true. The first parameter specifies the search criteria that must be matched, and the second defines the values to be set using the $set operator. In this case, MongoDB will update only the first match, which is fine because there can be only one match when the criteria specify the primary key (_id). You just need to replace "513a90ec507f318c7d15c744" with the ObjectId that MongoDB generated when you inserted game1.
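The effect of that update can be mimicked in plain JavaScript: Find the first document that matches every field in the criteria, then overwrite only the fields listed under `$set`. In the shell, an update of this shape is written as `db.games.update({ _id: ObjectId("...") }, { $set: { played: true } })`; the in-memory collection and the `updateFirstWithSet` helper below are hypothetical and only illustrate the semantics.

```javascript
// Hypothetical in-memory games collection (string _id values stand in for ObjectIds).
const games = [
  { _id: "513a90ec507f318c7d15c744", name: "Invaders 2013", played: false }
];

// Mimics update(criteria, { $set: fields }) with default options:
// only the first document matching every criteria field is modified,
// and only the listed fields change; the rest of the document is untouched.
function updateFirstWithSet(collection, criteria, setFields) {
  const doc = collection.find((d) =>
    Object.entries(criteria).every(([k, v]) => d[k] === v)
  );
  if (!doc) return 0;
  Object.assign(doc, setFields);
  return 1; // 1 if a document matched and was updated, otherwise 0
}

const modified = updateFirstWithSet(
  games,
  { _id: "513a90ec507f318c7d15c744" },
  { played: true }
);
console.log(modified, games[0].played, games[0].name); // 1 true Invaders 2013
```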

When PUZZLEGAMESMASTER plays the same game again, it is necessary to add a new score to the scores array within the document for this player. Obviously, it isn't a good idea to retrieve the entire document and replace the contents of the scores array with a new array plus the additional element. As each player adds scores, that approach would generate unnecessary traffic and increasingly large reads and writes. So, the $set operator isn't the appropriate option, because it would require you to replace the entire scores array.

You just want to add a new item to the scores array, and both the $push and $addToSet operators allow you to do that. The former appends the item to the specified array without checking for duplicates, whereas the latter adds it only if an identical element isn't already present. In this case, a player might legitimately repeat the same game on the same date with the same score, so the $push operator is the most convenient option.
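The difference between the two operators can be sketched in plain JavaScript. The `push` and `addToSet` helpers below are hypothetical in-memory stand-ins, not MongoDB calls, and one simplification is labeled in the code: Node's deep equality ignores field order, whereas MongoDB's comparison of embedded documents does not.

```javascript
const { isDeepStrictEqual } = require("util");

// Mimics $push: always appends, even when an identical element already exists.
function push(array, item) {
  array.push(item);
  return array;
}

// Mimics $addToSet: appends only if no equal element is already present.
// Assumption: deep equality stands in for MongoDB's BSON comparison, which
// for embedded documents also takes field order into account.
function addToSet(array, item) {
  if (!array.some((existing) => isDeepStrictEqual(existing, item))) {
    array.push(item);
  }
  return array;
}

// Hypothetical repeated result: same game, same date, same score.
const entry = { gameId: "g1", score: 10500, date: "2013-03-10" };
console.log(push([entry], { ...entry }).length);     // 2: both plays are recorded
console.log(addToSet([entry], { ...entry }).length); // 1: the repeat is silently dropped
```

For a score history, silently dropping a genuine repeat would lose data, which is why `$push` fits this case.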

The following lines add a new score document to the scores array for the PUZZLEGAMESMASTER player (see Figure 10). Again, you just need to replace "513a90ec507f318c7d15c744" with the ObjectId that MongoDB generated when you inserted game1 and "513b66a8880d01e7242a7e70" with the ObjectId for player1.

Figure 10: Adding a new score document to the scores array for the PUZZLEGAMESMASTER player and retrieving the updated player document.
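The traffic argument for `$push` over replacing the array with `$set` can be made concrete by comparing the size of the two update documents a client would send for a player who already has many scores. This is a rough plain-JavaScript sketch: JSON length stands in for BSON size, and the 1,000 existing scores and their field names are hypothetical.

```javascript
// Hypothetical player history: 1,000 existing scores.
const existingScores = Array.from({ length: 1000 }, (_, i) => ({
  gameId: "513a90ec507f318c7d15c744",
  gameName: "Invaders 2013",
  score: 1000 + i,
  date: new Date(Date.UTC(2013, 2, 10, 0, i)).toISOString()
}));
const newScore = {
  gameId: "513a90ec507f318c7d15c744",
  gameName: "Invaders 2013",
  score: 12500,
  date: "2013-03-11T10:00:00.000Z"
};

// Replacing the array with $set means reading the old array first and then
// shipping every existing score plus the new one...
const setUpdate = { $set: { scores: [...existingScores, newScore] } };
// ...while $push ships only the element being appended.
const pushUpdate = { $push: { scores: newScore } };

const setBytes = Buffer.byteLength(JSON.stringify(setUpdate));
const pushBytes = Buffer.byteLength(JSON.stringify(pushUpdate));
console.log(`$set update: ${setBytes} bytes, $push update: ${pushBytes} bytes`);
```

The `$set` payload grows with every score the player has ever recorded, while the `$push` payload stays the size of a single score.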

The following line retrieves the document for the PUZZLEGAMESMASTER player by its ObjectId (see Figure 12). You just need to replace "513b66a8880d01e7242a7e70" with the ObjectId that MongoDB generated when you inserted player1.

db.players.findOne({ _id: ObjectId("513b66a8880d01e7242a7e70") })

The following lines show the new values for the scores array after a successful push update:

If you hate working with command-line tools and feel your productivity is low with the MongoDB shell, don't worry. In upcoming articles, I will explain how to work with a GUI for MongoDB.

Obviously, because I've been talking about potentially huge amounts of data, it will be necessary to add the appropriate indexes. MongoDB is very flexible with schemas, but it still requires appropriate indexes to avoid unnecessary full collection scans when querying data.
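In the MongoDB shell of this article's era, an index is declared with `db.players.ensureIndex({ name: 1 })` (later versions use `createIndex`). Why it matters can be sketched in plain JavaScript by counting how many documents a lookup examines with and without an index. The `Map` is only a stand-in for MongoDB's B-tree indexes (which are ordered and also serve range queries), and the generated player names are hypothetical.

```javascript
// Hypothetical players collection with n documents.
function makePlayers(n) {
  return Array.from({ length: n }, (_, i) => ({ _id: i, name: `PLAYER${i}` }));
}

// Without an index: documents are examined one by one (a full collection scan).
function findByNameScan(players, name) {
  let examined = 0;
  for (const p of players) {
    examined += 1;
    if (p.name === name) return { doc: p, examined };
  }
  return { doc: null, examined };
}

// With an index: a Map from name to document stands in for a B-tree index.
function buildNameIndex(players) {
  return new Map(players.map((p) => [p.name, p]));
}

function findByNameIndexed(index, name) {
  const doc = index.has(name) ? index.get(name) : null;
  return { doc, examined: doc ? 1 : 0 };
}

const players = makePlayers(100000);
const index = buildNameIndex(players);
console.log(findByNameScan(players, "PLAYER99999").examined); // 100000
console.log(findByNameIndexed(index, "PLAYER99999").examined); // 1
```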

In this first article, I've provided a brief overview of some of the most important things that you must consider when you make the move from the island of relational databases to MongoDB. In the next article, I'll explain how to perform additional operations in MongoDB and use a GUI tool to increase productivity. In addition, I'll explain how to start working with the C# driver to perform operations against MongoDB collections in an application.

Gaston Hillar is an expert in Windows-based programming who writes frequently for Dr. Dobb's.
