On the Mongo CLI, we can add this book object to our collection using the following command:

> db.books.insert(book)

Suppose we also add the shelf collection (for example, the floor, the row, the column the shelf is in, the book indexes it maintains, and so on that are part of the shelf object), which has the following structure:
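The structure itself is not reproduced here, but as an illustrative sketch (field names taken from the description above, values made up), a shelf document could look like the following Ruby hash, which mirrors the JSON document we would insert from the mongo console:

```ruby
# Hypothetical shape of a shelf document, written as a Ruby hash.
# Field names (floor, row, column, book_indexes) are illustrative assumptions.
shelf = {
  name: "Fiction",
  floor: 2,
  row: 4,
  column: 7,
  book_indexes: [101, 102, 103]  # the book indexes this shelf maintains
}

puts shelf[:name]                 # => Fiction
puts shelf[:book_indexes].length  # => 3
```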

Remember, it's quite possible that a few years down the line, some shelf instances may become obsolete and we might want to maintain their record. Maybe we could have another shelf instance containing only books that are to be recycled or donated. What can we do? We can approach this as follows:

The SQL way: Add additional columns to the table and ensure that there is a default value set in them. This adds a lot of redundancy to the data, reduces performance a little, and considerably increases storage. Sad but true!

The NoSQL way: Add the additional fields whenever you want. The following are the MongoDB schemaless object model instances:

What just happened?

You will notice that the second object has more fields, namely comments and state. When fetching objects, it's fine if you get extra data; that is the beauty of NoSQL. When the first document is fetched (the one with the name Fiction), it will not contain the state and comments fields, but the second document (the one with the name Romance) will have them.

Are you worried about what will happen if we try to access non-existing data from an object, for example, accessing comments from the first object fetched? This can be resolved logically: we can check for the existence of a key, default to a value if it's not there, or simply ignore its absence. This is typically done in code anyway when we access objects.

Notice that when the schema changed we did not have to add fields to every object with default values, as we would in a SQL database. So there is no redundant information in our database. This keeps storage minimal, and in turn the object information fetched is concise. So there was no redundancy and no compromise on storage or performance. But wait! There's more.
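To make that concrete, here is a plain-Ruby sketch of defaulting when a key is absent. The fetched documents are shown as Ruby hashes, and the field values are illustrative:

```ruby
# Two documents from the same collection; the second has extra fields.
fiction = { name: "Fiction", books: [] }
romance = { name: "Romance", books: [], state: "active", comments: ["Nice shelf"] }

# Safe access: default to a value when the key does not exist.
puts fiction.fetch(:comments, []).length  # => 0 (no comments field)
puts romance.fetch(:comments, []).length  # => 1

# Or simply check for existence before use.
puts "no state" unless fiction.key?(:state)
```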

NoSQL scores over SQL databases

The way many-to-many relations are managed shows how we can do things with MongoDB that simply cannot be done in a relational database. The following is an example:

Each book can have reviews and votes given by customers. We should be able to see these reviews and votes and also maintain a list of top voted books.

If we had to do this in a relational database, this would be somewhat like the relationship diagram shown as follows: (get scared now!)

The vote_count and review_count fields are inside the books table that would need to be updated every time a user votes up/down a book or writes a review. So, to fetch a book along with its votes and reviews, we would need to fire three queries to fetch the information:

In MongoDB, we can do this directly using embedded documents or relational documents.

Using MongoDB embedded documents

Embedded documents, as the name suggests, are documents that are embedded in other documents. This is one of the features of MongoDB and this cannot be done in relational databases. Ever heard of a table embedded inside another table?

Instead of four tables and a complex many-to-many relationship, we can say that reviews and votes are part of a book. So, when we fetch a book, the reviews and the votes automatically come along with the book.

Embedded documents are analogous to chapters inside a book. Chapters cannot be read unless you open the book. Similarly embedded documents cannot be accessed unless you access the document.

For the UML savvy, embedded documents are similar to the contains or composition relationship.

Time for action – embedding reviews and votes

In MongoDB, the embedded object physically resides inside the parent. So if we had to maintain reviews and votes we could model the object as follows:

What just happened?

We now have reviews and votes inside the book. They cannot exist on their own. Did you notice that they look similar to JSON hashes and arrays? Indeed, they are an array of hashes. Embedded documents are just like hashes inside another object.
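As a sketch (the names and values here are made up), the book with its embedded reviews and votes could be represented like this in Ruby; the embedded documents are literally an array of hashes inside the parent:

```ruby
book = {
  name: "Oliver Twist",
  author: "Charles Dickens",
  reviews: [
    { username: "Gautam", comment: "A must read!" },
    { username: "Tom",    comment: "Classic." }
  ],
  votes: [
    { username: "Gautam", rating: 9 }
  ]
}

# The embedded documents travel with the parent: one fetch gets everything.
puts book[:reviews].length        # => 2
puts book[:votes].first[:rating]  # => 9
```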

There is a subtle difference between hashes and embedded objects as we shall see later on in the book.

Have a go hero – adding more embedded objects to the book

Try to add more embedded objects such as orders inside the book document. It works!

order = {
name: "Toby Jones",
type: "lease",
units: 1,
cost: 40
}

Fetching embedded objects

We can fetch a book along with the reviews and the votes with it. This can be done by executing the following command:

This does indeed look simple, doesn't it? By fetching a single object, we are able to get the review and vote count along with the data.

Use embedded documents only if you really have to! Embedded documents increase the size of the object. So, if we have a large number of embedded documents, it could adversely impact performance. Even to get the name of the book, the reviews and the votes are fetched.

Using MongoDB document relationships

Just like we have embedded documents, we can also set up relationships between different documents.

Time for action – creating document relations

The following is another way to create the same relationship between books, users, reviews, and votes. This is more like the SQL way.
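The code is not reproduced here; as a sketch (ids and field names are illustrative), related documents reference each other by id rather than nesting, much like foreign keys in SQL:

```ruby
# Hypothetical related documents: each stores the id of its parent.
book   = { _id: 1,  name: "Oliver Twist", author_id: 57 }
review = { _id: 11, book_id: 1, username: "Gautam", comment: "A must read!" }
vote   = { _id: 21, book_id: 1, username: "Tom", rating: 3 }

# Fetching the book with its reviews and votes now takes separate lookups,
# just like firing multiple queries in SQL.
related = [review, vote].select { |doc| doc[:book_id] == book[:_id] }
puts related.length  # => 2
```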

What just happened?

Hmm!! Not very interesting, is it? It doesn't even seem right. That's because it isn't the right choice in this context. It's very important to know how to choose between nesting documents and relating them.

In your object model, if you will never search by the nested document (that is, look up for the parent from the child), embed it.

Just in case you are not sure about whether you would need to search by an embedded document, don't worry too much – it does not mean that you cannot search among embedded objects. You can use Map/Reduce to gather the information.

Comparing MongoDB versus SQL syntax

This is a good time to sit back and evaluate the similarities and dissimilarities between the MongoDB syntax and the SQL syntax. Let's map them together:

Some more notable comparisons between MongoDB and relational databases are:

MongoDB does not support joins. Instead it fires multiple queries or uses Map/Reduce. We shall soon see why the NoSQL faction does not favor joins.

SQL has stored procedures. MongoDB supports JavaScript functions.

MongoDB has indexes similar to SQL.

MongoDB also supports Map/Reduce functionality.

MongoDB supports atomic updates like SQL databases.

Embedded or related objects are used sometimes instead of a SQL join.

MongoDB collections are analogous to SQL tables.

MongoDB documents are analogous to SQL rows.

Using Map/Reduce instead of join

We have seen this mentioned a few times earlier—it's worth jumping into it, at least briefly.

Map/Reduce is a concept that was introduced by Google in 2004. It's a way of distributed task processing: we "map" tasks to workers and then "reduce" the results.

Understanding functional programming

Functional programming is a programming paradigm that has its roots in lambda calculus. If that sounds intimidating, remember that JavaScript could be considered a functional language. The following is a snippet of functional programming:
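The snippet itself is not reproduced here, but the idea can be sketched in Ruby (which also supports this style via blocks and lambdas): functions are values that can be stored, passed around, and chained.

```ruby
# Functions as values: lambdas can be stored, passed, and chained.
square = ->(x) { x * x }
double = ->(x) { x * 2 }

# Compose: feed the result of one function into the next.
compose = ->(f, g) { ->(x) { g.call(f.call(x)) } }
square_then_double = compose.call(square, double)

puts square_then_double.call(3)          # => 18
puts [1, 2, 3].map(&square).reduce(:+)   # => 14
```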

We can have functions inside functions. Higher-level languages (such as Java and Ruby) support anonymous functions and closures, but are still procedural languages. Functional programs rely on the results of one function being chained to other functions.

Building the map function

The map function processes a chunk of data. Data that is fed to this function could be accessed across a distributed filesystem, multiple databases, the Internet, or even any mathematical computation series!

function map(void) -> void

The map function "emits" information that is collected by the "mystical super gigantic computer program" and feeds that to the reducer functions as input. MongoDB as a database supports this paradigm making it "the all powerful" (of course I am joking, but it does indeed make MongoDB very powerful).

Time for action – writing the map function for calculating vote statistics
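In MongoDB the map function is written in JavaScript, but its logic can be sketched in plain Ruby. Assuming a book document with a votes array (as we build later in this chapter), the map step emits one (key, value) pair per vote; here emit is simulated with a lambda that collects the pairs:

```ruby
# Pure-Ruby stand-in for MongoDB's JavaScript map function (illustrative only).
emitted = []
emit = ->(key, value) { emitted << [key, value] }

# In MongoDB, `this` inside the map function is the current document.
book = {
  name: "Oliver Twist",
  votes: [
    { username: "Gautam", rating: 9 },
    { username: "Tom",    rating: 3 }
  ]
}

# The map step: emit (username, {rating: ...}) for every vote on the book.
book[:votes].each { |x| emit.call(x[:username], { rating: x[:rating] }) }

puts emitted.length  # => 2
```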

What just happened?

The emit function emits the data. Notice that the data is emitted as a (key, value) structure.

Key: This is the parameter over which we want to gather information. Typically it would be some primary key, or some key that helps identify the information.

For the SQL savvy, typically the key is the field we use in the GROUP BY clause.

Value: This is a JSON object. This can have multiple values and this is the data that is processed by the reduce function.

We can call emit more than once in the map function. This would mean we are processing data multiple times for the same object.

Building the reduce function

The reduce functions are the consumer functions that process the information emitted by the map functions and emit the results to be aggregated. For each key emitted by the map function, a reduce function returns a result, and MongoDB collects and collates these results. This turns the whole collection-and-processing pipeline into a massively parallel system, which is what gives MongoDB its power.

The reduce functions have the following signature:

function reduce(key, values_array) -> value

Time for action – writing the reduce function to process emitted information
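Again sketched in plain Ruby (MongoDB itself expects JavaScript), a reduce function receives a key and an array of values, and returns a single value in the same shape as what the map function emitted:

```ruby
# Pure-Ruby stand-in for MongoDB's JavaScript reduce function (illustrative only).
# The reduce step: accumulate the ratings emitted for one key (username).
reduce = lambda do |key, values|
  result = { rating: 0 }
  values.each { |v| result[:rating] += v[:rating] }
  result
end

puts reduce.call("Gautam", [{ rating: 9 }, { rating: 12 }])  # => {:rating=>21}
```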

The variable result has a structure similar to what was emitted from the map function. This is important, as we want the results from every document in the same format. If we need to process more results, we can use the finalize function (more on that later). The result object has the following structure:

The values are always passed as arrays. It's important that we iterate the array, as there could be multiple values emitted from different map functions with the same key. So, we process the array to ensure that we don't overwrite the results but collate them.

Understanding the Ruby perspective

Until now we have just been playing around with MongoDB. Now let's have a look at this from Ruby. Aaahhh… bliss!

For this example, we shall write some basic classes in Ruby. We are using Rails 3 and the Mongoid wrapper for MongoDB (we shall see more about MongoDB wrappers later in the book).

Setting up Rails and MongoDB

To set up a Rails project, we first need to install the Rails gem. We shall also install the Bundler gem that goes hand-in-hand with Rails.

Time for action – creating the project

First we shall create the sample Rails project. Assuming you have installed Ruby already, we need to install Rails. The following commands show how to install Rails and Bundler.

$ gem install rails
$ gem install bundler

What just happened?

The preceding commands will install Rails and Bundler. For the sake of this example, I am working with Rails 3.2.0 (that is, the current latest version) but I recommend that you should use the latest version of Rails available.

Understanding the Rails basics

Rails is a web framework written in Ruby. It was released publicly in 2005 and it has gathered a lot of steam since then. It is interesting to note that until Rails 2.x, the framework was a tightly coupled one. This was when other loosely coupled web frameworks made their way into the developer market. The most popular among them were Merb and Sinatra. These frameworks leveraged Ruby to its full potential but were competing against each other.

Around 2008-2009, the Rails core team (David Heinemeier Hansson and team) met the makers of Merb (Yehuda Katz and team), and they got together and discussed a strategy that has literally changed the face of web development. Rails 3 emerged with a bang: a brand new framework with Metal and Rack, loosely coupled components, and very customizable middleware. This has made Rails extremely popular today.

Using Bundler

Bundler is another awesome gem by "Carlhuda" (Yehuda Katz and Carl Lerche) that manages gem dependencies in Ruby applications.

Why do we need Bundler

In the "olden" days, when everything was a system installation, things would run smoothly till somebody upgraded a system library or a gem... and then Kaboom! The application crashed for no apparent reason and with no code change. Some libraries break compatibility, which in turn requires us to install new gems. So, even if a system administrator upgraded the system (as a routine maintenance activity), our Ruby application was prone to crashes.

A bigger problem arose when we were required to install multiple Ruby applications on the same system. Ruby version, Rails version, gem versions, and system libraries all could potentially clash to make development and deployment a nightmare!

One solution was to freeze gems and the Ruby version. This required us to ship everything in our application bundle. Not only was this inefficient, but it also increased the size of the bundle.

Then along came Bundler and, as the name suggests, it keeps track of dependencies in a Ruby application. Java has a similar package called Maven. But wait! Bundler has more in store. We can now package gems (via a Gemfile) and specify environments with them. So, if we require some gems only for testing, they can be specified to be part of only the "test" group.

If that hasn't sold you on Bundler, we can specify the source of the gem files too: GitHub, SourceForge, or even a gem in our local filesystem.

Bundler generates Gemfile.lock, which manages the gem dependencies for the application. It uses the system-installed gems, so we don't have to freeze gems or Ruby versions with each application.

Setting up Sodibee

Now that we have installed Rails and Bundler, it's time to set up the Sodibee project.

Time for action – start your engines

Now we shall create the Sodibee project in Rails 3. It can be done using the following command:

$ rails new sodibee -JO

In the previous command, -J means skip-prototype (and use jQuery instead) and -O means skip-activerecord. This is important, as we want to use MongoDB.

Add the following to Gemfile:

gem 'mongoid'
gem 'bson'
gem 'bson_ext'

Now on command line, type the following:

$ bundle install

In Rails 3.2.1, a lot of automation has been added: bundle install is part of the process of creating a project.

What just happened?

The previous command, bundle install, fetches missing gems and their dependencies and installs them. It then generates Gemfile.lock. After bundle install is complete, you will see the following on the screen:

$ bundle install
Fetching source index for http://rubygems.org/
Using rake (0.9.2)
Using abstract (1.0.0)
Using activesupport (3.2.0)
Using builder (2.1.2)
Using i18n (0.5.0)
Using activemodel (3.2.0)
Using erubis (2.6.6)
Using rack (1.2.4)
Using rack-mount (0.6.14)
Using rack-test (0.5.7)
Installing tzinfo (0.3.30)
Using actionpack (3.2.0)
Using mime-types (1.16)
Using polyglot (0.3.2)
Using treetop (1.4.10)
Using mail (2.2.19)
Using actionmailer (3.2.0)
Using arel (2.0.10)
Using activerecord (3.2.0)
Using activeresource (3.2.0)
Using bson (1.4.0)
Using bundler (1.0.10)
Using mongo (1.3.1)
Installing mongoid (2.2.1)
Using rdoc (3.9.4)
Using thor (0.14.6)
Using railties (3.2.0)
Using rails (3.2.0)
Your bundle is complete! Use `bundle show [gemname]` to see where a
bundled gem is installed.

Setting up Mongoid

Now that the Rails application is set up, let's configure Mongoid. Mongoid is an Object Document Mapper (ODM) tool that maps Ruby objects to MongoDB documents. For now, we shall simply issue the command to configure Mongoid.

Time for action – configuring Mongoid

The Mongoid gem has a Rails generator command to configure Mongoid.

A Rails generator, as the name suggests, sets up files. Generators are frequently used in gems to set up config files with default settings. On the command line, g can be used instead of writing generate.

$ rails g mongoid:config

What just happened?

This command created a config/mongoid.yml file that is used to connect to MongoDB. The file would look like the following code snippet:
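The generated file is not reproduced here; the exact contents depend on the Mongoid version, but for Mongoid 2.x it looks roughly like the following (the database names follow the project name by convention):

```yaml
development:
  host: localhost
  database: sodibee_development

test:
  host: localhost
  database: sodibee_test

production:
  host: localhost
  database: sodibee
```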

Notice that there are now three environments to work with—development, test, and production. By default, Rails will pick up the development environment. We do not need to explicitly create the database in MongoDB. The first call to the database will create the database for us.

The previous command also configures the config/application.rb to ensure that ActiveRecord is disabled. ActiveRecord is the default Rails ORM (Object Relational Mapper). As we are using Mongoid, we need to disable ActiveRecord.

Building the models

Now that we have the project set up, it's time we create the models. Each model will autocreate collections in MongoDB. To create a model, all we need to do is create a file in the app/models folder.

The preceding code includes the Mongoid module to save the documents in MongoDB.

include is the Ruby way of adding methods to a Ruby class by including modules. This is called a module mixin. We can include as many modules in a class as we want. Modules make the class richer by adding all the module methods as instance methods.

extend is the Ruby way of adding class methods to a Ruby class by including modules in it. All the methods from the included modules become class methods.
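A quick sketch of the difference, using throwaway classes and a made-up module:

```ruby
module Shelvable
  def shelve
    "shelved"
  end
end

class PaperBook
  include Shelvable  # shelve becomes an instance method
end

class EBook
  extend Shelvable   # shelve becomes a class method
end

puts PaperBook.new.shelve  # => shelved
puts EBook.shelve          # => shelved
```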

The previous code configures the name and the type of the fields for a document.

Notice the Ruby 1.9 syntax for a hash. No more hash rockets (=>); instead we use the JSON notation directly. Remember, it's type: String and not type : String. You must have the key and the colon (:) together.

It's very important that the inverse relation, that is, the embedded_in, is mentioned in reviews. This tells Mongoid how to store the embedded object. If this is not written, objects will not get embedded.

Testing from the Rails console

Nothing is ever complete without testing. The Rails community is almost fanatical about integrating tests into the project. We shall learn about testing soon, but for now let's test our code from the Rails console.

Time for action – putting it all together

Now we shall test these models to see if they indeed work as expected. We shall create different objects and their relations. The fun begins! Let's start the Rails console and create our first book object:

$ rails console

The Rails console is an interactive command prompt that loads the Rails environment and the models. It's the best way to check and test whether our data models are correct.

This creates our first book: Procrastinate and Laziness Personified by Toby D Cided, in the Self-help category.

Understanding many-to-many relationships in MongoDB

In a SQL database, a many-to-many relationship is achieved using an intermediate table. For example, the many-to-many relationship we have mentioned previously between books and categories would be achieved in the following manner in a SQL database:

As MongoDB is a schemaless database, we do not need any additional intermediate collections. The following is what the book object stores:
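As a sketch (the ids are shortened placeholders), the book document keeps an array of category ids on the book itself, and the category keeps an array of book ids, instead of an intermediate table:

```ruby
# Hypothetical stored documents: many-to-many via arrays of ids.
book = {
  _id: "4e86e45efed0eb0be0000012",
  name: "Oliver Twist",
  category_ids: ["4e86e4...a", "4e86e4...b"],  # references to Category documents
  reviews: [{ username: "Gautam", comment: "A must read!" }]  # embedded
}

category = {
  _id: "4e86e4...a",
  name: "Fiction",
  book_ids: ["4e86e45efed0eb0be0000012"]
}

puts book[:category_ids].length  # => 2
```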

Notice that the reviews are embedded inside the book object. Now when we fetch the book object, we will automatically get all the reviews too.

Choosing whether to embed or not to embed

Suppose we want to prepare orders for a book. The book can be leased or purchased. If we want to maintain an order history in terms of lease and purchase, how do we build the Lease, Purchase, and Order models?

Working with Map/Reduce

To see an example of how Map/Reduce works, let's now add votes to books. The following shows how we can add votes:

{
"username" : "Dick",
"rating" : 5
}

Rating could be on a scale of 1 to 10, with 10 being the best. Every user can rate a book. Our aim is to collect the total rating by all users. We shall save this information as a hash in
the votes array in the book object. This should not be confused with an embedded object (as it does not have an object ID).

We have not seen the MongoDB data types such as ObjectId and ISODate. All usual data types such as integer, float, string, hash, and array are supported.

The following is how we save this information as a hash in the votes array in the book object:

Note that we first set b.votes = [], that is, an empty array. This is because MongoDB does not add fields to the database until they are populated; so, by default, b.votes would return nil. Hence it's important to initialize it the first time.
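In Ruby terms, the same guard looks like this (using a plain hash as a stand-in for the fetched book):

```ruby
b = { name: "Great Expectations" }  # no votes field yet

b[:votes]          # => nil until we initialize it
b[:votes] ||= []   # initialize the array the first time
b[:votes] << { username: "Gautam", rating: 9 }

puts b[:votes].length  # => 1
```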

Now, let's add votes for Great Expectations (for example, three votes, one each by Gautam, Tom, and Dick):

If we had to collect all the votes and add up the rating for each user, it could be a pretty cumbersome task to iterate over all of these objects. This is where Map/Reduce helps us.

One alternative to Map/Reduce in this particular example would be to capture the vote count per book by incrementing a counter while inserting votes and reviews itself. However, we shall use Map/Reduce here so that we understand how it works.

Time for action – writing the map function to calculate ratings

This is how we can write the map function. As we have seen earlier, this function will emit information, in our case, the key is the username and the value is the rating:

What just happened?

This is a JavaScript function. MongoDB understands and processes all JS functions. Every time emit() is called, some data is emitted for the reduce function to process. In the preceding code, this refers to the current document being processed.

What we want to do is emit all the ratings for each element in the votes array of every book. emit() takes the key and value as parameters, so we are emitting the user's votes for the reduce function to process. It's also important to remember the data structure we are emitting as the value; it should be consistent for all objects. In our case, it is {rating: x.rating}.

Time for action – writing the reduce function to process the emitted results

Now let's write the reduce function. This takes a key and an array of values, shown as follows:

What just happened?

The reduce function is the one which processes the values that were emitted from the map function.

Remember that the values parameter is always an array. The map function could emit results for the same key multiple times, so we should be sure to process the value as an array and accumulate results. The return structure should be the same as what was emitted.

MongoDB supports Map/Reduce and will invoke Map/Reduce functions in parallel. This gives it power over standard SQL databases. The closest a SQL database comes to this is a GROUP BY query which, depending on the indexes and the query fired, can get us results similar to Map/Reduce.

Using Map/Reduce together

As MongoDB requires JavaScript functions, the trick here is to pass the JavaScript functions to the MongoDB engine via a string on the Rails console. So, we create two strings for the map and reduce functions.
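MongoDB executes the JavaScript strings we pass it; to see the mechanics end to end, here is the whole flow simulated in plain Ruby over some made-up vote data. The grouping step in the middle is what MongoDB does for us between map and reduce:

```ruby
books = [
  { name: "Great Expectations",
    votes: [{ username: "Gautam", rating: 9 }, { username: "Tom", rating: 3 },
            { username: "Dick", rating: 5 }] },
  { name: "Oliver Twist",
    votes: [{ username: "Gautam", rating: 10 }, { username: "Dick", rating: 7 }] }
]

# Map: emit (username, {rating: ...}) for every vote on every book.
emitted = []
books.each do |book|
  book[:votes].each { |x| emitted << [x[:username], { rating: x[:rating] }] }
end

# MongoDB groups the emitted values by key before calling reduce.
grouped = emitted.group_by(&:first).transform_values { |pairs| pairs.map(&:last) }

# Reduce: accumulate the ratings for each key.
results = grouped.transform_values do |values|
  values.reduce({ rating: 0 }) do |acc, v|
    { rating: acc[:rating] + v[:rating] }
  end
end

results.each { |user, r| puts "#{user} has #{r[:rating]} ratings" }
```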

What just happened?

Voila! This shows that we have the following result:

Dick has 12 ratings

Gautam has 21 ratings

Tom has 3 ratings

Tally these ratings manually with the preceding code and verify.

What would you have to do if you did not have Map/Reduce? Iterate over all book objects and collect the votes array. Then keep a temporary hash of usernames and keep aggregating the ratings. Lots of work indeed!

Don't always jump into using Map/Reduce. Sometimes it's just easier to query properly. Suppose, we want to find all the books that have votes or reviews for them, what do we do?

Do we iterate every book object and check the length of the votes array or the reviews array?

Do we run Map/Reduce for this?

Is there a direct query for this?

We can directly fire a query from the Rails console, as follows:

irb> Book.any_of({:reviews.exists => true}, {:votes.exists => true})

If we want to search directly on the mongo console, we have to execute the following command:

Remember, we should use Map/Reduce only when we have to process data and return results (for example, when it's mostly statistical data). For most cases, there would be a query (or multiple queries) that would get us our results.

Summary

Here we really jumped into Ruby and MongoDB, didn't we? We saw how to create objects in MongoDB directly and then via Ruby using Mongoid. We saw how to set up a Rails project, configure Mongoid, and build models. We even went the distance to see how Map/Reduce would work in MongoDB.

We saw a lot of new things too, which require explanation; for example, the various data types that are supported in MongoDB, such as ObjectId and ISODate.
