Working With MongoDB's Aggregation Pipelines in Ruby

11 Apr 2015

If you needed to create a report
with details of how many users signed up for your app each month,
it would be trivial to do using SQL.
All you need to do is write a GROUP BY query
and voilà, there’s your report.

I’ve always missed SQL’s aggregation features in MongoDB.
The standard way to do this kind of aggregation was to use map-reduce jobs.
When you do this in Ruby,
it ends up being very inelegant,
with the map and reduce functions written as strings of JavaScript code
sprinkled inside the Ruby.

Recently I found out that MongoDB has shipped with
an aggregation pipeline
since version 2.2.
This makes it much simpler to write such queries
without resorting to map-reduce.

Testing in the Mongo shell

Before writing any Ruby,
let’s play around with the feature in the Mongo shell.
For the sake of simplicity,
let’s assume that each document in the users collection
has only three fields:
_id, email and a created_at date.
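Shown as a Ruby hash, one such document might look like the sketch below. The values are hypothetical placeholders, and _id is a plain string standing in for a real BSON ObjectId so that the snippet runs without any gems.

```ruby
# A hypothetical user document, written as a plain Ruby hash.
# In MongoDB, _id would normally be an ObjectId; a string keeps this gem-free.
user = {
  _id: 'hypothetical-object-id',
  email: 'ann@example.com',
  created_at: Time.utc(2015, 3, 14, 9, 30) # MongoDB stores dates in UTC
}

puts user[:created_at].year  # prints 2015
puts user[:created_at].month # prints 3
```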

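To run an aggregation,
we call aggregate on the collection with an array of stages,
such as db.users.aggregate([stage1, stage2]).
Here stage1 and stage2 are stages of the pipeline
that transform the data to create the final result set.
Each stage transforms the documents it receives
and passes the resulting documents on to the next stage.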
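One way to picture this is as a chain of functions. The sketch below is a pure-Ruby analogy, not how MongoDB actually executes a pipeline: each hypothetical stage is a lambda that takes an array of documents and returns a new array, and run_pipeline (also a made-up helper) feeds each stage’s output into the next.

```ruby
# Pure-Ruby analogy for an aggregation pipeline (not how MongoDB runs it).
# Each stage takes an array of documents and returns a new array of documents.
def run_pipeline(docs, stages)
  stages.reduce(docs) { |current_docs, stage| stage.call(current_docs) }
end

users = [
  { _id: 1, email: 'ann@example.com', created_at: Time.utc(2015, 1, 12) },
  { _id: 2, email: 'bob@example.com', created_at: Time.utc(2015, 2, 3) }
]

# Hypothetical stages: keep January signups, then keep only the email field.
january_only = ->(docs) { docs.select { |doc| doc[:created_at].month == 1 } }
emails_only  = ->(docs) { docs.map { |doc| { email: doc[:email] } } }

result = run_pipeline(users, [january_only, emails_only])
# result == [{ email: 'ann@example.com' }]
```

Because each stage only sees the previous stage’s output, the order matters: running emails_only first would drop created_at before the month check could use it.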

Let’s look at how we can build up the pipeline.
Each stage is a simple hash
(a JavaScript object, in the shell)
that describes an operation to be performed
on the data.

First of all, we need to
extract the month and year from the signup date.
For this, we can use a transformation called $project.
We’ll put this in a variable called project
that we can later use in place of stage1 above.
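A minimal sketch of that stage, written as a Ruby hash so it can be reused later (in the shell it has the same shape, as a JavaScript object). It assumes the date field is called created_at; '$created_at' is how a pipeline refers to the value of an existing field, and $year and $month are MongoDB’s date operators.

```ruby
# $project stage: compute signup_year and signup_month from created_at.
# '$created_at' means "the value of this document's created_at field".
# Quoted keys followed by a colon, like '$project':, need Ruby 2.2 or later.
project = {
  '$project': {
    signup_year:  { '$year': '$created_at' },
    signup_month: { '$month': '$created_at' }
  }
}
```

Since _id is included by default and email isn’t listed, documents coming out of this sketch carry only _id, signup_year and signup_month.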

This transforms each document
so that the year and month of the signup date
are extracted into the signup_year and signup_month fields.
This makes it easy for us to group the documents
by year and month in the next stage.
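To see the shape of the output without a database, here is a pure-Ruby imitation of that projection for a single document. project_signup_date is a hypothetical helper; it converts created_at to UTC first, because MongoDB’s $year and $month operators read dates in UTC.

```ruby
# Imitates, in plain Ruby, what the $project stage yields for one document.
# MongoDB computes this on the server; this helper is only for illustration.
def project_signup_date(doc)
  utc = doc[:created_at].getutc # $year and $month work in UTC
  {
    _id: doc[:_id], # $project keeps _id unless told otherwise
    signup_year: utc.year,
    signup_month: utc.month
  }
end

user = { _id: 1, email: 'ann@example.com', created_at: Time.utc(2015, 3, 14) }
projected = project_signup_date(user)
# projected == { _id: 1, signup_year: 2015, signup_month: 3 }
```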

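The next stage of the pipeline is the $group stage,
which is similar to GROUP BY in SQL.
Our group stage tells MongoDB
to group the documents by year and month.
In a $group stage, the grouping key always goes in the _id field,
and the signups: { '$sum': 1 } expression
adds 1 for each document with a given year/month combination,
so signups ends up as the number of users who signed up that month.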
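Here is a sketch of such a group stage as a Ruby hash. The _id sub-field names year and month are choices made for this example; '$signup_year' and '$signup_month' refer to the fields produced by the $project stage.

```ruby
# $group stage: one output document per distinct year/month pair.
# The grouping key goes in _id; signups counts documents with $sum: 1.
# The _id sub-field names (year, month) are hypothetical choices.
group = {
  '$group': {
    _id: { year: '$signup_year', month: '$signup_month' },
    signups: { '$sum': 1 }
  }
}
# Each result then has the form { _id: { year: ..., month: ... }, signups: <count> }
```

This plays the same role as GROUP BY on the year and month columns combined with COUNT(*) in SQL.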

Ruby and Mongoid

Now let’s try doing the same in Ruby
using the Mongoid gem.
Assuming that we have a User model
that talks to MongoDB,
we can write:

User.collection.aggregate([project, group])

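For the project and group variables,
you just need to copy the hashes we typed into the Mongo shell.
Ruby’s hash syntax is so similar to JavaScript’s
that we don’t even have to change anything there,
as long as the keys are quoted, like '$sum': 1
(Ruby accepts quoted keys followed by a colon from version 2.2 on).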
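A quick way to check that claim is to parse the shell text as JSON and compare it with the same stage pasted into Ruby. This sketch uses Ruby’s standard json library; the stage is an example group stage by year and month, with keys double-quoted so that the text is also valid JSON.

```ruby
require 'json'

# The group stage as it might be typed into the Mongo shell. Keys are
# double-quoted here so that the text is also valid JSON.
shell_text = '{ "$group": { "_id": { "year": "$signup_year", "month": "$signup_month" },
                            "signups": { "$sum": 1 } } }'

# The same stage pasted into Ruby 2.2+ source. A quoted key followed by a
# colon becomes a Symbol, so '$sum': 1 is the same as :'$sum' => 1.
group = {
  '$group': {
    _id: { year: '$signup_year', month: '$signup_month' },
    signups: { '$sum': 1 }
  }
}

# Parsing the shell text with symbolized keys yields an identical hash.
same_stage = JSON.parse(shell_text, symbolize_names: true) == group
puts same_stage # prints true
```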

In the same way, we can easily add
more stages to the pipeline.
For instance, if we needed to sort the results,
we could append a $sort stage
and pass [project, group, sort] to aggregate.
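A sketch of such a sort stage, assuming the group stage stores the year and month as _id.year and _id.month (hypothetical names). The second half imitates the resulting order in plain Ruby on made-up grouped results.

```ruby
# $sort stage: order the groups chronologically, oldest first.
# Assumes hypothetical _id.year / _id.month fields from the group stage.
# Ruby hashes keep insertion order, so year is compared before month.
sort = { '$sort': { '_id.year': 1, '_id.month': 1 } }

# Plain-Ruby imitation of that ordering on hypothetical grouped results.
grouped = [
  { _id: { year: 2015, month: 2 }, signups: 5 },
  { _id: { year: 2014, month: 12 }, signups: 3 },
  { _id: { year: 2015, month: 1 }, signups: 4 }
]
sorted = grouped.sort_by { |doc| [doc[:_id][:year], doc[:_id][:month]] }

# With Mongoid, the stage would go last:
# User.collection.aggregate([project, group, sort])
```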

Even though I’ve used the Mongoid gem in this example,
Mongoid isn’t actually required to use this feature.
We could accomplish the same with the moped gem,
which is the driver Mongoid uses to talk to the database.
Given a Moped session connected to our database,
we would call session[:users].aggregate([project, group]).

Here, session[:users] returns the same users collection
that User.collection gave us in the previous example.

Letting MongoDB take care of the aggregation
means that our Ruby programs receive only the summarized results,
one document per month rather than one per user,
so they have less data to deal with and fewer objects to allocate.
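For comparison, here is a sketch of the Ruby-side alternative: grouping and counting the users ourselves. It runs on a small hypothetical array of hashes; against a real database, every user document would first have to be fetched and turned into a Ruby object before this code could even start.

```ruby
# Hypothetical users standing in for a fully loaded collection.
users = [
  { _id: 1, email: 'ann@example.com', created_at: Time.utc(2015, 1, 12) },
  { _id: 2, email: 'bob@example.com', created_at: Time.utc(2015, 1, 30) },
  { _id: 3, email: 'cat@example.com', created_at: Time.utc(2015, 2, 3) }
]

# Group by [year, month] and count each group: the same work that the
# $project and $group stages would otherwise do on the server.
signups = users
  .group_by { |user| [user[:created_at].year, user[:created_at].month] }
  .map { |(year, month), docs| { _id: { year: year, month: month }, signups: docs.size } }
# signups == [{ _id: { year: 2015, month: 1 }, signups: 2 },
#             { _id: { year: 2015, month: 2 }, signups: 1 }]
```

With three users the difference is negligible, but the pipeline only ever sends back one small document per month, however many users the collection holds.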

If you find yourself doing such operations
by fetching the entire collection
and then transforming it in Ruby,
you will find aggregation pipelines a very useful tool.
Take a look at
the list of available stage operators
that you can use.

Hi, I’m Nithin Bekal.
I work at Shopify in Ottawa, Canada.
Previously, co-founder of
CrowdStudio.in and
WowMakers.
Ruby is my preferred programming language,
and the topic of most of my articles here,
but I'm also a big fan of Elixir.
Tweet to me at @nithinbekal.