Wednesday, January 18, 2012

Getting Started with MongoDB and Python

If you've been following this blog for a while, you've seen me mention MongoDB
more than once. One exciting thing for me is that I'll be co-teaching a tutorial
at PyCon this year on Python and MongoDB, covering MongoDB, PyMongo, and Ming.
To whet your appetite for the tutorial, I thought I'd write a few posts covering
all three from a beginner's perspective.

What is MongoDB?

Well, that's not all that enlightening, so I'll expand a bit here on MongoDB's features...

MongoDB is a document database

MongoDB is a document database, which means that instead of storing "rows" in
"tables" like you do in a relational database, you store "documents" in
"collections." Documents are basically JSON objects (technically BSON). This
distinguishes MongoDB from other NoSQL-type databases such as key-value stores
(e.g. Tokyo Cabinet), column family stores (e.g. Cassandra), or column stores
(e.g. MonetDB).

MongoDB has a flexible query language

This is one thing that makes MongoDB a pleasure to work with, particularly if you
come from another NoSQL database where querying is either restrictive (key-value
stores, which can only be queried by key) or cumbersome (something like
CouchDB, which requires you to write a map-reduce query). MongoDB
has a BSON-based query language that's a bit more restrictive than SQL, but that
still lets you get a lot done.

Here's an example of a simple MongoDB query that we use at SourceForge to find
all the blog posts for a project:
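A minimal sketch of a query of this shape, written as the Python dictionary you would hand to PyMongo, might look like the following. The collection name `blog_post`, the field name `app_config_id`, and the id values are assumptions for illustration, not the actual SourceForge schema.

```python
def blog_posts_filter(app_config_ids):
    """Build a filter matching posts whose app_config_id equals one of the ids."""
    # '$in' does an exact match against each value in the list.
    return {'app_config_id': {'$in': list(app_config_ids)}}


# Hypothetical ids standing in for the blog tools of one project.
query = blog_posts_filter(['blog-config-1', 'blog-config-2'])
print(query)

# With PyMongo and a live database, the filter would be passed to find():
#   db.blog_post.find(query)
```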

There are also several other operators like '$lt', '$nin', '$not', and '$or' that
allow you to construct quite complex queries, though you are somewhat restricted
compared to what you can do in SQL (even with a single table).
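To give a feel for how these operators compose, here is a small sketch of filter documents built as plain Python dictionaries. The field names ('timestamp', 'state', 'views', 'labels') are hypothetical and chosen only for illustration.

```python
from datetime import datetime

# '$lt': posts with a timestamp before a cutoff date.
older_than_2012 = {'timestamp': {'$lt': datetime(2012, 1, 1)}}

# '$nin': posts whose state is none of the listed values.
not_hidden = {'state': {'$nin': ['draft', 'deleted']}}

# '$not': negate another operator expression on the same field.
not_under_100_views = {'views': {'$not': {'$lt': 100}}}

# '$or': match documents that satisfy at least one of the sub-queries.
few_views_or_tagged = {'$or': [{'views': {'$lt': 10}}, {'labels': 'mongodb'}]}

# Conditions on different fields in one filter document are ANDed together.
combined = {**older_than_2012, **not_hidden}
print(combined)
```

Each of these dictionaries could be passed as the filter argument to a PyMongo `find()` call.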

MongoDB is fast and scalable

A single MongoDB node can comfortably serve thousands of requests per second
on cheap hardware. When you need to scale beyond that, you can use either
replication (keeping several copies of the data on different servers) or sharding
(partitioning the data across servers). MongoDB even includes logic to automatically
load-balance your shards as your database and load increase.
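To make the idea of partitioning concrete, here is a toy sketch of range-based routing, where each shard owns a contiguous range of key values. This is an illustration of the concept only, not MongoDB's implementation; the shard names and split points are invented.

```python
import bisect

# Each shard owns a contiguous range of shard-key values. SPLIT_POINTS holds
# the boundaries between ranges; a boundary value belongs to the later range.
SPLIT_POINTS = ['h', 'p']
SHARDS = ['shard-a', 'shard-b', 'shard-c']


def shard_for(key):
    """Return the (hypothetical) shard responsible for a shard-key value."""
    return SHARDS[bisect.bisect_right(SPLIT_POINTS, key)]


for key in ['apple', 'mongo', 'python']:
    print(key, '->', shard_for(key))
```

Rebalancing in this picture amounts to moving a split point and migrating the documents that fall on the other side of it.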

Getting Started with MongoDB

While MongoDB is fairly straightforward to install on (64-bit) systems, a couple
of companies, MongoLab and MongoHQ, also provide a free tier of MongoDB hosting
that is great for getting started. I've been using MongoLab for my own things,
for no particular reason, and I can recommend them. It's what I have experience
with, so that's what I'll cover here.

Let's assume you sign up for a MongoLab account. Once you've done this, you can
create a database using their web-based control panel. Click on it, and you'll
note the connection info at the top of the page:

(Your server name and port number may be different.) At this point, most
tutorials would tell you to install and launch the 'mongo' command-line tool to
begin exploring your database. We'll skip that here and use the Python driver,
PyMongo, directly. I like to use virtualenv and ipython myself, so that's the
approach I'll take here:
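A rough sketch of that workflow, assuming PyMongo has been installed into the virtualenv with `pip install pymongo`, could look like this. The user, password, host, port, and database name are placeholders (substitute the connection info from your own control panel), and `MongoClient` is the entry point in current PyMongo releases, which may differ from the version available when this was written.

```python
from urllib.parse import quote_plus


def mongo_uri(user, password, host, port, database):
    """Build a standard mongodb:// connection URI, escaping the credentials."""
    return 'mongodb://%s:%s@%s:%d/%s' % (
        quote_plus(user), quote_plus(password), host, port, database)


def connect(uri):
    """Open a connection and return the database named in the URI."""
    # Imported here so the URI helper works even without PyMongo installed.
    from pymongo import MongoClient
    client = MongoClient(uri)
    return client.get_default_database()


# Placeholder values only; this host does not exist.
uri = mongo_uri('user', 'p@ss/word', 'db.example.invalid', 27017, 'tutorial')
print(uri)
```

With a reachable server, `db = connect(uri)` followed by `db.test.insert_one({'a': 1})` and `db.test.find_one()` is a quick way to confirm that the connection works.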

Well, that's it for now. I'll be posting several follow-up articles in this series
that will go into more detail on how to do various queries and updates using
PyMongo, the MongoDB Python driver, as well as how to effectively use
Ming, so stay tuned!

Thanks for the comment. I'm sorry if the example was confusing. The example I used was looking for blog articles with an *exact match* with one of the app_config_ids passed in.

$in always looks for an exact match against the values in a list. If you want to find a partial match (such as a prefix), you need to use either $regex or a compiled Python regular expression. For instance, if you're trying to find the articles starting with 'Hadoop', the query would be

articles.find({'title': {'$regex': '^Hadoop.*'}})

Again, sorry for the confusion. I'll try to be more explicit in the future. Thanks again for commenting.
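For anyone who wants to try the compiled-pattern variant mentioned above, here is a small sketch. The 'title' field comes from the query in the reply; the sample titles are made up. Anchoring with '^' alone is enough for a prefix match, so the trailing '.*' can be dropped.

```python
import re

# Server-side pattern expressed as a '$regex' filter.
regex_filter = {'title': {'$regex': '^Hadoop'}}

# The same prefix match using a compiled Python pattern, which PyMongo
# accepts directly as a query value.
compiled = re.compile(r'^Hadoop')
compiled_filter = {'title': compiled}

# Checking the pattern locally against a couple of made-up titles.
titles = ['Hadoop and MongoDB', 'Using Hadoop with Python']
matches = [t for t in titles if compiled.search(t)]
print(matches)
```

Either filter could be passed to `articles.find()`; only the first made-up title matches, since the second one merely contains 'Hadoop' rather than starting with it.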

Cool rundown, thanks Rick! In case anyone who is learning MongoDB finds it useful, I just launched a free tool called querymongo.com that translates MySQL syntax into MongoDB syntax. Hope someone can use it to get up to speed faster!

