A collection is a group of documents stored in MongoDB, and can be
thought of as roughly the equivalent of a table in a relational
database. Getting a collection in PyMongo works the same as getting a
database:

>>> collection = db.test_collection

or (using dictionary style access):

>>> collection = db['test-collection']

An important note about collections (and databases) in MongoDB is that
they are created lazily - none of the above commands have actually
performed any operations on the MongoDB server. Collections and
databases are created when the first document is inserted into them.

Data in MongoDB is represented (and stored) using JSON-style
documents. In PyMongo we use dictionaries to represent documents. As
an example, the following dictionary might be used to represent a blog
post:
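A minimal sketch of such a dictionary follows. The "author" and "date" keys and the name Mike are referred to elsewhere in this tutorial; the "text" and "tags" fields and all of the values are illustrative assumptions, not a required schema.

```python
import datetime

# Illustrative blog post. Field names and values are assumptions made for
# this sketch; MongoDB does not require any particular set of keys.
post = {
    "author": "Mike",
    "text": "My first blog post!",
    "tags": ["mongodb", "python", "pymongo"],
    # A fixed timestamp keeps the example reproducible.
    "date": datetime.datetime(2009, 11, 12, 11, 14),
}
```

Documents can hold native Python types such as datetime.datetime instances, which PyMongo converts to and from the corresponding BSON types.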

To insert a document into a collection we can use the insert() method:

>>> posts = db.posts
>>> posts.insert(post)
ObjectId('...')

When a document is inserted, a special key, "_id", is automatically
added if the document doesn’t already contain an "_id" key. The value
of "_id" must be unique across the collection. insert() returns the
value of "_id" for the inserted document. For more information, see the
documentation on _id.

After inserting the first document, the posts collection has
actually been created on the server. We can verify this by listing all
of the collections in our database:

>>> db.collection_names()
[u'posts', u'system.indexes']

Note

The system.indexes collection is a special internal
collection that was created automatically.

The most basic type of query that can be performed in MongoDB is
find_one(). This method returns a
single document matching a query (or None if there are no
matches). It is useful when you know there is only one matching
document, or are only interested in the first match. Here we use
find_one() to get the first
document from the posts collection:
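As a hedged sketch, the helper below wraps the two ways of calling find_one() described here: with no arguments, and with a query document. The function name first_post_by and its parameters are hypothetical, and posts is assumed to be the collection object used earlier (db.posts).

```python
def first_post_by(posts, author=None):
    """Return one matching document from `posts`, or None if nothing matches.

    `posts` is assumed to be a PyMongo collection (for example db.posts).
    """
    if author is None:
        # No query document: return the first document in the collection.
        return posts.find_one()
    # Query document: match on the (assumed) "author" field.
    return posts.find_one({"author": author})
```

Calling first_post_by(posts) returns the first document in the collection, while passing an author that matches no document returns None.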

You probably noticed that the regular Python strings we stored earlier look
different when retrieved from the server (e.g. u'Mike' instead of 'Mike').
A short explanation is in order.

MongoDB stores data in BSON format. BSON strings are UTF-8 encoded, so
PyMongo must ensure that any strings it stores contain only valid UTF-8
data. Regular strings (<type 'str'>) are validated and stored unaltered.
Unicode strings (<type 'unicode'>) are first encoded to UTF-8. The reason
our example string is represented in the Python shell as u'Mike' instead
of 'Mike' is that PyMongo decodes each BSON string to a Python unicode
string, not a regular str.
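The round trip described above can be imitated with Python's built-in codecs. This sketch does not call PyMongo; it only mirrors the encode-on-store and decode-on-retrieve steps, and the variable names are illustrative.

```python
# A unicode string is encoded to UTF-8 bytes before it is stored...
name = u"Mike"
stored = name.encode("utf-8")

# ...and the stored bytes are decoded back to a unicode string on retrieval.
loaded = stored.decode("utf-8")

# The repr of each value depends on the Python version in use.
print(repr(stored))
print(repr(loaded))
```

The decoded value compares equal to the original string, even though its repr in the shell carries the u prefix on the Python version this tutorial targets.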

In order to make querying a little more interesting, let’s insert a
few more documents. In addition to inserting a single document, we can
also perform bulk insert operations by passing an iterable as the
first argument to insert(). This will insert each document in the
iterable, sending only a single command to the server:
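A sketch of a bulk insert, under stated assumptions: the documents in new_posts are made-up examples, the helper name bulk_insert_posts is hypothetical, and posts is assumed to be a collection exposing the insert() method used throughout this tutorial.

```python
import datetime

# Illustrative documents; the names and values are assumptions for this sketch.
new_posts = [
    {
        "author": "Mike",
        "text": "Another post!",
        "tags": ["bulk", "insert"],
        "date": datetime.datetime(2009, 11, 12, 11, 14),
    },
    {
        "author": "Eliot",
        "title": "MongoDB is fun",
        "text": "and pretty easy too!",
        "date": datetime.datetime(2009, 11, 10, 10, 45),
    },
]


def bulk_insert_posts(posts, documents):
    """Insert every document in `documents` with a single insert() call.

    `posts` is assumed to be a PyMongo collection (for example db.posts).
    """
    # When given a list, insert() returns a list of "_id" values,
    # one per inserted document.
    return posts.insert(documents)
```

With a live collection, bulk_insert_posts(posts, new_posts) would return one "_id" per document in the list.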

To get more than a single document as the result of a query we use the
find() method. find() returns a Cursor instance, which allows us to
iterate over all matching documents. For example, we can iterate over
every document in the posts collection:
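A minimal sketch of such a loop, assuming posts is the collection used earlier and that documents carry an "author" key. The helper name authors_of_all_posts is hypothetical.

```python
def authors_of_all_posts(posts):
    """Collect the "author" value of every document in `posts`.

    `posts` is assumed to be a PyMongo collection; find() with no
    arguments matches every document in it.
    """
    authors = []
    # The Cursor returned by find() is iterable and yields one
    # dictionary per matching document.
    for post in posts.find():
        # Use .get() so documents without an "author" key yield None.
        authors.append(post.get("author"))
    return authors
```

The same loop works with a query document passed to find(), in which case only the matching documents are visited.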

To make the above query fast we can add a compound index on
"date" and "author". To start, let’s use the
explain() method to get some information
about how the query is being performed without the index:
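The query itself is not shown in this section, so the sketch below assumes a hypothetical one: posts older than a cutoff date, sorted by author. The helper name explain_posts_before and the cutoff value are illustrative. The contents of the returned explain document depend on the server version, so no particular output is shown.

```python
import datetime


def explain_posts_before(posts, cutoff):
    """Return the server's explain output for an assumed date/author query.

    `posts` is assumed to be a PyMongo collection. The query (documents
    whose "date" is earlier than `cutoff`, sorted by "author") is a
    hypothetical stand-in for the query discussed in the text.
    """
    cursor = posts.find({"date": {"$lt": cutoff}}).sort("author")
    # explain() reports how the server plans to execute the query.
    return cursor.explain()


# Hypothetical cutoff date, for illustration only.
cutoff = datetime.datetime(2009, 11, 12, 12)
```

Comparing the result of explain_posts_before(posts, cutoff) before and after creating the index is one way to see whether the index is being used.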