PyMongo is not fork-safe. Care must be taken when using instances of
MongoClient with fork(). Specifically,
instances of MongoClient must not be copied from a parent process to
a child process. Instead, the parent process and each child process must
create their own instances of MongoClient. Instances of MongoClient copied from
the parent process have a high probability of deadlock in the child process due
to the inherent incompatibilities between fork(), threads, and locks
described below. PyMongo will attempt to
issue a warning if there is a chance of this deadlock occurring.

MongoClient spawns multiple threads to run background tasks such as monitoring
connected servers. These threads share state that is protected by instances of
Lock, which are themselves not fork-safe. The
driver is therefore subject to the same limitations as any other multithreaded
code that uses Lock (and mutexes in general). One of these
limitations is that the locks become useless after fork(). During the fork,
all locks are copied over to the child process in the same state as they were
in the parent: if they were locked, the copied locks are also locked. The child
created by fork() only has one thread, so any locks that were taken out by
other threads in the parent will never be released in the child. The next time
the child process attempts to acquire one of these locks, deadlock occurs.
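This locked-after-fork behavior can be demonstrated with the standard library alone. The sketch below is Unix-only (os.fork() is unavailable on Windows) and uses a lock held by the main thread to stand in for a lock held by a background thread at the moment of the fork:

```python
import os
import threading

def simulate_fork_with_held_lock():
    """Fork while a lock is held; report whether the child can acquire it."""
    lock = threading.Lock()
    lock.acquire()  # stands in for a lock held by a background thread
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: the copied lock is still locked, but the thread that
        # locked it does not exist here, so it will never be released.
        acquired = lock.acquire(blocking=False)
        os.write(write_fd, b"1" if acquired else b"0")
        os._exit(0)
    os.waitpid(pid, 0)
    child_acquired = os.read(read_fd, 1) == b"1"
    os.close(read_fd)
    os.close(write_fd)
    lock.release()
    return child_acquired

print(simulate_fork_with_held_lock())  # False: a blocking acquire would deadlock
```

The child's non-blocking acquire fails; had it used a blocking acquire, it would hang forever, which is exactly the deadlock described above.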

Every MongoClient instance has a built-in
connection pool per server in your MongoDB topology. These pools open sockets
on demand to support the number of concurrent MongoDB operations that your
multi-threaded application requires. There is no thread-affinity for sockets.

The size of each connection pool is capped at maxPoolSize, which defaults
to 100. If there are maxPoolSize connections to a server and all are in
use, the next request to that server will wait until one of the connections
becomes available.

The client instance opens one additional socket per server in your MongoDB
topology for monitoring the server’s state.

For example, a client connected to a 3-node replica set opens 3 monitoring
sockets. It also opens as many sockets as needed to support a multi-threaded
application’s concurrent operations on each server, up to maxPoolSize. With
a maxPoolSize of 100, if the application only uses the primary (the
default), then only the primary connection pool grows and the total number of
connections is at most 103. If the application uses a
ReadPreference to query the secondaries,
their pools also grow and the total number of connections can reach 303.

It is possible to set the minimum number of concurrent connections to each
server with minPoolSize, which defaults to 0. The connection pool will be
initialized with this number of sockets. If sockets are closed due to any
network errors, causing the total number of sockets (both in use and idle) to
drop below the minimum, more sockets are opened until the minimum is reached.

The maximum number of milliseconds that a connection can remain idle in the
pool before being removed and replaced can be set with maxIdleTimeMS, which
defaults to None (no limit).

The default configuration for a MongoClient
works for most applications:

client = MongoClient(host, port)

Create this client once for each process, and reuse it for all
operations. It is a common mistake to create a new client for each request,
which is very inefficient.

To support extremely high numbers of concurrent MongoDB operations within one
process, increase maxPoolSize:

client = MongoClient(host, port, maxPoolSize=200)

… or make it unbounded:

client = MongoClient(host, port, maxPoolSize=None)

By default, any number of threads are allowed to wait for sockets to become
available, and they can wait any length of time. Override waitQueueMultiple
to cap the number of waiting threads. E.g., to keep the number of waiters less
than or equal to 500:

client = MongoClient(host, port, maxPoolSize=50, waitQueueMultiple=10)

When 500 threads are waiting for a socket, the 501st that needs a socket
raises ExceededMaxWaiters. Use this option to
bound the amount of queueing in your application during a load spike, at the
cost of additional exceptions.

Once the pool reaches its max size, additional threads are allowed to wait
indefinitely for sockets to become available, unless you set
waitQueueTimeoutMS:

client = MongoClient(host, port, waitQueueTimeoutMS=100)

A thread that waits more than 100ms (in this example) for a socket raises
ConnectionFailure. Use this option if it is more
important to bound the duration of operations during a load spike than it is to
complete every operation.

When close() is called by any thread,
all idle sockets are closed, and all sockets that are in use will be closed as
they are returned to the pool.

The key-value pairs in a BSON document can have any order (except that _id
is always first). The mongo shell preserves key order when reading and writing
data: a document created in the shell with “b” before “a” is stored, and later
displayed, with “b” before “a”.

PyMongo represents BSON documents as Python dicts by default, and the order
of keys in dicts is not defined. That is, a dict declared with the “a” key
first is the same, to Python, as one with “b” first.
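This equivalence is easy to check directly; dict equality in Python ignores key order:

```python
# Two dicts with the same keys and values compare equal regardless of
# the order in which the keys were declared.
print({'a': 1.0, 'b': 1.0} == {'b': 1.0, 'a': 1.0})  # True
```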

When the document is read back (for example with find_one()), the
subdocument’s actual storage layout becomes visible: “b” is before “a”.

Because a dict’s key order is not defined, you cannot predict how it will be
serialized to BSON. But MongoDB considers subdocuments equal only if their
keys have the same order. So if you use a dict to query on a subdocument it may
not match:

>>> collection.find_one({'subdocument': {'a': 1.0, 'b': 1.0}}) is None
True

Swapping the key order in your query makes no difference:

>>> collection.find_one({'subdocument': {'b': 1.0, 'a': 1.0}}) is None
True

… because, as we saw above, Python considers the two dicts the same.

There are two solutions. First, you can match the subdocument field-by-field:
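A sketch of the field-by-field query using MongoDB’s dot notation (`collection` is the collection from the surrounding examples; the commented find_one call assumes a live connection):

```python
# Dot notation matches each subdocument field independently, so the
# stored BSON key order is irrelevant.
query = {'subdocument.a': 1.0, 'subdocument.b': 1.0}

# With a live collection, as in the surrounding examples:
# collection.find_one(query)
```

The second solution, not shown in this excerpt, is to write subdocuments with a fixed key order (for example with bson.son.SON, which exists to maintain key ordering) so that stored documents and queries agree.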

The query matches any subdocument with an “a” of 1.0 and a “b” of 1.0,
regardless of the order you specify them in Python or the order they are stored
in BSON. Additionally, this query now matches subdocuments with additional
keys besides “a” and “b”, whereas the previous query required an exact match.

Cursors in MongoDB can time out on the server if they’ve been open for
a long time without any operations being performed on them. This can
lead to a CursorNotFound exception being
raised when attempting to iterate the cursor.
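One common workaround is to disable the server-side timeout with find()’s no_cursor_timeout option, which in turn requires closing the cursor explicitly. A sketch, where the collection and the per-document handle() callback are assumptions:

```python
def process_all(collection, handle):
    """Iterate every document with the server-side cursor timeout disabled.

    Cursors opened with no_cursor_timeout=True are not reaped by the
    server, so they must be closed explicitly.
    """
    cursor = collection.find({}, no_cursor_timeout=True)
    try:
        for doc in cursor:
            handle(doc)  # hypothetical per-document work
    finally:
        cursor.close()
```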

MongoDB <= 3.2 only supports IEEE 754 floating point numbers - the same as
the Python float type. The only way PyMongo could store Decimal instances in
these versions of MongoDB would be to convert them to this standard, so
you’d really only be storing floats anyway - we force users to do this
conversion explicitly so that they are aware that it is happening.
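The explicit conversion is a one-liner; a sketch:

```python
from decimal import Decimal

price = Decimal("9.99")

# Convert explicitly so the lossy Decimal -> float step is visible in
# your code; PyMongo then stores a standard IEEE 754 double.
doc = {"price": float(price)}
print(isinstance(doc["price"], float))  # True
```

Newer servers (MongoDB 3.4+) add a Decimal128 BSON type, exposed in PyMongo as bson.decimal128.Decimal128, which stores exact decimal values.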

The database representation is 9.99 as an IEEE floating point (which
is common to MongoDB and Python as well as most other modern
languages). The problem is that 9.99 cannot be represented exactly
with a double precision floating point - this is true in some versions of
Python as well:

>>> 9.99
9.9900000000000002

The result that you get when you save 9.99 with PyMongo is exactly the
same as the result you’d get saving it with the JavaScript shell or
any of the other languages (and as the data you’re working with when
you type 9.99 into a Python program).

This request has come up a number of times but we’ve decided not to
implement anything like this. The relevant jira case has some information
about the decision, but here is a brief summary:

This will pollute the attribute namespace for documents, so could
lead to subtle bugs / confusing errors when using a key with the
same name as a dictionary method.

The only reason we even use SON objects instead of regular
dictionaries is to maintain key ordering, since the server
requires this for certain operations. So we’re hesitant to
needlessly complicate SON (at some point it’s hypothetically
possible we might want to revert back to using dictionaries alone,
without breaking backwards compatibility for everyone).

It’s easy (and Pythonic) for new users to deal with documents,
since they behave just like dictionaries. If we start changing
their behavior it adds a barrier to entry for new users - another
class to learn.

PyMongo doesn’t support saving datetime.date instances, since
there is no BSON type for dates without times. Rather than having the
driver enforce a convention for converting datetime.date
instances to datetime.datetime instances for you, any
conversion should be performed in your client code.
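A sketch of one possible convention, converting a date to the datetime at midnight of that day:

```python
import datetime

def date_to_datetime(d: datetime.date) -> datetime.datetime:
    # One possible convention: midnight (00:00:00) on the given day.
    return datetime.datetime.combine(d, datetime.time.min)

print(date_to_datetime(datetime.date(2002, 11, 27)))  # 2002-11-27 00:00:00
```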

It’s common in web applications to encode documents’ ObjectIds in URLs, like:

"/posts/50b3bda58a02fb9a84d8991e"

Your web framework will pass the ObjectId portion of the URL to your request
handler as a string, so it must be converted to ObjectId
before it is passed to find_one(). It is a
common mistake to forget to do this conversion. Here’s how to do it correctly
in Flask (other web frameworks are similar):

from pymongo import MongoClient
from bson.objectid import ObjectId
from flask import Flask, render_template

client = MongoClient()
app = Flask(__name__)

@app.route("/posts/<_id>")
def show_post(_id):
    # NOTE!: converting _id from string to ObjectId before passing to find_one
    post = client.db.posts.find_one({'_id': ObjectId(_id)})
    return render_template('post.html', post=post)

if __name__ == "__main__":
    app.run()

Django is a popular Python web
framework. Django includes an ORM, django.db. Currently,
there’s no official MongoDB backend for Django.

django-mongodb-engine
is an unofficial MongoDB backend that supports Django aggregations, (atomic)
updates, embedded objects, Map/Reduce and GridFS. It allows you to use most
of Django’s built-in features, including the ORM, admin, authentication, site
and session frameworks and caching.

However, it’s easy to use MongoDB (and PyMongo) from Django
without using a Django backend. Certain features of Django that require
django.db (admin, authentication and sessions) will not work
using just MongoDB, but most of what Django provides can still be
used.

One project which should make working with MongoDB and Django easier
is mango. Mango is a set of
MongoDB backends for Django sessions and authentication (bypassing
django.db entirely).

json_util is PyMongo’s built-in, flexible tool for using
Python’s json module with BSON documents and MongoDB Extended JSON. The
json module won’t work out of the box with all documents from PyMongo,
since PyMongo supports some special types (like ObjectId
and DBRef) that are not supported in JSON.

python-bsonjs is a fast
BSON to MongoDB Extended JSON converter built on top of
libbson. python-bsonjs does not
depend on PyMongo and can offer a nice performance improvement over
json_util. python-bsonjs works best with PyMongo when using
RawBSONDocument.

PyMongo decodes BSON datetime values to instances of Python’s
datetime.datetime. Instances of datetime.datetime are
limited to years between datetime.MINYEAR (usually 1) and
datetime.MAXYEAR (usually 9999). Some MongoDB drivers (e.g. the PHP
driver) can store BSON datetimes with year values far outside those supported
by datetime.datetime.

There are a few ways to work around this issue. One option is to filter
out documents with values outside of the range supported by
datetime.datetime:
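A sketch of such a range filter, restricting matches to BSON datetimes that are representable as Python datetimes (the "date" field name and the collection are assumptions):

```python
import datetime

# Only match documents whose "date" value fits in datetime.datetime
# (year 1 through year 9999).
query = {
    "date": {
        "$gte": datetime.datetime.min,
        "$lte": datetime.datetime.max,
    }
}

# With a live collection (hypothetical):
# collection.find(query)
print(query["date"]["$gte"].year, query["date"]["$lte"].year)  # 1 9999
```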

On Unix systems the multiprocessing module spawns processes using fork().
Care must be taken when using instances of
MongoClient with fork(). Specifically,
instances of MongoClient must not be copied from a parent process to a child
process. Instead, the parent process and each child process must create their
own instances of MongoClient. For example:

# Each process creates its own instance of MongoClient.
def func():
    db = pymongo.MongoClient().mydb
    # Do something with db.

proc = multiprocessing.Process(target=func)
proc.start()

Never do this:

client = pymongo.MongoClient()

# Each child process attempts to copy a global MongoClient
# created in the parent process. Never do this.
def func():
    db = client.mydb
    # Do something with db.

proc = multiprocessing.Process(target=func)
proc.start()