Binary Storage with MongoDB and GridFS

In a previous blog post, I discussed the storage options available for the Nuxeo Platform, in particular its support for MongoDB and the use cases where it fits. Today, let's talk about GridFS for binary storage, and see how the Nuxeo Platform lets you combine different storage and query subsystems to get the right configuration for your use case.

Let’s take a look at the Nuxeo storage architecture again, in this case configured with MongoDB and GridFS. You can see that the Nuxeo Platform uses different subsystems for the Document (metadata) and the Blob (binaries). A Document is a set of metadata values and attributes (Facets), plus references to binaries; the Repository combines these when it returns the Document to the application. A binary in Nuxeo is always wrapped by a Document. This allows us to use the most appropriate storage for metadata and for binaries separately. Both the Document Store and the Blob Store are pluggable abstractions, so you can choose the implementation behind each one.

Benefits of Using GridFS

MongoDB users will certainly be familiar with GridFS. By providing a Blob Store backed by GridFS to replace the File System for binary storage, we bring Document and Blob storage together inside MongoDB.

There are two areas of benefit here:

First, the GridFS capabilities themselves are great. Instead of dealing with the File System directly, GridFS relies on MongoDB to provide replication, distribution and redundancy. It does this by breaking every binary into chunks of a configurable size and storing them as individual records in a dedicated collection, while a second collection stores one record per file with that file’s information. The GridFS API is built on top of MongoDB and handles all the chunking, the rebuilding of files and read/write access. Since the files are broken into chunks, you can access just the chunks you need without streaming the whole file. You can also download and stream multiple pieces simultaneously to potentially increase throughput.
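To make the chunking idea concrete, here is a minimal in-memory sketch in Python. It is not the MongoDB driver: two plain dictionaries stand in for the files and chunks collections, and the function names put and read_range are made up for this illustration. The record fields (length, chunkSize, and a chunk number n) follow the GridFS layout, and 255 KiB is the GridFS default chunk size.

```python
# In-memory sketch of the GridFS layout: one "files" record per binary,
# plus one "chunks" record per fixed-size piece. This is NOT the real driver.

DEFAULT_CHUNK_SIZE = 255 * 1024  # GridFS default chunk size (255 KiB)


def put(files, chunks, file_id, data, chunk_size=DEFAULT_CHUNK_SIZE):
    """Split data into chunks and record the file's information."""
    files[file_id] = {"_id": file_id, "length": len(data), "chunkSize": chunk_size}
    for n, start in enumerate(range(0, len(data), chunk_size)):
        # Each chunk is keyed by (files_id, n), mirroring the chunks collection.
        chunks[(file_id, n)] = data[start:start + chunk_size]


def read_range(files, chunks, file_id, offset, size):
    """Return the requested byte range and the chunk numbers that were touched."""
    info = files[file_id]
    chunk_size = info["chunkSize"]
    end = min(offset + size, info["length"])
    if offset >= end:
        return b"", []
    first = offset // chunk_size
    last = (end - 1) // chunk_size
    touched = list(range(first, last + 1))
    joined = b"".join(chunks[(file_id, n)] for n in touched)
    skip = offset - first * chunk_size  # bytes to drop from the first chunk
    return joined[skip:skip + (end - offset)], touched


files, chunks = {}, {}
payload = bytes(range(256)) * 40  # 10,240 bytes of sample data
put(files, chunks, "demo", payload, chunk_size=1024)
piece, touched = read_range(files, chunks, "demo", offset=3000, size=500)
```

With a 1,024-byte chunk size the sample payload is stored as 10 chunks, and the 500-byte read starting at offset 3000 only touches chunks 2 and 3. A real GridFS read works on the same principle, with MongoDB queries in place of the dictionary lookups.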

The second benefit of combining MongoDB with GridFS is a common set of management and administration tools and methods. You no longer need separate backup and infrastructure strategies for the database and the File System, which makes administration more efficient.

Configuring the Nuxeo Platform to use GridFS

Configuring the Nuxeo Platform to use GridFS instead of the File System is an easy and transparent process. Since the Nuxeo Platform is extension-based, it’s just a matter of deploying an XML-based configuration that specifies the Blob Manager implementation and a couple of properties.

Now when accessing a binary, the system simply calls the same read/write methods on the new implementation. Migration is also straightforward. The Nuxeo Platform stores binaries in a simple bucket-like structure where the only reference is a unique filename based on the MD5 checksum of the content. So, if you already have binaries in the Nuxeo Platform on the File System, you can copy them into GridFS and switch the Blob Manager. This is transparent to the application because only the ID is passed to the Blob Manager.
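The migration idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Nuxeo code: the classes FileSystemBlobStore and InMemoryGridFSStore and the migrate function are hypothetical names, and a dictionary stands in for the GridFS bucket. What it shows is why the switch is transparent: both stores key each binary by the MD5 digest of its content, so an ID saved before the migration still resolves afterwards.

```python
import hashlib
import tempfile
from pathlib import Path


class FileSystemBlobStore:
    """Flat directory where each file is named after the MD5 digest of its content."""

    def __init__(self, root):
        self.root = Path(root)

    def write(self, data):
        key = hashlib.md5(data).hexdigest()
        (self.root / key).write_bytes(data)
        return key

    def read(self, key):
        return (self.root / key).read_bytes()

    def keys(self):
        return sorted(p.name for p in self.root.iterdir() if p.is_file())


class InMemoryGridFSStore:
    """Hypothetical stand-in for a GridFS-backed store; a dict plays the bucket."""

    def __init__(self):
        self.bucket = {}

    def write(self, data):
        key = hashlib.md5(data).hexdigest()
        self.bucket[key] = data
        return key

    def read(self, key):
        return self.bucket[key]


def migrate(source, target):
    """Copy every binary across and check that its key is unchanged."""
    for key in source.keys():
        new_key = target.write(source.read(key))
        if new_key != key:
            raise ValueError(f"digest mismatch for {key}")


with tempfile.TemporaryDirectory() as tmp:
    old_store = FileSystemBlobStore(tmp)
    doc_blob_id = old_store.write(b"quarterly report")  # ID a document would keep
    new_store = InMemoryGridFSStore()
    migrate(old_store, new_store)

# After switching stores, the ID held by the document still resolves.
restored = new_store.read(doc_blob_id)
```

Because the caller only ever holds the digest, it never needs to know which store is answering. The digest check inside migrate is a cheap way to catch a corrupted file during the copy.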