MongoDB Goes Pluggable with Storage Engines

Alex Woodie

MongoDB is taking a page out of the MySQL playbook and adopting a new architecture that allows customers to run a variety of different storage engines that are optimized to do specific tasks on its database. The company is also addressing database management concerns with a new OpsManager console in MongoDB 3.0.

One of the features that made MySQL so popular a decade ago was its pluggable storage architecture. Depending on what a customer needed to do–such as processing e-commerce transactions, powering a data warehouse, or running a mobile app–they could plug in a different database engine that was optimized for that specific job. It worked well and won MySQL favor with independent developers and third-party vendors–at least until it was eventually consumed by Oracle.

Now MongoDB is taking that same approach with its popular document-based NoSQL database. When MongoDB 3.0 ships next month, users will have three storage options to choose from:

MMAPV1, an upgraded version of the standard MongoDB storage engine;

Wired Tiger, a write-optimized storage engine that will increase the performance of write-intensive applications by an average of seven to 10 times;

an in-memory storage engine that will provide a 100,000x boost in read and write performance (considered experimental and not fully supported).
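To make the idea of a pluggable storage engine concrete, here is a minimal sketch in Python. It is not MongoDB's internal API; the StorageEngine interface, the two toy engines, the ENGINES registry, and the Database class are all hypothetical names invented for illustration. The point is only that the database layer talks to one interface and the engine behind it can be swapped by name.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional, Tuple


class StorageEngine(ABC):
    """Hypothetical interface that every pluggable engine implements."""

    @abstractmethod
    def put(self, key: str, document: dict) -> None: ...

    @abstractmethod
    def get(self, key: str) -> Optional[dict]: ...


class InMemoryEngine(StorageEngine):
    """Toy engine: keeps the latest version of each document in a dict."""

    def __init__(self) -> None:
        self._docs: Dict[str, dict] = {}

    def put(self, key: str, document: dict) -> None:
        self._docs[key] = dict(document)

    def get(self, key: str) -> Optional[dict]:
        return self._docs.get(key)


class AppendLogEngine(StorageEngine):
    """Toy engine: appends every write to a log and scans it backwards on read."""

    def __init__(self) -> None:
        self._log: List[Tuple[str, dict]] = []

    def put(self, key: str, document: dict) -> None:
        self._log.append((key, dict(document)))

    def get(self, key: str) -> Optional[dict]:
        for logged_key, document in reversed(self._log):
            if logged_key == key:
                return document
        return None


# Registry of available engines; the names are assumptions for this sketch.
ENGINES = {"in_memory": InMemoryEngine, "append_log": AppendLogEngine}


class Database:
    """The database layer only sees the StorageEngine interface."""

    def __init__(self, engine_name: str) -> None:
        self._engine = ENGINES[engine_name]()

    def insert(self, key: str, document: dict) -> None:
        self._engine.put(key, document)

    def find(self, key: str) -> Optional[dict]:
        return self._engine.get(key)
```

Application code written against Database behaves the same whichever engine name is passed in, which is the property the article describes: the choice of engine changes performance characteristics, not the developer-facing interface.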

The new Wired Tiger storage engine, which MongoDB acquired in December, will have the biggest immediate impact on MongoDB customers. With Wired Tiger, MongoDB is able to provide concurrency control at the individual document level, as opposed to the database level. That means the database can process many more operations at the same time than with the old storage engine, Stirman says.
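The difference between database-level and document-level concurrency control can be illustrated with a small Python sketch. This is a lock-based toy that only shows the effect of granularity; it does not claim to reflect how Wired Tiger implements concurrency internally, and the class and function names are hypothetical.

```python
import threading
from collections import defaultdict
from typing import Iterable


class DatabaseLevelLocking:
    """One lock guards the whole database, so only one writer proceeds at a time."""

    def __init__(self) -> None:
        self._lock = threading.Lock()

    def try_lock(self, doc_id: str) -> bool:
        return self._lock.acquire(blocking=False)

    def unlock(self, doc_id: str) -> None:
        self._lock.release()


class DocumentLevelLocking:
    """One lock per document, so writers to different documents do not collide."""

    def __init__(self) -> None:
        self._locks = defaultdict(threading.Lock)
        self._guard = threading.Lock()  # protects the lock table itself

    def try_lock(self, doc_id: str) -> bool:
        with self._guard:
            lock = self._locks[doc_id]
        return lock.acquire(blocking=False)

    def unlock(self, doc_id: str) -> None:
        self._locks[doc_id].release()


def concurrent_writers(locking, doc_ids: Iterable[str]) -> int:
    """Count how many of the requested writes could hold a lock at the same moment."""
    held = [doc_id for doc_id in doc_ids if locking.try_lock(doc_id)]
    for doc_id in held:
        locking.unlock(doc_id)
    return len(held)


writes = ["order-1", "order-2", "order-3"]
print(concurrent_writers(DatabaseLevelLocking(), writes))  # 1
print(concurrent_writers(DocumentLevelLocking(), writes))  # 3
```

With one lock for the whole database, only the first of three writes gets in; with per-document locks, all three proceed because they touch different documents. Two writes to the same document would still have to take turns in either scheme.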

Wired Tiger, which was founded by the founders of BerkeleyDB, also introduces compression algorithms to MongoDB, which previously had no native compression facilities. The company is predicting the average customer will see a 50 to 70 percent reduction in storage volumes.
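For a rough sense of how a storage reduction figure like that is calculated, the sketch below compresses a batch of JSON documents with Python's standard zlib module and reports the percentage saved. The article does not say which algorithms Wired Tiger uses; zlib is chosen here only because it ships with Python, and the sample documents and the storage_reduction_percent helper are made up for illustration. Real-world savings depend heavily on the data.

```python
import json
import zlib


def storage_reduction_percent(raw: bytes, level: int = 6) -> float:
    """Return how much smaller the data is after compression, as a percentage."""
    compressed = zlib.compress(raw, level)
    return 100.0 * (1 - len(compressed) / len(raw))


# Made-up documents with repetitive field names, as is typical of document stores.
docs = [
    {"user_id": i, "status": "active", "country": "US", "plan": "basic"}
    for i in range(1000)
]
raw = "\n".join(json.dumps(doc) for doc in docs).encode("utf-8")

reduction = storage_reduction_percent(raw)
print(f"{len(raw)} bytes raw, {reduction:.1f}% smaller after compression")
```

Applied to the article's numbers, a 50 to 70 percent reduction would shrink 1 TB of stored data to somewhere between 300 GB and 500 GB.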

Adopting the Wired Tiger or in-memory storage engines will have no impact on developers and will require no work to upgrade from MongoDB 2.6, says Kelly Stirman, director of product for MongoDB.

Over time, additional storage engines will be added by MongoDB, the open source community, and third-party vendors, Stirman says. While there are a lot of possibilities, a couple of use cases stand out.

“Companies like Fusion IO have really high performance storage hardware and they offer APIs to those products that bypass the file system,” Stirman says. “The way that integration would work through MongoDB is through a storage engine. You could imagine in the not too distant future is a Fusion IO storage engine that lets you run MongoDB in a highly optimized way on Fusion IO hardware.” Other storage companies, such as EMC, Violin, and NetApp, offer proprietary APIs for bypassing the file system that could likewise form the basis of a MongoDB storage engine.

A storage engine for Hadoop could also be in the offing. “We could certainly have a storage engine that runs on HDFS, so that instead of bulk copying data from MongoDB to Hadoop, that the data would seamlessly trickle into HDFS using MongoDB’s replication,” Stirman says. “That’s a great option for some applications. For others our existing Hadoop connector is a better option.”

Clearly, MongoDB is looking to expand its influence beyond just running transactional workloads on a document-oriented NoSQL database. While the company has a strong position in the emerging NoSQL market, its customers are increasingly being asked to do more with data. To that end, MongoDB is looking to give customers better tools for managing tomorrow’s data tsunami.

At some point, that could include MongoDB supporting different underlying data models with its database management system. While the company is taking a big step forward in extensibility with its pluggable storage engine architecture, it is not yet ready to use the term “multi-modal” to describe its database.

“We are opening the door to move in that direction, but we are not offering multi-modal in this release,” Stirman says. “If you look at the Wired Tiger technology, there are some exciting innovations in the area of row-oriented versus column-oriented storage. It’s possible that in the future we could introduce some different models into the MongoDB stack, but that’s not part of this release.”

MongoDB faces competition from several multi-modal NoSQL databases, notably MarkLogic, which has supported multiple data models for years. And today DataStax, which backs the Apache Cassandra wide-column store NoSQL database, announced that it has acquired the company behind the TitanDB graph database.

The other notable item in MongoDB 3.0 is the introduction of the OpsManager component. Running a large distributed cluster is never easy, and MongoDB has suffered its share of criticism in that regard. But all that will change with OpsManager, says Stirman.

“If you have 10 servers or 100 servers, the work to do an upgrade or configuration change is just clicking a button with OpsManager,” he says. “Adding more copies or replicas of storage engine doesn’t impact operations at all because Ops Manager does all the work for you.”