What is MongoDB?

MongoDB stores data in JSON-like documents that can vary in structure, offering a dynamic, flexible schema. MongoDB was also designed for high availability and scalability, with built-in replication and auto-sharding.
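
As a rough illustration of what a flexible schema means in practice, the sketch below uses plain Python dictionaries to stand in for documents in one collection. The field names (`name`, `email`, `tags`) and the helper functions are hypothetical and chosen only for this example; no MongoDB driver is involved.

```python
# Hypothetical documents that could live in the same collection.
# They share some fields but are not required to share all of them.
users = [
    {"_id": 1, "name": "Ada", "email": "ada@example.com"},
    {"_id": 2, "name": "Lin", "tags": ["admin", "beta"]},
]


def field_names(docs):
    """Return the union of top-level field names across all documents."""
    names = set()
    for doc in docs:
        names.update(doc)  # iterating a dict yields its keys
    return names


def contact_email(doc):
    """Read an optional field, falling back to None when it is absent."""
    return doc.get("email")


print(sorted(field_names(users)))
print([contact_email(u) for u in users])
```

Because a field may be missing from any given document, application code typically reads optional fields defensively, as `contact_email` does here.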

What is HBase?

Apache HBase is an open-source, distributed, versioned, column-oriented store modeled after Google's Bigtable: A Distributed Storage System for Structured Data by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Apache Hadoop.
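
To make "versioned, column-oriented" a little more concrete, here is a minimal in-memory sketch of the idea: each cell is addressed by a row key and a `family:qualifier` column, and holds several timestamped versions. The `VersionedTable` class and its method names are invented for this illustration and are not the HBase client API.

```python
from collections import defaultdict


class VersionedTable:
    """Toy model: row key -> column -> list of (timestamp, value), newest first."""

    def __init__(self):
        self._rows = defaultdict(lambda: defaultdict(list))

    def put(self, row, column, value, timestamp):
        """Store a new version of a cell without overwriting older versions."""
        cells = self._rows[row][column]
        cells.append((timestamp, value))
        # Keep the newest version at the front of the list.
        cells.sort(key=lambda cell: cell[0], reverse=True)

    def get(self, row, column, versions=1):
        """Return up to `versions` most recent (timestamp, value) pairs."""
        cells = self._rows.get(row, {}).get(column, [])
        return cells[:versions]


table = VersionedTable()
table.put("experiment-1", "metrics:clicks", 10, timestamp=1)
table.put("experiment-1", "metrics:clicks", 12, timestamp=2)
print(table.get("experiment-1", "metrics:clicks"))
print(table.get("experiment-1", "metrics:clicks", versions=2))
```

A real deployment distributes rows across many servers and persists them on Hadoop's file system; the sketch only shows the addressing and versioning scheme.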

What is TokuMX?

TokuMX is a drop-in replacement for MongoDB, and offers 20X performance improvements, 90% reduction in database size, and support for ACID transactions with MVCC. TokuMX has the same binaries and supports the same drivers, data model, and features as MongoDB, because it shares much of its code with MongoDB.

How developers use MongoDB vs HBase vs TokuMX

Used MongoDB as the primary database. It holds trip data of NYC taxis for the year 2013.
It is a huge dataset, and its primary feature is geo coordinates with pickup and drop-off locations.
Also used MongoDB's map-reduce to process this large dataset for aggregation. The aggregated result was then used to show visualizations.
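
The kind of aggregation described above can be sketched in plain Python as a map step that emits a key per trip and a reduce step that combines the values for each key. This is only an illustration of the map-reduce pattern, not the reviewer's actual job: the `pickup` field layout, the grid-cell key, and the function names are all assumptions.

```python
from collections import defaultdict


def map_trip(trip, precision=2):
    """Map step: emit (grid cell, 1) for a trip's pickup location.

    Assumes a hypothetical `pickup` field holding a (longitude, latitude) pair.
    """
    lon, lat = trip["pickup"]
    return (round(lon, precision), round(lat, precision)), 1


def reduce_counts(values):
    """Reduce step: combine all values emitted for one key."""
    return sum(values)


def map_reduce(trips, precision=2):
    """Group mapped values by key, then reduce each group."""
    grouped = defaultdict(list)
    for trip in trips:
        key, value = map_trip(trip, precision)
        grouped[key].append(value)
    return {key: reduce_counts(values) for key, values in grouped.items()}


sample_trips = [
    {"pickup": (-73.991, 40.731)},
    {"pickup": (-73.993, 40.729)},
    {"pickup": (-73.95, 40.78)},
]
print(map_reduce(sample_trips))
```

The resulting per-cell counts are the sort of aggregated output that could feed a pickup-density visualization.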

MongoDB fills our more traditional database needs. We knew we wanted Trello to be blisteringly fast. One of the coolest and most performance-obsessed teams we know is our next-door neighbor and sister company StackExchange. Talking to their dev lead David at lunch one day, I learned that even though they use SQL Server for data storage, they actually primarily store a lot of their data in a denormalized format for performance, and normalize only when they need to.

While the vast majority of BI data comes from 3rd-party sources, some pieces require ad-hoc sources, and this is largely where Mongo comes into play. Views such as "Activity Log" need on-the-fly recordkeeping that's best entered manually, since fetching from a task manager API would paint an overwhelming or inaccurate picture of the month's activity.

The final output is inserted into HBase to serve the experiment dashboard. We also load the output data to Redshift for ad-hoc analysis. For real-time experiment data processing, we use Storm to tail Kafka and process data in real-time and insert metrics into MySQL, so we could identify group allocation problems and send out real-time alerts and metrics.

Nearly all of our backend storage is on MongoDB. This has also worked out pretty well. It's enabled us to scale up faster and more easily than if we had rolled our own solution on top of PostgreSQL (which we were using previously). There have been a few road bumps along the way, but the team at 10gen has been a big help with things.

We are testing out MongoDB at the moment. Currently we are only using a small EC2 setup for a delayed job queue backed by agenda. If it works out well we might look to see where it could become a primary document storage engine for us.