NoSQL Document Storage Benefits and Drawbacks

NoSQL databases sometimes feature a concept called document storage, a way of storing data that differs in radical ways from the means available to traditional relational SQL databases. But what does “document storage” actually mean, and what are its implications for developers and other IT pros?

This article focuses primarily on MongoDB, but the techniques described here apply in much the same way to other document-based databases. I’m assuming you’re already familiar with the basics of SQL databases and how each table has a fixed schema. That’s the first place where databases such as MongoDB are different: in Mongo, “tables” are referred to as “collections,” and records within a single collection can have different structures.

With Mongo, your records are stored as a binary form of JSON (called BSON). JSON stands for JavaScript Object Notation, and its syntax is basically the same as the object notation in JavaScript. For example, a single record might look like this:

{
    "_id" : ObjectId("4fccbf281168a6aa3c215443"),
    "first_name" : "Thomas",
    "last_name" : "Jefferson",
    "address" : {
        "street" : "1600 Pennsylvania Ave NW",
        "city" : "Washington",
        "state" : "DC"
    }
}

This single record consists of a first name, a last name, and then an address that is itself an object with further data inside it. In MongoDB, each record is given a unique ID, a special type known as an ObjectId (although you can use other types for this id, such as strings). Mongo generates these unique IDs automatically, or you can create them along with the record.
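To make the shape of that record concrete, here is a minimal sketch of the same document as a plain Python dictionary. It uses only the standard library and does not talk to a database; the `_id` is held as a string purely for illustration, whereas a real driver would normally hand back an ObjectId value.

```python
import json

# The sample record from above as a plain Python dict. The "_id" is kept as a
# string here (an assumption for this sketch); a driver would use an ObjectId.
person = {
    "_id": "4fccbf281168a6aa3c215443",
    "first_name": "Thomas",
    "last_name": "Jefferson",
    "address": {
        "street": "1600 Pennsylvania Ave NW",
        "city": "Washington",
        "state": "DC",
    },
}

# Nested fields are reached by walking the structure; no join is involved.
city = person["address"]["city"]

# Round-trip through JSON text to show that the nesting survives serialization.
restored = json.loads(json.dumps(person))
```

The inner `address` object travels with the record as a single unit, which is the property the rest of this article leans on.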

These records are called documents. They’re not necessarily documents in the sense of a word processing document, although you can store binary data (such as a word processing document) in any of the fields in the document. You can also modify the structure of any document on the fly by adding and removing members from the document, either by reading the document into your program, modifying it and re-saving it, or by using various update commands.
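The two ways of changing a document can be sketched in a few lines of Python. This is an in-memory illustration only: `apply_update` is a hypothetical toy helper written for this example, not a driver function, and it imitates just the top-level behavior of MongoDB's `$set` and `$unset` update operators. The email address is made up.

```python
import copy

document = {"first_name": "Thomas", "last_name": "Jefferson"}

# Style 1: read the document into the program, modify it, and save it back.
edited = copy.deepcopy(document)
edited["email"] = "tj@example.com"   # add a member (hypothetical value)
del edited["last_name"]              # remove a member


def apply_update(doc, update):
    """Toy imitation of a $set/$unset update on top-level fields only."""
    result = copy.deepcopy(doc)
    for name, value in update.get("$set", {}).items():
        result[name] = value
    for name in update.get("$unset", {}):
        result.pop(name, None)
    return result


# Style 2: describe the change as an update command and let it be applied.
updated = apply_update(
    document,
    {"$set": {"email": "tj@example.com"}, "$unset": {"last_name": ""}},
)
```

Both styles end up with the same document here. The practical difference is that the second one describes only the change, so the whole document does not have to make a round trip through your program.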

This schema-less approach can be both a blessing and a curse. As a developer, I love that I can easily store complex structures in a single database record. If I were to take the example above and put it into a SQL table, I would either need to “flatten” it (by pulling the street, city, and state fields out of the inner object and making them part of the main record), or else put the inner object in a separate table and include a foreign key to that other table. And from a programming perspective, these documents map beautifully to complex objects in my code.
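Here is a rough sketch of those two relational options, using Python's built-in `sqlite3` module with an in-memory database. The table names, column names, and the `flatten` helper are assumptions made up for this example.

```python
import sqlite3

person = {
    "first_name": "Thomas",
    "last_name": "Jefferson",
    "address": {"street": "1600 Pennsylvania Ave NW", "city": "Washington", "state": "DC"},
}


def flatten(doc, prefix=""):
    """Option 1: pull nested fields up into one flat row, prefixing their names."""
    flat = {}
    for key, value in doc.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}_"))
        else:
            flat[name] = value
    return flat


flat_row = flatten(person)

# Option 2: keep the inner object in its own table and point at it with a foreign key.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE address (id INTEGER PRIMARY KEY, street TEXT, city TEXT, state TEXT)")
conn.execute(
    "CREATE TABLE person (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT, "
    "address_id INTEGER REFERENCES address(id))"
)
addr = person["address"]
cur = conn.execute(
    "INSERT INTO address (street, city, state) VALUES (?, ?, ?)",
    (addr["street"], addr["city"], addr["state"]),
)
conn.execute(
    "INSERT INTO person (first_name, last_name, address_id) VALUES (?, ?, ?)",
    (person["first_name"], person["last_name"], cur.lastrowid),
)

# Reassembling the record now takes a join across the two tables.
joined = conn.execute(
    "SELECT p.first_name, a.city FROM person p JOIN address a ON a.id = p.address_id"
).fetchone()
```

Either option works, but both need extra code to get from the nested object to rows and back again, and that is the step a document store lets you skip.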

But that can also cause problems. With a traditional SQL database, the database administrators and analysts can carefully design the schema for the table; once the schema is in place, programs can only add records that match that schema. That puts constraints on the programmers so they don’t accidentally (or intentionally) put mismatched data into a table. But with Mongo, the programmer can easily drop any type of data into any collection, raising the potential for accidents.
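The contrast is easy to demonstrate. In the sketch below, SQLite (via Python's built-in `sqlite3` module) stands in for a traditional SQL database, and a plain Python list stands in for a schema-less collection; the list is only a stand-in for this example, not a MongoDB API, and the field names are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (first_name TEXT NOT NULL, last_name TEXT NOT NULL)")

# The fixed schema rejects a row that uses a column the table does not define.
try:
    conn.execute(
        "INSERT INTO person (first_name, shoe_size) VALUES (?, ?)", ("Thomas", 11)
    )
    sql_rejected = False
except sqlite3.OperationalError:
    sql_rejected = True

# A schema-less collection, modeled here as a list of dicts, accepts anything.
collection = []
collection.append({"first_name": "Thomas", "last_name": "Jefferson"})
collection.append({"frist_name": "Tomas", "shoe_size": "eleven"})  # typo slips through

# The two stored documents now have entirely different sets of field names.
field_sets = {frozenset(doc) for doc in collection}
```

Nothing flags the misspelled `frist_name` key when it is stored; the mistake only shows up later, when some other piece of code looks for `first_name` and does not find it.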

With the right tools, you can find a compromise. For example, the different language drivers for Mongo allow you to read documents into an object with a specific structure, and write documents from objects with a specific structure. In strongly-typed languages, this means you create an instance of a class, and then save that instance right to the collection. And if you do want to allow some leniency, you can use a Mongo-specific class in your code that works like a map, letting you add members on the fly. (The name of this class varies between languages.) Or you can create a strongly-typed class and include a member whose type is that Mongo-specific class; that member serves as a “catch-all” for data that doesn’t match the class’s schema.
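The “catch-all” idea can be sketched without any driver at all. The example below uses a standard-library dataclass; the `Person` class, its `extra` member, and the `from_document` and `to_document` methods are all hypothetical names chosen for this illustration, and a real driver would supply its own mapping machinery.

```python
from dataclasses import dataclass, field, fields
from typing import Any


@dataclass
class Person:
    first_name: str
    last_name: str
    # Catch-all member for data that does not match the class's schema.
    extra: dict = field(default_factory=dict)

    @classmethod
    def from_document(cls, doc: dict) -> "Person":
        """Build a Person, routing any unknown members into `extra`."""
        known = {f.name for f in fields(cls)} - {"extra"}
        typed = {k: v for k, v in doc.items() if k in known}
        leftover = {k: v for k, v in doc.items() if k not in known}
        return cls(**typed, extra=leftover)

    def to_document(self) -> dict:
        """Merge the typed members and the catch-all back into one document."""
        return {"first_name": self.first_name, "last_name": self.last_name, **self.extra}


doc = {"first_name": "Thomas", "last_name": "Jefferson", "address": {"city": "Washington"}}
person = Person.from_document(doc)
```

In this sketch, a document missing `first_name` or `last_name` fails with a `TypeError` when the object is constructed, so the required part of the schema is enforced in code, while unexpected members such as `address` are preserved in `extra` instead of being silently dropped.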

(As for weakly typed languages, such as JavaScript in Node.js, it’s harder to force the programmer into a schema, but there are libraries that add class-like schema support.)

In the end, like so many tools, document storage in a NoSQL database can be easy to abuse; but when handled with care, it can become a powerful feature.


Author Bio

Nick Kolakowski has written for The Washington Post, Slashdot, eWeek, McSweeney's, Thrillist, WebMD, Trader Monthly, and other venues. He's also the author of "A Brutal Bunch of Heartbroken Saps" and "Slaughterhouse Blues," a pair of noir thrillers.