At the Forge - NoSQL? I'd Prefer SomeSQL

Are non-relational databases the future? Reuven examines the NoSQL movement and what it means for Web apps.

When I started to develop Web applications in the early 1990s, I
sometimes found myself needing to store persistent data. At the time,
the main options available were files—plain-text files, binary
files or DBM (now Berkeley DB) files.

Fortunately, the languages I was using for Web development at the
time (mainly Perl and Tcl) made it fairly easy to use any of those
options. However, Web developers like myself quickly discovered the problems inherent in
these technologies, particularly on large sites. The lack of
intelligent, fine-grained locking meant that a popular application
would be forced to wait before it could read from, or write to, an
external file. Moreover, storing the data in this way prevented us
from retrieving it intelligently or selectively. You could, of course,
read the data into your program and produce your own reports, but each
report required not just its own query, but its own program.

So, you can imagine how my life changed dramatically when I was
introduced to relational database systems. Simply put, a relational
database is built out of many two-dimensional tables, with each table
containing one or more columns (the attributes) and one or more rows
(the records). You could relate these tables to one another (thus,
making the database “relational”) with foreign keys, such that a
column in one table would represent a record in another table.

When you store data in a relational database, you suddenly are able to
stop thinking in terms of how your data is stored and more in terms
of how it is structured. Is the database server keeping information
in binary format, in ASCII or in something altogether different? Who
knows? And furthermore, who cares? So long as I can store and
retrieve my data easily and efficiently, I'll treat the database as a
black box, concentrating on writing my application.

SQL, the query language that works with relational databases,
encourages this kind of thinking. When you use SQL to query a
database, you describe the data you want to retrieve, rather than
telling the computer how to retrieve it. True, now there are
procedural extensions to SQL, and you also can write programs in
non-SQL languages that manipulate the data in further ways. But, even
they allow you to loop over, manipulate and perform calculations on
the retrieved data. The data itself is still in the database, and
you still describe how and what you want to retrieve in the
declarative language of SQL.

Relational databases became popular because they offered so many
advantages to programmers. They offered stability and security. They
offered transactions, allowing programmers to ensure that a set of
operations either all happened atomically or that they were rolled
back to their state before the query began to execute. It was fairly
easy to map many types of data onto two-dimensional tables. Databases
gained all sorts of capabilities, including the ability to check the
integrity of the data they are storing. And, they even gained the
ability to distribute the load across multiple computers, albeit with
some degree of difficulty.

Open-source relational databases, such as MySQL and PostgreSQL, began
to challenge their commercial competitors. When I began writing this
column in 1996, MySQL had not yet been released under an open-source
license, and although PostgreSQL was fully open source, it had a number
of limitations. So, it was expected that if you were interested in
using a relational database, you almost certainly would fork over
money for one, paying Oracle or one of its smaller competitors many
thousands of dollars. Today, the opposite is the case. Open-source
databases are more than adequate for Web applications. Moreover, it
is basically expected that a Web application will use a relational
database as a back end. If you say you are a Web developer,
it is basically expected that you have some experience with SQL, as
well as the care and feeling of a relational database.

So What's Wrong?

I've been using relational databases for years, and I'm quite
comfortable with them. Indeed, as I've sometimes told my clients, I
see the world through two-dimensional tables. Breaking problems down
into tables and the relations among them is something I do nearly
every day. Yet, even I recognize that a database is a tool, and that
not every tool is appropriate for every task. Saying every Web
application should use a relational database is like saying all
programs should be written in C (or whatever your favorite programming
language is at the moment). As the old adage goes, if your only tool
is a hammer, every problem looks like a nail.

And, there are indeed some issues with relational databases that have
come to the foreground during the past few years. Perhaps the biggest
problem many developers face is the “impedance mismatch” between
objects, with which we enjoy writing our programs, and tables, in
which relational databases function. If you want to store your object
in a database, you have several options, none of which are
particularly good, and all of which lose some of the semantic value
that is the whole point of objects. True, you can treat each object
attribute as a column, each instance as a row and each class as a
table, but many corner cases don't map well to this,
even with an excellent object-relational mapper (ORM) library. One of
the reasons I enjoy working with Ruby on Rails is the large degree
to which ActiveRecord is able to paper over these differences, but
even with all of ActiveRecord's smarts, some problems can't
be solved easily with objects, where it's required to massage my data
model or even to drop down into SQL.

This problem has become bigger in the past few years, with the growth
of documents and objects that contain many different types of data,
some of which are unpredictable or contain highly variable
content.
In response to these needs, database servers have begun to incorporate
functions that previously were considered to be outside the
database domain, such as full-text search (which is extremely useful)
and XML handling (which I still find puzzling and somewhat
inappropriate). Regardless of your attitude toward such features, it
does seem a bit strange to set up a database server just so you
can store and retrieve files. There must be better ways to store such
files, without all the administrative and computational overhead
associated with a database server.

Another problem has to do with the greater and greater performance
requirements of Web applications. I tend to dismiss talk of
“scalable” languages and frameworks, in part because I believe that
any language or framework can scale, given sufficient hardware.
Instead of worrying about the scalability of a particular language, I
would concentrate on the scalability of an architecture—and that is
certainly a weak point for relational database servers. Modern Web
applications using a “share-nothing” approach easily can scale up to
hundreds or even thousands of Web servers, each handling part of the
load. But if each of those servers needs to communicate with a
database server on the back end, the database might become
overwhelmed at some point. There are solutions to this problem, even
among the open-source databases, but they are far from dead simple
to install and configure.

As Linux continues to play an ever increasing role in corporate data centers and institutions, ensuring the integrity and protection of these systems must be a priority. With 60% of the world's websites and an increasing share of organization's mission-critical workloads running on Linux, failing to stop malware and other advanced threats on Linux can increasingly impact an organization's reputation and bottom line.

Most companies incorporate backup procedures for critical data, which can be restored quickly if a loss occurs. However, fewer companies are prepared for catastrophic system failures, in which they lose all data, the entire operating system, applications, settings, patches and more, reducing their system(s) to “bare metal.” After all, before data can be restored to a system, there must be a system to restore it to.

In this one hour webinar, learn how to enhance your existing backup strategies for better disaster recovery preparedness using Storix System Backup Administrator (SBAdmin), a highly flexible bare-metal recovery solution for UNIX and Linux systems.