Why Relational Databases Don't Handle Big Data Well

Limited scale and a limited ability to analyze unstructured data are among the reasons that relational databases may not be the best fit for big data. Here are some others.

Relational Databases Are Not Designed for Change

Data in relational databases is arranged in rows and columns, with each row representing a unique entry and each column describing an attribute of that entry. The data model must be designed in advance, which can take months or even years depending on the system, and large database-modeling projects can cost millions of dollars. Changes after the fact are time- and resource-intensive. Big data is constantly changing, requiring a database platform that is flexible and forgiving.
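
As a small illustration of the schema-first constraint, the sketch below uses Python's built-in sqlite3 module. The customers table, the loyalty_tier attribute and the insert_customer helper are hypothetical names chosen for this example, and a single ALTER TABLE on a tiny in-memory table says nothing about the cost of migrating a large production system.

```python
import sqlite3

# Hypothetical schema for illustration; table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO customers (name) VALUES (?)", ("Ada",))


def insert_customer(conn, record):
    """Insert a dict whose keys must match existing column names.

    Column names are interpolated into the SQL text, so this helper is only
    safe with trusted keys; values are still passed as bound parameters.
    """
    columns = ", ".join(record)
    placeholders = ", ".join("?" for _ in record)
    conn.execute(
        f"INSERT INTO customers ({columns}) VALUES ({placeholders})",
        tuple(record.values()),
    )


# A record arrives with an attribute the schema never anticipated.
new_record = {"name": "Grace", "loyalty_tier": "gold"}
try:
    insert_customer(conn, new_record)
    rejected = False
except sqlite3.OperationalError:
    # SQLite refuses the insert because the column does not exist.
    rejected = True

# The schema has to be migrated before the new attribute can be stored.
conn.execute("ALTER TABLE customers ADD COLUMN loyalty_tier TEXT")
insert_customer(conn, new_record)

rows = conn.execute(
    "SELECT name, loyalty_tier FROM customers ORDER BY id"
).fetchall()
```

The first insert attempt fails, and the record is accepted only after the table definition changes. Rows written before the migration carry an empty value for the new column.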

They're Not Designed to Handle Variety

Relational databases are designed for structured data and are not built to handle the various shapes and sizes in which data now arrives. Structured data is only a small percentage of the data with which companies are dealing. They also have lots of unstructured data that relational databases handle poorly. Relational databases can be configured to accept data variety, but only with cumbersome changes that add schema complexity, or in ways that do not get the full value out of the data.

They're Not Designed for Scalability, Elasticity, Resilience

You can't read the news without hearing about the heights to which data volumes are expected to grow. Databases need to expand in kind, but relational databases weren't designed for growth on demand. Relational database vendors have come up with many features that help close the gap, such as shared storage, in-memory processing and better use of replicas and distributed caching. But regardless of any re-engineering, relational databases are simply not designed for the scale at which today's businesses operate. Trying to make them scale typically means buying ever more expensive hardware and tolerating periods of disruption while changes are made.

They Are Not Designed For 'Mixed Workloads'

"Mixed workloads" refers to the ability to handle operational and analytical workloads. In the mid-1990s, a split arose between databases optimized for operational workloads and those optimized for analytical workloads, which contributed to the creation of disparate data marts, data warehouses, reference data stores and archives. IT departments now are overwhelmed and exhausted from the complexity that relational databases have created and are in need of simpler, more flexible solutions that can deliver information in various forms to numerous users at any point in time.

They're a Mismatch for Modern App Development

Modern applications are built using object-oriented programming languages that treat data structures as "objects" that contain data and code. This is very different from how relational databases handle data, as rows spread across tables. To get around this mismatch, developers use a technique called object-relational mapping (ORM), which translates between the application's objects and the database's tables so that developers can work with business rules and logic in the form that makes the most sense from an app developer's perspective. However, ORM has become known for its tendency to introduce more complexity, reduce performance and increase the potential for buggy code.
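
To make the mismatch concrete, here is a minimal hand-written mapping in Python using the built-in sqlite3 module. The Order and LineItem classes, the two tables and the save_order and load_order helpers are all hypothetical and exist only for this sketch; a real ORM library automates this kind of translation and adds far more machinery.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class LineItem:
    sku: str
    quantity: int


@dataclass
class Order:
    # In the application, an order is one object that owns its items.
    customer: str
    items: list = field(default_factory=list)


def save_order(conn, order):
    """Flatten one Order object into rows in two tables."""
    cur = conn.execute("INSERT INTO orders (customer) VALUES (?)", (order.customer,))
    order_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO line_items (order_id, sku, quantity) VALUES (?, ?, ?)",
        [(order_id, item.sku, item.quantity) for item in order.items],
    )
    return order_id


def load_order(conn, order_id):
    """Reassemble the object from rows; this takes two queries."""
    (customer,) = conn.execute(
        "SELECT customer FROM orders WHERE id = ?", (order_id,)
    ).fetchone()
    rows = conn.execute(
        "SELECT sku, quantity FROM line_items WHERE order_id = ? ORDER BY id",
        (order_id,),
    ).fetchall()
    return Order(customer, [LineItem(sku, quantity) for sku, quantity in rows])


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT)")
conn.execute(
    "CREATE TABLE line_items ("
    "id INTEGER PRIMARY KEY, "
    "order_id INTEGER REFERENCES orders(id), "
    "sku TEXT, quantity INTEGER)"
)

original = Order("Ada", [LineItem("A-1", 2), LineItem("B-7", 1)])
order_id = save_order(conn, original)
restored = load_order(conn, order_id)
```

Even for this two-class model, the translation layer is longer than the classes it serves, and every new nested structure adds another table and another pair of conversions.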

They're Not Designed to Track Time

Tracking time-varying data, or "temporality," was never part of the original relational model. Ways of managing time in relational databases and writing SQL queries with a time element have evolved to answer questions about when events occurred, but implementations still vary from vendor to vendor and development can be complicated. Managing "bi-temporal" data, or tracking both when things happened and when they were recorded, is an even bigger stumbling block for relational databases. Enterprises need accurate and on-demand histories of their data, often for regulatory compliance and in-depth analytics, but the constraints of relational databases are too limiting.
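
The sketch below shows one simplified way to model bi-temporal data by hand in a relational table, using Python's built-in sqlite3 module. The price_history table, its columns and the price_as_of function are hypothetical. The example also assumes an append-only table in which each correction re-records the full validity range it affects, so that the most recently recorded matching row wins. Vendor-specific temporal features work differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE price_history (
        product TEXT,
        price REAL,
        valid_from TEXT,   -- when the price started to apply in the real world
        valid_to TEXT,     -- exclusive end of that period
        recorded_at TEXT   -- when this row was entered into the database
    )"""
)
conn.executemany(
    "INSERT INTO price_history VALUES (?, ?, ?, ?, ?)",
    [
        # Original belief, recorded in January: the price is 10.0 indefinitely.
        ("widget", 10.0, "2024-01-01", "9999-12-31", "2024-01-01"),
        # Correction recorded in March: the price changed to 12.0 in February.
        ("widget", 10.0, "2024-01-01", "2024-02-01", "2024-03-15"),
        ("widget", 12.0, "2024-02-01", "9999-12-31", "2024-03-15"),
    ],
)


def price_as_of(conn, product, valid_on, known_on):
    """Return the price that applied on valid_on, as the database knew it on known_on.

    Dates are ISO-8601 strings, which compare correctly as plain text.
    """
    row = conn.execute(
        """SELECT price FROM price_history
           WHERE product = ?
             AND valid_from <= ? AND ? < valid_to
             AND recorded_at <= ?
           ORDER BY recorded_at DESC
           LIMIT 1""",
        (product, valid_on, valid_on, known_on),
    ).fetchone()
    return row[0] if row else None


before_correction = price_as_of(conn, "widget", "2024-02-10", "2024-02-20")
after_correction = price_as_of(conn, "widget", "2024-02-10", "2024-04-01")
```

The same real-world date yields two different answers depending on when the question is asked, which is the distinction an auditor or regulator typically cares about. Every query has to carry both time conditions explicitly, which is part of what makes this style of development complicated.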

They're Not Effective at Returning Search Results by Relevance

A relational database is not built to do relevance ranking the way a search engine such as Google does; it simply returns a list of results based on a simple ordering of values. And if the database handles only values, information in any unstructured free text is ignored, so enterprises cannot get the relevant information they need. Organizations don't just want to know "all of the drugs a patient took before 2005," but answers to more sophisticated questions like "all of the drugs a patient took before 2005, ordered by how often they were mentioned in doctors' notes." A relational database is not designed to answer those types of questions because they require a level of indexing sophistication that it does not have.
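
The doctors' notes question can be sketched in a few lines of Python. The prescription records, the note text and the drugs_ranked_by_mentions function below are invented for illustration, and a raw mention count is only a crude stand-in for the relevance scoring a real search engine performs.

```python
import re
from collections import Counter

# Hypothetical structured data: (drug, year first taken).
prescriptions = [
    ("aspirin", 2001),
    ("metformin", 2003),
    ("lisinopril", 2004),
    ("atorvastatin", 2007),
]

# Hypothetical unstructured data: free-text doctors' notes.
notes = [
    "Patient continues metformin; metformin dose increased.",
    "Discussed aspirin. Metformin tolerated well.",
    "Started lisinopril for blood pressure.",
]


def drugs_ranked_by_mentions(prescriptions, notes, before_year):
    """Combine a structured filter with a ranking derived from free text."""
    # Structured part: the kind of filter a plain value comparison handles.
    candidates = {drug for drug, year in prescriptions if year < before_year}
    # Unstructured part: count how often each word appears across all notes.
    words = Counter(re.findall(r"[a-z]+", " ".join(notes).lower()))
    # Most-mentioned first; ties are broken alphabetically for stable output.
    return sorted(candidates, key=lambda drug: (-words[drug], drug))


ranking = drugs_ranked_by_mentions(prescriptions, notes, before_year=2005)
```

The filter on year is ordinary structured querying, while the ordering depends entirely on text that a value-only database would never index.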

Their Enterprise-Class Features Can't Be Used for Big Data Workloads

For decades, relational databases have offered an unparalleled level of security, reliability and data management that is critical in today's business environment. The problem is that these enterprise-class features don't mean much if they can't be applied to big data. Other options are now on the market that do meet the needs of big data without sacrificing the enterprise features that are critical for businesses running mission-critical applications.

How NoSQL Can Bridge the Relational Database Gap

Many companies are running NoSQL databases to store and manage big data effectively. NoSQL databases eschew the strict predetermined schemas of relational databases and are much more flexible and scalable. NoSQL databases aren't used strictly for big data, but they are especially well-suited for it. NoSQL database providers and products include Couchbase, DataStax, RethinkDB, Cassandra, MongoDB, FoundationDB, MariaDB, MarkLogic, Basho and others. They offer varying degrees of enterprise capabilities and support. Companies should carefully evaluate NoSQL offerings to determine whether they have the security, high-availability, disaster-recovery and management features necessary in enterprise environments.

Enterprises are confronting the reality of big, fast, varied and changing data. It's no longer about managing a small number of systems, but rather hundreds of systems and petabytes of data. Smart companies know that managing large volumes of structured and unstructured data, known as big data, is crucial to modern business operations and more complex business analysis.

There is a problem: Relational databases, the dominant technology for storing and managing data, are not designed to handle big data. In fact, relational databases still look similar to the way they did more than 30 years ago when they were first introduced. Businesses focused on big data can no longer rely on the one-size-fits-all relational model; they must look toward new databases better designed to handle current workloads.

"When relational databases were first developed, data was seen as small, neat, structured and static, and that's the only way it could be stored," Matt Allen, senior manager of product solutions at NoSQL database specialist MarkLogic, told eWEEK. "Today's data is anything but that."

This slideshow, which provides a NoSQL point of view from Allen, contends that relational databases are not the right tool for handling big data. Of course, it's up to you to decide whether NoSQL is the right one for your use case. A later eWEEK story outlines the other side of the relational-vs.-NoSQL database debate.