Over the years we’ve learned how to address the key business intelligence (BI) challenges of the past 20 years, such as stability, robustness, and rich functionality. Agility and flexibility challenges now represent BI’s next big opportunity. BI pros now realize that earlier-generation BI technologies and architecture, while still useful for more stable BI applications, fall short in the ever-faster race of changing business requirements. Forrester recommends embracing Agile BI methodology, best practices, and technologies (which we’ve covered in previous research) to tackle agility and flexibility opportunities. Alternative database management system (DBMS) engines architected specifically for Agile BI will emerge as one of the compelling Agile BI technologies BI pros should closely evaluate and consider for specific use cases.

Why? Because fitting BI into a row-oriented RDBMS is often like putting a square peg into a round hole. In order to tune such a RDBMS for BI usage, specifically data warehousing, BI pros usually:

Denormalize data models to optimize reporting and analysis.

Build indexes to optimize queries.

Build aggregate tables to optimize summary queries.

Build OLAP cubes to further optimize analytic queries.

Unfortunately, there’s one basic problem with these approaches: It’s impossible to build denormalized data models, indexes, and aggregates for every possible query that users will execute during the lifetime of a database. So BI pros must pick their battles and optimize the RDBMS based on current or near-term expected usage. But in today’s fast-paced business environment, “near-term” may mean days or even hours, requiring BI and data warehouse (DW) pros to spend a significant amount of their time doing nothing more than constantly optimizing and reoptimizing these databases.

Additionally, these row-oriented RDBMSes offer minimal support for:

Unstructured content.

Diverse data structures that often result in ragged, sparse, and unbalanced product hierarchies — a nightmare for data architects to model and users to query.

To address these challenges, both mainstream BI vendors and startups introduced databases designed specifically for BI reporting and analysis use cases. These databases are very specialized and not meant to support a wide variety of use cases — unlike their bigger, older, more versatile, jack-of-all-trades RDBMS cousins. Additionally, not all BI-specialized DBMSes are created equal, and inconsistent or conflicting vendor marketing messages only add to customer confusion. We recommend considering four types of BI-specific DBMS: columnar RDBMS and in-memory, inverted, and associative index DBMS. The applicability of and use cases for each specialized DBMS vary greatly depending on multiple factors, including but not limited to database size, data structure complexity, speed of requirements change, and a requirement to organize and report on unstructured content.

We’ve also produced a detailed Excel-based model that attempts to compare implementation and ongoing support efforts of these four BI-specific DBMSes baselined against row-oriented RDBMS-based BI projects.

Comments

You mention BI-oriented DBMSs without referring to one that was created over 15 years ago and is now part of IBM. The Red Brick Datawarehouse was started by data warehousing expert Ralph Kimball in the mid-1990s and its focus was as a relational DBMS that provided fast queries on large volumes of data. The company was acquired by Informix which was then bought by IBM.

Interesting that you bring up Informix. In my 25+ years as BI practitioner i never ran into Red Brick / Informix. In my 4 years as Forrester BI analyst I've never had a single client inquiry about the product. Why do you think that is? Is it used in some particular verticals or domains?

FYI, i am at IBM analytics event right now - not a single mention of Informix :-)

Red Brick was a pretty amazing technology in its day, I remember as a BI/DW consultant having drinks with my peers back in the late '90s / early 2000's and Red Brick was spoken of in awed terms... as for what happened, my guess is that after IBM bought it they didn't know what to do with it without cannibalising the DB2 EDW space, so it lost its way. Big shame, really, it was a very powerful DW- and analytics- oriented database.

As always a very good read Boris. HANA from SAP seems to be missing from your list above, after reading through the blog it seems to be a good "to be" solution for the problems mentioned. What are your views on this one?

I know you are a client of Forrester so download the full report. It definitely has lots of info on HANA both as a columnar and in-memory DBMS. If SAP delivers on what they are setting out to do, it's going to be a highly differentiated - unique, for now - and market disruptive product.

To continue: RDBMS is a technical term and this categorization might be a little confusing.

RDBMS is by definition tabular, not columnar. From Wikipedia:

"The short definition of an RDBMS is a DBMS (Database Management System) in which data is stored in tables and the relationships among the data are also stored in tables. The data can be accessed or reassembled in many different ways without having to change the table forms."

Columnar databases are by definition not tabular (i.e. non-RDBMS), kind of like NoSQL databases are by definition 'Not SQL Databases'.

There are myriad implementation of such DBMS's, each optimized for different things, but they all have on thing in common - they are not tabular in nature. Some support SQL, some don't. Some are optimized for disk, some for physical memory. Some for high-volume read-writes, others for high-volume reads. Some run on Windows, some on Linux. Some are designed for distributed computing, other for utilization of multi-core architectures within a single node. Some for concurrent use, others for single user. And so on.

There are several columnar databases that enable SQL access. Vertica is one you listed, our ElastiCube is another. Neither of which are RDBMS.

If I had to use the term 'Columnar RDBMS' for anything, it would be to describe vertically fragmenting tabular data inside an RDBMS (i.e. store each field in a separate table, with an additional correlation key). But that is not what Vertica does, nor what we do.

I hear you. My definition is more simplistic and pragmatic. As long as a DBMS supports ANSI SQL natively or via a wrapper and therefore BEHAVES like a relational DBMS, it's relational to me. If it walks like a duck and quacks like a duck ... :-)