More generally, marketing and other applications that maintain detailed profiles of customers or prospects. The information in these profiles is often based on a large variety of marketing campaigns, third-party databases, and analytic exercises. As the inputs pile up, the schemas get ever hairier.

Electronic medical records. Medical records are one area where non-relational approaches may actually have majority share. I blogged about one example in 2008.

Conor O’Mahony, marketing manager for IBM’s DB2 pureXML, talks a lot about one of my favorite hobbyhorses — schema flexibility* — as a reason to use an XML data model. In a number of industries he sees use cases based around ongoing change in the information being managed:

Tax authorities change their rules and forms every year, but don’t want to do total rewrites of their electronic submission and processing software.

The financial services industry keeps inventing new products, which don’t just have different terms and conditions, but may also have different kinds of terms and conditions.

The same, to some extent, goes for the travel industry, which also keeps adding different kinds of offers and destinations.

The energy industry keeps adding new kinds of highly complex equipment it has to manage.

Conor also thinks market evidence shows that XML’s schema flexibility is important for data interchange. For example, hospitals (especially in the US) have disparate medical records and billing systems, which can make information interchange a chore.

In theory and practice alike, the advantages of loose coupling between your database design and your application. (I think that’s a cleaner way of saying it than to focus on “reusing” the database, but it amounts to the same thing.)

In practice, performance and functionality in anything using indexes, even if joins aren’t involved.

Comments

What about the hybrid solution of having XML or JSON or BLOB (like MMORPG as you mentioned) or a key-value store inside a relational database for those corner cases? I saw that many times and while probably not the most efficient it did the job.

Sometimes that works. Oracle is probably world’s biggest vendor of enterprise search, or else was for many years. But RDBMS vendors have been surprisingly ineffective at narrowing the gap vs. more focused systems.

Indeed, I expect the NoSQL/NewSQL winners to wind up being systems that can exhibit multiple “personalities” in terms of logical data model.

aaron on
January 29th, 2013 3:20 pm

There seems to be a lot of dancing around the point here.
– In general, structured data with a variety of use needs is amenable to relational model RDBMS.
– There are edge cases relating to scale, model, etc. that NewSQLs are going after. ACID replication so far doesn’t seem one of these.
– There are edge cases around complex read only that some BI centric stores are attacking.
– There are database-trivial applications where KVPs allow fast, scalable, and redundant caching.
– There are structured apps with “soft edges”, where parts of the app have evolving schemas. These are addressed by document stores and RDBMS with XML and/or blobs.
– There are apps that are just going to evolve, such as ones where business goals are in flux. These are served by a variety of noSQLs which allow deferred and evolving schema definition which allow changing goals of the store.
– There are uses where the critical factor is archival and support of scalable storage and scalable processing where noSQLs (and some doc stores) shine.

Speaking as a practioner, preserving a single view of a schema is really hard. Having multiple views may work in a few areas, but is truly complex and is not for general use.

It would be instructive to look at your customer examples over the past few years. Likely, the ones that succeeded slid into more traditional RDBMS (and some noSQL, especially Hadoop), with only edge cases in newSQL and other specialized niche products.

The one thing I will say in favor of RDBMS is that they actually give you a means to do some analysis on your data. The NoSQL’s give you storage and retrieval at scale but totally throw up their hands when you want to do some analysis. The ability to analyze the data in place is what makes the difference between a database and a datastore.

Let’s not be quite so fast in saying that. First, a lot of NoSQL systems have at least some primitive analytic capabilities — aggregates, range queries, perhaps a form of map/reduce. Second, for hard core analytics, why do it in a short-request data store of any kind anyway? Third, there’s HBase/Hadoop integration. Fourth, there’s Cassandra/Hadoop integration. Etc.

NewSQL clearly hasn’t had a lot of traction to date. And the one client I pointed toward a Schooner/dbShards combo probably is looking for alternatives on the Schooner side right now. (Meanwhile, the individual who spearheaded that selection jumped to DataStax.)

But there are plenty of robust NoSQL installations, and I don’t include generic Hadoop as NoSQL. (HBase is NoSQL, however, of course.)

aaron on
January 30th, 2013 12:43 pm

I think Brian is pointing out the obvious why (and again your example is instructive). To a make a decision to go to a niche product implies a lot of trust of the vendor AND a willingness to look at a db API as a construction tool – rather than current practice of integrating vendor products. I can’t think of *any* examples I’ve seen of successful serious and unfrustrating BI against a newSQL, even ones with standard SQL JDBC or ODBC interfaces.

It would be really instructive to review *how and why people choose the new products*. The product vendors are of course dealing with the chicken and egg of competitors against mature stacks – you can do better tech for purpose, but you compete against an entire industry.

(And on Hadoop being a DBMS – Pig and Hive are pretty crude, but they seem to be among the largest use/largest growth BI tools right now, so it can be considered as much a db as SAS or Access.)

[…] for me. They do overlap a lot — and the whole discussion overlaps with my post about schema complexity last January, and more generally with what I’ve written about dynamic schemas for the past […]