There was an interesting thread on the SQLDownUnder mailing list last week about the use of key-value pairs (KVPs) in database design. One of the gurus on the list, Jeremy Huppatz, made a few posts which covered off pretty much all the issues I was thinking about.

One of the key points that Jeremy made was this one:

Remember that relational databases are designed with 3 key goals - reliable data entry, stable consistent storage and rapid retrieval. By using KVPs you throw the rapid retrieval capability straight out of the window.

Working mainly in the BI field might mean that I can have a tendency to focus more on the retrieval side of things, but I think it is good practice with any system to keep both the inputs and outputs in mind while building it.

Rather than paraphrasing Jeremy's post, I contacted him and asked if he would mind if I reproduced it here:

KVPs look elegant for data capture but they make reporting a nightmare.

If you're going to use KVPs for data capture, strongly consider running triggers or a second-phase ETL process to transform KVP data into an entity-based schema.

Try to avoid reporting against big heaps of KVPs.

If you have no other option, try to ensure that your KVP rows have as many additional lookup keys as possible, so that queries against your heaps can be heavily filtered on the KVP categories, subtypes, etc.

Also consider the fact that with KVPs you might effectively be building some optimizable mathematical structures such as Directed Acyclic Graphs or Bidirectional Acyclic Graphs, and consider the extra data-field considerations that may be required.

Before doing anything else, however, I'd strongly recommend doing a full logical model and looking at what common properties on your sets of objects suggest standard ERD implementations and relationships. KVPs are most often used as a short-circuit to the data analysis process and are rarely an easy solution to support and maintain in the long term - especially when you start connecting enterprise systems together.

Not only did Jeremy post the above points, he then went on to work through a simple example of what happens when you try to report off a KVP structure.

One of the big problems is that you end up building lots of logical subqueries when constructing reports.

Let's also assume that key_id, category_id and sub_category_id are foreign keys to lookup tables with appropriate category names that can be used for filtering, and that record_id is used as a foreign key by other tables that refer to our KVP heap.

Let's say for the sake of argument that our application is a medication administration database and we want to look for records where a specific medication specification has been administered.

So, when looking for something general, like a medication name such as Ibuprofen, we need:
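The query from the original post wasn't preserved, so here is a sketch of what it might look like. The table and column names (kvp_heap, kvp_keys, key_name) and the sample data are assumptions for illustration, not the schema from the thread; SQLite via Python stands in for the original DBMS:

```python
import sqlite3

# Hypothetical reconstruction of the KVP heap described in the thread --
# the table and column names here are assumptions, not the real schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE kvp_keys (key_id INTEGER PRIMARY KEY, key_name TEXT);
    CREATE TABLE kvp_heap (
        record_id INTEGER PRIMARY KEY,
        key_id    INTEGER NOT NULL REFERENCES kvp_keys (key_id),
        value     TEXT
    );
    INSERT INTO kvp_keys VALUES (1, 'medication_name'), (2, 'dosage');
    INSERT INTO kvp_heap VALUES (10, 1, 'Ibuprofen'),
                                (11, 2, '200mg'),
                                (12, 1, 'Paracetamol');
""")

# Even "find the rows where the medication name is Ibuprofen" needs a join
# against the key lookup just to work out what each KVP row means.
rows = conn.execute("""
    SELECT h.record_id, h.value
    FROM kvp_heap h
    JOIN kvp_keys k ON k.key_id = h.key_id
    WHERE k.key_name = 'medication_name'
      AND h.value = 'Ibuprofen'
""").fetchall()
print(rows)  # -> [(10, 'Ibuprofen')]
```

Note that a query this simple already touches two tables, and it still only returns one attribute of the logical record.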

With the table as specified above, we have no way to correlate KVPs with a broader context. So... let's add an event_id column to the table (make it int NOT NULL) to allow correlation of medication administration events with the corresponding KVPs.

Then we might want to find a list of event IDs for which our medication matches a compound KVP definition for medication name, dosage level and administration route.
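A sketch of that compound match, again with assumed names (a flattened key_name column stands in for the key/category lookups): because each logical record is smeared across three physical rows, the query needs one self-join per attribute before a single event can be matched.

```python
import sqlite3

# Assumed single-table layout for illustration -- not the thread's schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE kvp_heap (
        record_id INTEGER PRIMARY KEY,
        event_id  INTEGER NOT NULL,
        key_name  TEXT NOT NULL,
        value     TEXT
    );
    INSERT INTO kvp_heap (event_id, key_name, value) VALUES
        (1, 'medication_name', 'Ibuprofen'),
        (1, 'dosage',          '200mg'),
        (1, 'route',           'oral'),
        (2, 'medication_name', 'Ibuprofen'),
        (2, 'dosage',          '400mg'),
        (2, 'route',           'oral');
""")

# One self-join per attribute: three rows must line up on event_id
# before one logical record can satisfy the compound definition.
events = conn.execute("""
    SELECT nm.event_id
    FROM kvp_heap nm
    JOIN kvp_heap ds ON ds.event_id = nm.event_id
    JOIN kvp_heap rt ON rt.event_id = nm.event_id
    WHERE nm.key_name = 'medication_name' AND nm.value = 'Ibuprofen'
      AND ds.key_name = 'dosage'          AND ds.value = '200mg'
      AND rt.key_name = 'route'           AND rt.value = 'oral'
""").fetchall()
print(events)  # -> [(1,)] : event 2 fails on the 400mg dosage
```

In an entity-based schema the same filter would be three predicates against one row; here every extra filter level adds another self-join against the whole heap.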

However, when you want records that match a "NOT IN" query specification (i.e. you want to screen out records that actually MATCH the fields specified - this is more common than you'd think) you end up with the JOIN query being no longer viable - the subquery becomes the optimal approach.
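A sketch of the "NOT IN" case, using the same assumed single-table layout: the exclusion can't ride along on self-joins, so it drops into a subquery that scans the heap a second time.

```python
import sqlite3

# Assumed layout for illustration -- not the thread's actual schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE kvp_heap (
        record_id INTEGER PRIMARY KEY,
        event_id  INTEGER NOT NULL,
        key_name  TEXT NOT NULL,
        value     TEXT
    );
    INSERT INTO kvp_heap (event_id, key_name, value) VALUES
        (1, 'medication_name', 'Ibuprofen'),
        (1, 'route',           'oral'),
        (2, 'medication_name', 'Paracetamol'),
        (2, 'route',           'oral');
""")

# Screen out events that DO match the specification: the filter has to be
# phrased as "event_id NOT IN (every event with a matching KVP row)".
events = conn.execute("""
    SELECT DISTINCT event_id
    FROM kvp_heap
    WHERE event_id NOT IN (
        SELECT event_id
        FROM kvp_heap
        WHERE key_name = 'medication_name' AND value = 'Ibuprofen'
    )
    ORDER BY event_id
""").fetchall()
print(events)  # -> [(2,)]
```

On a toy table this is instant; on the 20-million-row heap Jeremy describes, each such clause means another pass over the whole table.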

Consider the impact of a single "NOT IN" clause in a report that has 5 levels of filtering on a table with 20 million rows. What sort of response times would you expect to see? IMO, you're kidding yourself if you'd expect anything under 40-50 seconds. More realistic would be 2-3 minutes - if you're lucky.

Remember that relational databases are designed with 3 key goals - reliable data entry, stable consistent storage and rapid retrieval. By using KVPs you throw the rapid retrieval capability straight out of the window.

If you're looking to deploy this in SQL Server 2005, you might have some joy using the native XML datatype to encode your key value pairs, but searching for them is still going to be painful, even if you use the native XML indexing available in the latest DBMS release.

Anyway - you get the idea... I thought it might be better to illustrate the problem than just to say "Don't do it." Hopefully this gives you an idea of why the red pill might be a bad idea.

The other part of the thread that I also had an interest in came about when it was mentioned that reporting should not be too much of an issue, as it was planned to do it all through OLAP cubes. Below is Jeremy's response to this proposal:

However, you still have the issue that a given logical record spans multiple rows in the KVP values table, and this makes filtering a pain.

From a cube perspective, many of your facts are going to be built as KVPs - you need an additional view that does a pivot summary of each of the KVP values relevant to a specific "event" into a fact table with the appropriate correlations to dimensional members. From a query processing perspective, this rapidly becomes a nightmare. Expect much longer cube builds going down this route than using relational tables where the fact table doesn't have to be computationally derived.

With respect to cube dimensions, you're going to run into issues with dimension definitions, as you're going to have to build a view defining each set of key names to correlate the members for each level of your dimension. This is doable, but is also fairly hard to maintain. Adding a new property KVP to a record means doing a full process of your dimension and cube to reflect it in your reporting environment.

Relational data can speed up dimension and cube builds by several orders of magnitude, as you don't need to build an abstraction layer that defines the dimension memberships and levels required by each individual KVP. You simply define your dimensions directly from the relational schema.

I understand the attraction of KVPs from a web developer's standpoint, and from the perspective of someone wanting to do a lot of SET/GET-based OOP, but once you start doing more complex method-based coding (e.g. complex business rule validation which might update multiple instance/class properties in a single invocation), the benefits of KVPs are quickly eroded.

This of course should not be taken to mean that you should never use KVPs. This post is merely meant to highlight the issues involved with KVPs so that you can weigh them up for yourself. KVP tables are often used for storing things like settings (which I have done myself) or for storing extension attributes (I understand Microsoft CRM employs KVP tables for this purpose).

Comments on this post

I pretty much use KVPs exclusively for settings on the database side of things. I use them in most of my classes, however, as a way of showing values in a dropdown list, with the key being some field, or combination of fields, which uniquely describes the object and its record ID as the value.

Using a single lookup table for all pairs as described is a management nightmare. My motto is keep to the basics and try not to violate basic normalized database concepts!

I think database design, like all design activities is a matter of balance. In the case of KVPs, it is a matter of balancing the ease of input with output requirements. In some cases, like settings tables, the balance definitely shifts in favour of KVP tables. In terms of general data storage, my opinion is that the balance shifts the other way.

The only reason that I post here is because this is one of the first sites that come up when I do searches for key-value pair views. I was looking for a good way to write a view or a select for a create table ... as select or an insert into ... select, or as a simple create view statement. After some study, I found that a simpler but possibly slower solution for a uid-key-value set table worked like the following in PostgreSQL:

SELECT uid,
       max(ud.user_name)  AS user_name,
       max(ud.user_age)   AS user_age,
       max(ud.user_email) AS user_email
FROM (SELECT uid,
             CASE WHEN name = 'user_name'  THEN value ELSE NULL END AS user_name,
             CASE WHEN name = 'user_age'   THEN value ELSE NULL END AS user_age,
             CASE WHEN name = 'user_email' THEN value ELSE NULL END AS user_email
      FROM user_kvp_table) AS ud
GROUP BY ud.uid;

With this solution, you only need to add two lines whenever you add an item to the view. I haven't worked out a way to make it find all the keys automatically, but this would allow for the data comparisons that you describe above in a simpler (although possibly slower) structure.
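The commenter's max/CASE pivot runs unchanged on SQLite too (the user data below is made up for illustration). One design note: the derived table isn't strictly needed, since the CASE expressions can sit directly inside the max() aggregates.

```python
import sqlite3

# Illustrative data only -- the table name follows the comment above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_kvp_table (uid INTEGER, name TEXT, value TEXT);
    INSERT INTO user_kvp_table VALUES
        (1, 'user_name',  'alice'),
        (1, 'user_age',   '34'),
        (1, 'user_email', 'alice@example.com'),
        (2, 'user_name',  'bob');
""")

# Pivot one column per key: max() discards the NULLs each CASE emits for
# the non-matching rows, collapsing each uid to a single logical record.
rows = conn.execute("""
    SELECT uid,
           max(CASE WHEN name = 'user_name'  THEN value END) AS user_name,
           max(CASE WHEN name = 'user_age'   THEN value END) AS user_age,
           max(CASE WHEN name = 'user_email' THEN value END) AS user_email
    FROM user_kvp_table
    GROUP BY uid
    ORDER BY uid
""").fetchall()
print(rows)
# -> [(1, 'alice', '34', 'alice@example.com'), (2, 'bob', None, None)]
```

Missing keys simply come through as NULLs (uid 2 above), which is also why this approach can't discover new keys automatically - each key must be spelled out in the query.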

It can be used as an attribute or as a measure... or a combination of the two :-)

Adding a key level allows you to act like a magnifying glass over a population, for example... by pairing key_level_category patterns... you don't need to prebuild the sub... subsub... subsubsub... or think it all out in advance :-)

I've heard a lot of hype about the joys of KVPs in regards to NoSQL. Without having done the research or being an expert on hardware (i.e. without having a clue what I'm talking about, I'd like to say something anyway, like most of the internet), I had a think about it in principle rather than practice and decided...

1) It's more straightforward to distribute one big table around servers than a whole bunch of them.

2) It could take advantage of the fact that all the data for, say, a Facebook photo album (I don't know what structure they actually use) will be stored together physically if it was created at the same time, rather than bits of it being in different tables with pages all round the place. That would mean that, in principle, I could see the album being faster to retrieve by KVP than by any entity-based approach. I don't know a great deal of math, so that could have been what you were getting at with "Directed Acyclic Graphs".

3) Basically everything you said regarding the problems of trying to get data out in anything other than the granular format it was entered in.

So nice to see that sanity check from someone who actually knows things.

Appreciate the thread of discussion. I use KVPs a lot while designing repositories or sections of applications that are intended to store metadata. I absolutely feel that KVP is the best way to go about it when it comes to capturing metadata; of course, there is always a set of views on top of these KVP objects to expose them in the form of relational structures.

