Way back when, when I was first moving on the track to data modeler and architect, I was responsible for designing schemas and applications. I worked at a small company doing various developer, DBA and data modeling tasks.

As I turned over an OLTP data model, I was taken to task by my boss for implementing foreign key constraints in the DDL. His reasoning was that foreign keys made removing data difficult, there by causing extra downtime during maintenance. I was told that we did not use foreign keys and that I would have to remove them. I quit. It was a combination of the reasoning as well as his approach (getting snitty with me in front of the team).

Anyway, that should give you some idea of how I view foreign keys. Sometimes foreign key constraints need to be deferred. Sometimes, they need to be deferred until very late in a process (like when doing massive data loads). I'm ok with that.

In an OLTP database, at least in Oracle, there is no reason not to have foreign keys enforced. Postgres and MySQL both benefit from denormalization here but Oracle excels in a normalized environment.

Here is where I would like some opinions. These questions are regarding Kimball-esque data warehouses as opposed to OLTP.

In a staging area, do you think foreign keys are required? In my opinion, if a staging area really is a staging area (which means temporary storage while ETL is happening), foreign keys can be dropped if the source system has enforced referential integrity throughout and if populating the staging area is done while the source system (singular) is in a consistent state (quiesced). If the source does not enforce RI, staging needs foreign keys. If multiple sources are used, staging needs RI. If source systems are processing ongoing transactions, RI is needed to ensure consistency. If staging will also be used as history/reporting, RI is needed. What do you think?

An ODS, if used, should be 3rd normal form with 100% RI. Agree?

Now for my real question. My current client does not want foreign keys in the data mart. I think the data mart MUST have foreign keys. I also feel that they should be enforced with deferability being a valid option. I feel very strongly about that. My reasons:

Even if the source system enforced RI, the data mart is not a 1-1 mapping. RI needs to enforce the relationships, not the code.

User level tools (BO, Hyperion, Cognos, etc) benefit from FKs being defined in the mart, i.e. metadata. In this case, they do not need to be enforced.

Give the CBO as much information as possible.

Foreign keys can be deferred, or dropped, if needed during a load.

So, what is your opinion on foreign keys in a data mart? Required, optional or not needed?

I've never implemented a data mart that did not have enforced foreign key constraints. Have you? How did that work out?

My blog represents my thoughts and opinions and not my employers thoughts and opinions. If you want to know what my employers think, it is best to ask them directly. On the other hand, I pretty much say what I am thinking right here.

7 Comments

My personal take on Foreign Keys in a DWH: There is no need to have them enforced, especially not on production. Data Marts are loaded using ETL load Routines, and one of the main reasons to have ETL tools and programs is to ensure referential integrity, record data quality measures along the ETL path etc. So basically enforcing foreign key constraints - whether deferred or not - is forcing the DB to do the same thing, that ETL already does. If the ETL doesn't work, and the problem hasn't come up in testing, I doubt that a constraint will really help with the problem.

I have seen a number of installations, where ETL was done very poorly, basically just dumping data from one place to another, mostly relying on hopes and assumptions. The thing is: Foreign Key Constraints will usually only help with a small subclass of problems. Most will still go undetected due to denormalized tables, while still being expensive specifically for fact data loads.

Iirc Oracle can use a Foreign Key which is not enforced as an 'optimizer hint'. For documentation purposes it also makes sense to have foreign keys. In all testing environments it even makes sense to enforce them. But I have only rarely seen them enforced in a production DWH and if so, then mostly for some dimensional data which was edited and maintained manually by business, or due to some similar specific business requirement.

Interesting viewpoint. Thanks for sharing it. That's the kind of info I was looking for.

I agree with at least putting them in and not enforcing. That would be at a minimum, for me.

I do disagree with some of your points, though. I believe the db should be the final say of RI. I actually use RI when doing ETL to make the ETL less complicated. Why put it in code when it can go into the database?

I have seen and worked with a huge number of data marts and as far as I can remember, ALL of them had enforced RI. My memory may be faulty though.

Lewis--
In my opinion:
1) every datamart should have FK, at least in RELY DISABLE NOVALIDATE state for documentation and optimization purposes.
2) As for validation... well, I've been experiencing a half-revolution in my thinking on this (a full revolution would be 360 degree and you'd just end up where you started).

In my prior experiences, I designed the marts and the ETL. So I knew it was done right (IMHO :), and I relied on ETL validations in this 3-step fashion:
a) load fact table
b) load dimensions with "vetted" data (reviewed and approved by business data stewards), typically copied from central, conformed dimensions.
c) validate referential integrity by creating "local" dummy records in dimensions where necessary. (These tend to annoy users who I can just re-direct to the data stewards).

However, my more recent experience is with ETL written by others, and there's just a gap in communcation sometimes (even when I write the specs!). In these cases, I'm much more inclined to be "hard line":
a) the structures are in a schema controlled by the architect with referentially integrity full enforced.
b) the ETL code is in a separate schema with no authority to change structures.

In this instance, the ETL is being handed off to others who will write the code (hand written, they decided not to use an ETL tool). There is no real documentation as of yet and there is a short time line for delivery.

I'm curious how you deal with gaps in referential integrity: our ETL team coined the phrase "shoring up the dimensions" when referring to the dummy records I require to be built. (BTW, since the dummy records are ETL-created, I didn't have any issues with ETL-enforced integrity -- until I found not everyone was "rabid about referential integrity").

When you say gaps, do you mean instances where you might have late arriving dimensional data or dirty data that needs to be cleaned prior to loading? In those cases, I usually use "data not arrived yet" and "dirty record" FKs. I set up a surrogate key like -1, -2 whatever. When the data finally comes in and is cleaned, I update the fact to the correct key value.

I have also used an all encompassing -1 depending on requirements. That one is a "might be null, missing or data".

Disclaimer: Blog contents express the viewpoints of their independent authors and
are not reviewed for correctness or accuracy by
Toolbox for IT. Any opinions, comments, solutions or other commentary
expressed by blog authors are not endorsed or recommended by
Toolbox for IT
or any vendor. If you feel a blog entry is inappropriate,
click here to notify
Toolbox for IT.

Lewis Cunningham is an Oracle ACE Director, Oracle 11g Certified Professional (OCP), Oracle 11g Certified Data Warehouse Implementation Specialist, Database Architect and self-professed database geek. Lewis has almost 20 years of database experience. Follow along as he builds a working Oracle encyclopedia, sharing his knowledge and experiences, and describing ways to integrate that technology in various projects and business areas.
less

Receive the latest blog posts:

Share Your Perspective

Share your professional knowledge and experience with peers. Start a blog on Toolbox for IT today!