I would like to hear some”real life experience” suggestions if foreign key restrictions are good or bad thing to enforce in DB.
I would kindly ask students/beginners to refrain from jumping and answering quickly and without thinking.
At the beginning of my career I thought that stupidest thing you can do is disregard the referential integrity. Today, after "few" projects I'm thinking different. Quite different.

What do you think: Should we enforce foreign key restrictions or not?
*Please explain your answer.

@Alex, thanks for feedback. Yes I am. In your profile it says you are 25, meet me in 7 years for a bear and a friendly chat, I'll ask you if you still think the same way as you do now :). Cheers
–
This is itFeb 8 '11 at 13:37

1

You say you are now thinking differently about RI - can you explain why?
–
Eric PetroeljeFeb 8 '11 at 13:38

9 Answers
9

I would add as an initial disclaimer that when you say DB and this question has tags indicating several platforms, that we are talking about a traditional relation database - i.e. a well-defined system which manages data in tables, columns and rows, according to Codd et al. This is a well-defined paradigm with well-understood boundaries.

If you are talking about a bunch of spreadsheets or files in a folder (many people do call this a database), or any similar non-relational database (perhaps under the umbrella term NoSQL), none of this has to apply.

I would always begin a project that uses an RDBMS as an RDBMS by using foreign key constraints. You can always relax constraints later. I would also normalize first, and denormalize as necessary.

It's far easier to relax a constraint later or refactor tables if you are starting from something where there are guarantees.

Similarly, I'm going to try as best as I can to identify the proper data types, ranges and nullability of columns at the beginning. I'm going to default columns to NOT NULL until there is evidence that the model requires NULL. I'm going to have a primary key on all tables, and where there isn't an obvious choice (either natural or surrogate, depending on the design philosophy), I'm going to add a surrogate, so that it will be at least possible to uniquely identify a row among otherwise identical rows if I have to do a DELETE operation.

These things can all be relaxed later, but adding constraints later is going to require you to figure out what was wrong in the data before you can apply a constraint.

These are just a few of the basic rules of thumb which keep your database on track as you start development. Refactoring from one structured design to another structured design (of whatever quality or philosophy) is a lot easier than refactoring from unstructured.

In my experience, an evolving database design will outlive many front-end applications over its lifetime (sometimes several simultaneously), so it makes sense for it to provide a boundary of services which are always consistent and uniform. For this reason, it is unwise to expose tables directly or to treat the database as simple persistence for any things which gets thrown at it. If you expect the database to just hold anything with any rules which may vary over time, the value of the database itself is lowered, and it will have issues with data quality, uncertainty about semantics and all associated issues surrounding that.

Foreign keys should be enforced where it makes sense to enforce them. If a column must refer to something else then it should have a foreign key restriction.

If the column might refer to something, but the business model suggests that it is not always the case, then it is acceptable to have a null in there, and have the foreign key present to ensure that, should it refer to something it definitely does refer to a real record, and not just some arbitrary value which could break logic/application-code.

Where this breaks down is when the software wants constraints in there for processing and error-prevention purposes, but the user* doesn't necessarily feel that he or she needs to provide that. I have seen cases in the past whereby it is perfectly logical to have a foreign-key constraint (A prospective client must have an employer (even if it himself) for a credit application). But the brokers want to save records that are just "quotes" without necessarily providing one at that point in time.

It's problems such as these that may give rise to people pointing the finger at enforcing foreign keys, but in my experience it's generally a design-issue. Because at this point we've ascertained that they need an employer to proceed with a credit application, but we don't need one to create a quote. There, we've solved it. We make sure the credit application can't proceed unless there as an association with an employer.

A saved "in process quotes" is very different then a formal quote that has been given to the customer, it need two different storage system, or you get this type of problem...
–
IanFeb 8 '11 at 14:06

@Ian, yup but I used this to highlight that problem.
–
Moo-JuiceFeb 8 '11 at 14:13

I have seen formal quotes that go to the customer with placeholders legitimately in them. These were values that would not alter the final price but were needed to actually order. For example, all tier 1 colors were the same price.
–
Loren PechtelMay 6 '14 at 21:04

By enforcing a foreign key relationship with ON DELETE|UPDATE conditions, your RDBMS does a bunch of work for you, for free, that you then don't have to do in your code. Furthermore, this allows you to separate model logic from application code. Let the RDBMS do its job by defining good relationships in your schema.

There are a couple of projects where the architects may consciously decide that they'll drop all FKs from the schema. They'll usually do this for various reasons (write performance, sharding etc). They'll then need to make sure that the relations integrity is 100% correctly maintained by the application.

I would question comparing a relational database to Amazon's SimpleDB or other NoSQL solutions. These are completely different types of systems for solving different types of problems. It is unfortunate that the original poster has not put the question in a meaningful context.
–
Cade RouxFeb 8 '11 at 15:34

I'm not comparing SQL to NoSQL. The podcast is 4 years old and it seems that amazon was still on RDBMS then, without FKs.
–
cherouvimFeb 8 '11 at 15:53

You do not need to drop all FK's, use foreign_key_checks=0 in your my.cnf
–
shantanuoFeb 11 '11 at 6:59

As always, "it depends". Typically they're good for RI and for performance (since most DBs keep an index on the FK that is used for joins). In a data warehouse, they're often not enforced because the data is sliced and diced so many different ways (e.g., historical sales data may refer to products that are no longer sold). In high-performance applications, the overhead of enforcing RI at run time may be unacceptable. In general, though, it's better to have and enforce them.

Typically in a data warehouse, products which are no longer sold will still be stored in an expired dimension, and enforced RI is still realistically possible for databases of multi-TB size with appropriate dimensional designs.
–
Cade RouxFeb 8 '11 at 15:31

If you try to insert a record into a table with an invalid foreign key it will not insert. Without a foreign key you would be able to insert this record that has a key that doesn't not relate to a record in the foreign key table.

Generally No. The only time I'd use them is if disparate code bases are hitting the database, and you want to ensure consistency across applications. Otherwise I prefer to keep the storage logic in the app's classes and control everything through code.

If you are working with not-relational DB, temporary tables, staging areas or bulk import tables I suggest not to use foreign keys. Checking constraints will slow down your processes and it's not necessary if you plan not to preserve data.

BUT ... If you are working with a standard relational DB ... well ... working without foreign key is FOOLISH!!!

I had been involved in lots of project where I was expected to migrate an existing database to a new architecture. Everytime I dealt with DBs with no foreign key, I found unreliable data, corrupted relationship, orphans ...

Expecting to control everything through classes and code, is UTOPIC. Maybe it can work in the first software releases or with small projects. On big projects or whenever another programmer will need to rework your code, he will have to be a genius to preserve all the required FK in each page via code!

Moreover having a detailed model with all FK active is important because:

The system will warn you if a bug in your code is going to corrupt data forever

@CadeRoux and @Michael have great guidelines; here's some feedback after about eight years of development using Oracle, SQL Server, MySQL and PostgreSQL: Use foreign keys!

Pros:

Faster development since you won't have to include special clauses in SQL statements to exclude garbage data. Need to summarize line items? Sorry, you'll have to join with the orders table because some line items reference deleted orders. You really don't want to work on optimizing 100+-line queries.