Think Big

Preamble

I recently spoke with Sebastian Machuca (a Sydney-based engineer) about the concept of having FK constraints in Dev and Staging environments, but disabled in Production. Here is the ensuing dialogue, edited for brevity and for making-me-look-like-a-genius.

Our dialogue was good, but we were unable to come up with a concrete list of problems that the database will incur when FK constraints are removed from production databases. I found some strongly-worded opinions on the interwebs about FKs, but nothing that had precise technical backing.

For example, there is a Stack Overflow Q&A that both Sebastian and I found that contains this assertion:

knowing about the relationships between tables in the database helps the optimizer plan your queries for most efficient execution, since it is able to get better estimates on join cardinality.

Everything I know about statistics-based optimizers tells me this statement is false. Does anyone have a reference that talks about this in more (technical) detail?

The Foreign Keys Dialogue

S: Why do you say that is it wise to drop foreign keys in a production environment? Do not most advise to have them? I would personally consider an anti-pattern to not have enabled the foreign keys.

T: Well, this is a subtle point. So let's say you create some FK relationships. And you write some code that violates them. What happens?

S: Breaks all the things. 500 error.

T: So then you fix some bugs... less 500 errors... you fix more... And finally it doesn't break. At that point, the code is correct, yes?

S: Yes.

T: And so if I leave the FK constraints in place, they really do nothing [ed: at this point, we've not recalled the cascading delete feature of FKs] But FK constraints use a lot of DB resource, for enforcing business rules that have already been enforced by the (now-fixed) code!

Concurrency

S: But concurrency is a key factor in here. When you are in development or staging, you have very little concurrency. Also, Referential Integrity should be handled at the lowest possible level, which would be the underlying database. Relational Database Management Systems are optimized to handle this.

T: Well. Referential integrity can be relegated to the database, and it has the tools (Foreign Keys) to do it, but it's not necessary. Recall that as your app gets errors that you fix, you are encoding the RI into the application. So once your application doesn't have errors anymore (assuming you got all possible errors) then your RI is perfectly encoded in the applicaiton. So the THEORY is, once the app is debugged, then the RI is fully in the app, and FK is completely redundant.

S: But even if your code is “perfect”, that doesn’t mean it's perfect for high concurrency.

Orphaned Rows

T: I think you're saying that if the app tries to keep two tables in sync without using a transaction, that a FK would save the application from error? But let's look at the example. We have a parent table with people==(id, name, address_id) and child table addresses==(id, address). We want to delete this person. FK will cascade the delete and ensure the addresses table row also gets deleted. So the app must, absent FK, do delete from people... delete from addresses...

Maybe this is the case we're concerned about? That the app forgets to do the cascading delete?

S: Correct. That is just one example.

T: When I search Google trying to find why people think FK are necessary, this seems to be the main case they're worried about. "Orphaned Rows." [ed: we can encode deletes of the child table rows into the DB access layer]

T: Okay, so then what's the other case? When it comes time to insert... the application will insert into addresses... insert into people right? And whatever ID it gets when inserting to addresses it'll use as the ID for people.address_id? But how would a high-concurrency situation change this? No matter what, when you insert into addresses, you'll have an ID assigned... No other thread can get it, whether an FK is there or not?

S: Updates as well can be complicated.

T: When you UPDATE, you will never need to change the ID. Maybe let's imagine for a second that you do, though. So you update the addresses ID from 1 to 5... And another thread at the same time thinks it can change 2 to 5... The primary key (or unique index) saves from two threads changing id=5.

History

S: Okay. I give up... FOR NOW. Tell me why FKs exist at all, then.

T: So relational databases were made in... the 1960's? 1970's? Something around then. At the time, everything... and I do mean everything would happen on one computer. Codd and all those brilliant guys thought of how to have the database make sure shit was unbroken at the closest layer to the data as possible, in the database.

This made sense at the time. It even a little bit makes sense today, but here's what's changed: we started building systems that run on many computers. When we did that, we found out some components scale horizontally really well, like stateless application code. And some components are really hard to scale horizontally, like stateful database code.

So typically the former runs on a whole lot of cheap machines. The latter runs on fewer very expensive machines.

So a bunch of smart people sometime in the past 10 years or so figured this out: When you can move the work that used to go into the database into the application layer... The whole system becomes more efficient/cheap.

We used to have these "stored procedures" and people would write entire applications worth of code in them. No-one does that anymore, because in so doing, you're pushing all your compute to expensive hardware that doesn't scale well.

Oh, and stored procedure languages are ugly... but I digress.

Concerns

S: Nice. I get what you are saying. Just, relying on the app to handle the edge cases makes me nervous. I think if our development environment would have a comparable stress test as production then I wouldn’t worry so much.

T: I feel like there is potentially a legitimate concern in here, but we can't seem to put a precise condition on it (outside of the cascading DELETEs).

So yeah, a very real problem is if no-one ever tests in dev/staging. Then no FK constraints are ever violated. Then we go to production where I've removed the FK constraints, and data gets all wonky.

But really, the edge cases around FK violations are pretty boring/trivial. It's all about "I deleted the row from the parent table and left the row in the child table, oops!" and... that's about it, outside of UPDATEs to the child table without subsequent UPDATE of the parent table (and I'm still struggling to think why you'd do that anyway).

There can be "I created a parent table entry and left the ID for child NULL and never created a child row"... But we can define the parent table ID column as NOT NULL to solve that problem.

There are a lot of DB professionals (esp in the MySQL community) who strongly claim FK constraints are NEVER necessary. In any environment, anywhere. They say code FK constraint logic into DB layer in your app. I'm not quite that extreme.

It's an interesting discussion. I haven't put a LOT of thought into it... just enough that I feel about 95% confident we can drop FK constraints in production, and getting that performance win would be great.

S: But Foreign Keys enforce the lowest level business rules, for example, "if an order exists, it must be related to a customer that exists." If you remove the constraint, you no longer have enforcement of that rule, which would allow any client of the database to (for example) remove a customer who owns orders, while leaving the orders in place. You would have to write more (error-prone) code in your application (and test that code) to prevent this.

In some places in the code, we don't use an ORM, and have just raw SQL in the code base. I think a good experiment would be to run some queries in search of orphans in our DBs.

I see as a benefit the use of constraints, because they are an additional layer of correctness/sanity checking on your application. They will expose bugs that would otherwise stay hidden, in development and in production. And I prefer to detect a bug than to "have an order with no customer".

Timeless Foreign Key Checklist

T: I guess what it comes down to is the following:

How much testing is done on application code before it's rolled out? If little to none? FKs will do the work for you in production.

How disastrous is it if there is broken Referential Integrity due to no FKs? If significant? FKs will be a good safety net.

How bad is the performance hit of FKs in production? If significant? FKs should be avoided.

In my experience, at most orgs (and my gut instinct is that this holds at Bigcommerce), #1 is little, #2 is insignificant, and #3 is significant.

As to your particular example, I do find it hard to imagine the situation where an order might get created for a customer that doesn't exist. Transactions can solve that problem.

Also, "broken" RI can be a benefit. For example, if we truly DELETE a customer from the database, we might still want a record of the fact that orders were placed by (a now-unknown) customer. If we enforce cascading DELETEs to save us from orphaned rows, we cannot have this arguably quite useful state in the database.

If the answer is to just flag the customer as deleted without actually DELETEing the row, then we've left the realm of where FKs help again.