Data Design Matters

As important as it is, data modeling has always had a geeky, faintly impractical tinge to some. I’ve seen application development projects proceed with a suboptimal, “good enough”, model. The resulting systems might otherwise be well-architected, but sometimes strange vulnerabilities emerge that track directly to data design flaws.

Recently I saw an example where a “good enough” data design, similar to the one pictured, enabled a significant application bug.

My fictionalized model illustrates this case’s anti-pattern. Say there’s an order management system tracking Orders, Order Lines, and “Order Line Events”, customer transactions involving an order item. The Order Line table includes a foreign key to the Order table. The Order Line Event table includes foreign keys to both the Order Line and the Order.

The latter relationship from Order Line Event to Order is logically unnecessary. Each Order line is related to exactly one Order, so if an Order Line Event relates to an Order Line it must also relate to a specific order.

Beyond being unnecessary, in this case the extra relationship turned out to be harmful. Somehow the online system had a bug that updated an Event’s relationship to a different Order than the related Order Lines. So a single Order Line Event could be related to two separate Orders, one through the Order Line and the other directly through the Changes/Changed By relationship.

In the real example there were some specific impacts that I won’t go into, but you can imagine the possibilities. Here are just three:

Processes that purge old records may be unable to delete orders with mismatched foreign keys. Say you want to delete all data for Order 1. If an Order Line Event relates to both Order 1 and Order 2, the foreign key to Order 2 prevents delete of the Event unless Order 2 is also being deleted.

So, a database design that seems imperfect but “good enough” in fact isn’t. In this case it would have been well worth taking extra time during design to prevent the chance of subtle but significant errors in the application.

In the example I drew the article from (retaining the table names used in the post), the Event table has a surrogate primary key of its own and two non-nullable foreign keys, one to Order Line and the other to Order. It is true that the Order or Order Line can be updated, but in this case there's no reasonable business interpretation if the Event relates to two different orders (although I could see possible cases where reasonable interpretations could conceivably exist).

It also might be reasonable for there to be two different "master tables" referenced by the two foreign keys, but in this case both ultimately point to the same Order table.

Sadly, the non-null foreign keys (which I didn't mention in the post) prevent the very reasonable possibilities you raise in your second paragraph.

Even if you do not have the extra relationship, nothing really prevents the online system to update the order number unless it is part of the primary key/index (in which case, updating the order number will not be permitted even with the sub-optimal design.)

In this case, it looks like they have two order number columns in the event table that join to different master tables? Are the primary keys (autogenerated) surrogate keys? Looks like they wanted to use the events table to multiple purposes - like register line level as well as order level events. In such case, they can have an event id as a primary key (and the event can either be at whole order level or a order-line level; optional relationships.) For the order level events, the line numbers will be null (meaning it applies to the entire order.)