Blog Archives

Constraints are sometimes annoying in real life, but no society can exist without rules and regulations. The same concept is found in Database Design: no good data can exist without constraints.

What they say in Heaven

Constraints define what is acceptable in the database and what does not comply with business rules. In Heaven, where the perfect database runs smoothly, no constraint is overlooked and all the data obeys to the rules of angels:

Every column accepts only the data it was meant for, using the appropriate data type

Every column that requires a value has a NOT NULL constraint

Every column that references a key in a different table has a FOREIGN KEY constraint

Every column that must comply with a business rule has a CHECK constraint

Every column that must be populated with a predefined value has a DEFAULT constraint

Every table has a PRIMARY KEY constraint

Every group of columns that does not accept duplicate values has a UNIQUE constraint

Chaos belongs to hell

OK: Heaven is Heaven, but what about hell? Let’s see what will get you instant damnation:

No PRIMARY KEY constraints: In the relational model, tables have primary keys. Without a primary key, a table is not even a table (exception made for staging tables and other temporary objects). Do you want duplicate data and unusable data? Go on and drop your primary key.

NULL and NOT NULL used interchangeably: NOT NULL is a constraint on your data: failing to mark required columns with NOT NULL will inevitably mean that you’ll end up having missing information in your rows. At the same time, marking all columns as NOT NULL will bring garbage data in the database, because users will start using dummy data to circumvent the stupid constraint. We already met these sinners in the First Circle of the SQL Server hell.

No Foreign Key constraints: Foreign Keys can be annoying, because they force you to modify the database in the correct order, but following the rules pays off. Without proper constraints, what would happen if you tried to delete from a lookup table a key referenced in other tables? Unfortunately, it would work, silently destroying the correctness of your data.
What would happen if you tried to sneak in a row that references a non-existing key? Again, it would bring in invalid data.

No CHECK constraints: Many columns have explicit or implicit constraints: failing to add them to the database schema means that values forbidden by the business rules will start to flow into the database. Some constraints are implicit, but equally important as the explicit ones. For instance:

an order should never be placed in a future date

a stock quantity should never be negative

a ZIP code should only contain numeric characters

a Social Security Number should be exactly 9 digits long

Relying on the application to validate data: If I had €0.01 for every time I found invalid data in a database and the developers said “the application will guarantee consistency”, I would be blogging from my castle in Mauritius. Maybe the application can guarantee consistency for the data that it manipulates (and it won’t, trust me), but it can do nothing for other applications using the same database. Often the database is a hub for many applications, each with its own degree of complexity and each with its level of quality. Pretending that all these applications will independently guarantee that no invalid data is brought in is totally unrealistic.

The last circle of SQL Server hell dedicated to Database Design sins is the circle of Inconsistent Baptists, those who fail to comply to sensible naming conventions. Stay tuned!

Scalar UDFs are dog-slow because the function gets invoked RBAR (Row-By-Agonizing-Row, for those that don’t know this “Modenism”). If the UDF performs data access, the statement in the scalar function gets invoked for each row, hitting performance badly.

Triggers are evil, according to Glenn Berry (blog|twitter), because they are a “bad” way to implement referential integrity. Moreover, from a performance standpoint, even if triggers work with sets instead of rows (unlike UDFs), they fire an additional query (or even more than one).

However, I seem to have found a way to merge the “business logic” query plan into the same execution plan of the external DML statement that modifies the data.

The method I will discuss here makes use of Indexed Views.

First of all, we will need some tables.

And now some business rules:

Users with a premium account can place orders with an unlimited total amount. Users with a normal account can place orders limited to a $1000 total amount.

Now that sample data is ready, let’s enforce the business rule #1: orders from users with a normal account must be limited to $1000.
To achieve this, we have to create an additional “dummy” table that holds exactly two rows. This table exists with the only purpose to implement a cartesian product and violate a UNIQUE constraint in the indexed view.

If we look at the execution plan of the INSERT statements, we can see that the indexed view maintenance is merged into the INSERT query plan:

It may be interesting to note that SQL Server is smart enough to identify the statements that require updating the indexed view. For instance, if we try to update a column that is not used in the indexed view, we won’t see any index maintenance in the query plan.
For instance, we could update order_id and examine the query plan:

-- This statement does not update the indexed view, so it is not included in the plan
UPDATE Orders SET order_id = 3 WHERE order_id = 2
GO

As you can see, there is no need to maintain the index on the view. To achieve the same with a trigger, you would have to explicitly define the behaviour of the code using IF UPDATE(ColumnName).
Moreover, the UNIQUE constraint gets evaluated whenever ANY table used in the indexed view gets modified: this would be very hard to achieve with a trigger.

Now that the first business rule is set, we can proceed with the second one: no ‘ADULT’ products can be ordered by minors.
This can get a tricky requirement, as we might be tempted to calculate the age of the user comparing it to GETDATE(). Unfortunately, non-deterministic functions cannot be used in indexed views. We will have to get around it by using the order_date column, that was set to GETDATE() previously.

Conclusion

Indexed Views provide an elegant way to enforce business rules that go beyond the scope of a single row in a table, without the kludge of CHECK constraints with scalar UDFs or the pain of DML triggers.
However, some limitations apply:

Not all queries can be expressed in a way that can be used in an Indexed View. You can’t use non-deterministic functions, common table expressions, subqueries or self joins.

Indexed Views cannot perform cross-database queries. If the business rule must be verified against a table stored in a different database, this method cannot be used.

Picking the right tool among CHECK constraints and triggers can be a hard decision. But now, hopefully, you have another option. 😉

P.S. : Mladen Prajdić (blog|twitter) kindly gave his blessing to the publishing of this post.