Database Administrators Stack Exchange is a question and answer site for database professionals who wish to improve their database skills and learn from others in the community. It's 100% free, no registration required.

A registered person on system can be only a Student or Employee; Person entity requires uniqueness of social number (which every person has only one of, of course).

If one day the college decides that teachers (an Employee subtype) can also be Student, taking courses in their free time (it's very improbable but I can't think of anything else right now. It is possible!), it's much harder to change database design which could have thousands of entries, rather than just changing the logic in code that didn't allow a person be registered both as a student and an employee.

Why do we care about business rules in database design rather than in code?

Mostly we care about business rules being enforced and what is the best way for that.
–
ypercubeApr 12 '13 at 18:43

2

You're actually presenting a very bad example of what constraints are used for, since the flexibility of your entities, and the expandability of the database, are mostly defined by normalization. Having said that, constraints are the final safeguard against any corrupt data ever getting into the database, even if the application is bugged, even if a new application is developed, even if an external API is added, even if someone edits the DB directly. Constraints guard the database, on top of that the business logic will also have to do its own things before trying to access the DB.
–
Niels KeurentjesApr 13 '13 at 0:24

2

Actually, as a graduate student I'm considered both a Student, Employee, and a Teacher. So your example isn't really improbable.
–
Winston EwertApr 13 '13 at 1:15

1

You should never base a database design on the objects in your application. You would noramlly design this as person, then have a related table to deinfe the persons roles. Then the problem doesn;t come up as you havea realted table for the roles sso people can have multiple roles. If you want to have only one role person, then you constrain the table so that the peopleID is unique. When you want to change that remove teh constraint.
–
HLGEMApr 17 '13 at 19:49

11 Answers
11

Some constraints are best enforced in the database, and some are best enforced in the application.

Constraints that are best enforced in the database are usually there because they are fundamental to the structure of the data model, such as a foreign key contraint to ensure that a product has a valid category_id.

Constraints that are enforced in an application might not be fundamental to the data model, such as all FooBar products must be blue - but later someone might decide that FooBars can also be yellow. This is application logic that doesn't really need to be in the database, though you could create a separate colours table and the database can require that the product reference a valid entry from that table. BUT the decision that the only record in colours has the value blue would still come from somewhere outside the database.

Consider what would happen if you had no constraints in the database, and required them to all be enforced in the application. What would happen if you had more than one application that needed to work with the data? What would your data look like if the different applications decide to enforce contraints differently?

Your example shows a situation where it might have been more beneficial to have the constraint in the application rather than in the database, but perhaps there was a fundamental problem with the initial data model being too restrictive and inflexible?

+1 sometimes the best answers are those that have more questions.
–
JYeltonApr 12 '13 at 17:46

So according to this answer, The rule <a person can only exist in Student's sub-type table or only in Employees sub-type table> should be applied in code, And Database has <The Student/Employee sub-type must be a valid person> constraint. Am I right? (It was book's example). thanks.
–
loolooyyyyApr 12 '13 at 19:39

1

@loolooyyyy: Yes, I think that's correct. If the database enforces the first rule (that a person can only be a student or an employee) then the situation you described (in which an employee wants to register for a class) is impossible because: the person cannot be both, and it's not even possible to create a second "person" record because they can't share Social Security Numbers which are presumably issued from a third party (such as the government). Of course, this overly restrictive data model might work for some cases...
–
FrustratedWithFormsDesignerApr 12 '13 at 20:03

1

@loolooyyyy: Another way to use the original data model and still let teachers be students might be to have another table called teachers_as_students which is another subtype of Students and has a new foreign key refering to Teachers, and a system-generated primary key, instead of a Social Security Number. This way, a "student" actually is an alias for a teacher so the teacher can still register to take a class. It's hard to say for sure how well this would work without seeing the whole data model.
–
FrustratedWithFormsDesignerApr 12 '13 at 20:09

The data will likely long outlive the application code. If the rule is critical to the data being useful over time (like foreign key constraints that help keep the integrity of the data), it must be in the database. Otherwise you risk losing the constraint in a new application that hits the database. Not only do multiple applications hit databases (Including some that might not realize there is an important data rule) but some of them such as data imports or reporting applications may not be able to use the data layer set up in the main data entry application. An frankly, the chances of there being a bug in the constraint are much higher in application code in my experience.

In my personal opinion (based on over 30 years of dealing with data and experience with hundreds of different databases used for many different purposes) anyone who doesn't put the contraints in the database where they belong will eventually have poor data. Sometimes bad data to the point of being unusable. This is especially true where you have financial/regulatory data that needs to meet certain criteria for auditing.

Most referential integrity constraints that are implemented outside of the database can be defeated, so if you want your data to have guaranteed integrity at all times then you have to apply constraints in the database. Full stop, that's it.

Typically application-level constraints are defeated though the database read consistency mechanism, by which sessions cannot view other sessions' data until it is committed.

Therefore two session can intend to insert the same value into a column that is intended to be unique, can both check at the same time that the value does not already exist, insert the values, and commit. A unique constraint implemented in the database would not let this happen.

This helper validates that the attribute’s value is unique right before the object gets saved. It does not create a uniqueness constraint in the database, so it may happen that two different database connections create two records with the same value for a column that you intend to be unique. To avoid that, you must create a unique index in your database.

Simplicity - Declaring a constraint is significantly simpler than declaring a constraint and writing the code that will enforce that declaration.

Accuracy - Code you didn't write will never have a bug that you created. Database vendors spend time making sure their constraint code is accurate, so you don't have to.

Speed - Your application can never have more distributions than the database it is based on. Database vendors spend time making sure their constraint code is efficient, so you don't have to. The database itself also has faster access to the data than an application could ever have no matter how efficient.

Re-use - You may start with one application on one platform, but it may not stay that way. What if you need to access the data from a different OS, different hardware, or from a voice interface? By having constraints in the database this code never has to be re-written for the new platform and never has to be debugged for accuracy or profiled for speed.

Completeness - Applications enforce constraints when data is entered into the database and would require additional effort to verify older data is accurate or to manipulate data already in the database.

Short answer... to preserve data integrity (i.e. accuracy and validity).

An exception...
If the database is just storing a single-application's data for a single-user, such as in most Sqlite databases, it may not need constraints. In fact, they usually don't, so as to keep the access time so quick it's unmeasurable.

Editors mostly put data into the database and retrieve data one or a small number of records at a time. Their primary concerns are fast, accurate access to all the related pieces of data and fast, reliable storage of their changes.

Users mostly retrieve data and are most concerned with fast access to unquestionably accurate information. They often need various counts, aggregations and listings that used to be generated in those iconic foot-thick stacks of greenbar-paper printouts but usually wind up on web pages today.

Database development projects are almost always started at the behest of Users, but the design gets driven by the data-entry and record-at-a-time needs of Editors. As such, inexperienced developers often respond to the immediate need for speed (primarily, of development) by not putting constraints in the database.

If one and only one application is ever going to be used to make changes to the data for the entire life of the database, and that application is developed by one or a small number of well coordinated individuals, then it might be reasonable to rely on the application to insure data integrity.

However, as much as we pretend we can predict the future, we can't.

The effort to produce any database is too valuable to ever throw it away. Like a house, the database will be extended, altered, and renovated many times. Even when it is completely replaced, all the data will be migrated to the new database while preserving all of the old business rules and relationships.

Constraints implement those rules and relationships in a concise, declarative form in the database engine itself where they are easily accessed. Without them, subsequent developers would have to pour through the application programs to reverse-engineer those rules. Good Luck!

This, by-the-way, is exactly what mainframe COBOL programmers have to do as those massive databases were often created before we had relational engines and constraints. Even if migrated to a modern system like IBM's DB2, constraints sometimes aren't fully implemented since the logic of the old rules, embodied perhaps in a series of COBOL "batch" programs, may be so convoluted as to not be practical to convert. Automated tools can instead be used to convert the old COBOL into a newer version with interfaces to the new relational engine and with a little tweaking, data integrity is preserved... until a new app is written that subtly corrupts everything and the company is hauled into court for, say, foreclosing on thousands of home-owners they shouldn't have.

SSN is not necessarily unique. Heck, SSN is not even always known, and in some cases it doesn't exist (yet). SSNs can be reused and not all employees or students may ever have an SSN. This is peripheral to the question but demonstrates that, no matter where you enforce your constraints, you need to understand the data model and the domain pretty thoroughly to make decisions about business rules.

Personally I prefer the constraints to be as close to the data as possible. The very simple reason is that not everyone will use the application code to change data in the database. If you enforce your business rules at the application level and I go run an UPDATE statement directly against the database, how does your application prevent an invalid change? Another problem with business rules in the app is that recompiling / redeploying can be difficult, especially for distributed apps where it is possible that not everyone will get the update at the same time. And finally, changing the business rules in the application does absolutely nothing about data that already exists that violates the new rules - if you add the new constraint to the data, you need to fix the data.

You may be able to justify multiple, redundant checks at various levels. This all depends on the flexibility of the deployment methodologies, how likely a change is, and how difficult it is to synchronize a business rule change in the database and other layers. A compelling argument for repeating checks at the app layer is that you can potentially prevent a round-trip to the database only to fail a constraint there (depending on the nature of the constraint and whether it relies on existing data). But if I had to choose one or the other I'd put it in the database for the reasons above.

In the case that you mention explicitly, where you are suddenly allowing something that wasn't previously allowed, this isn't really a problem - you remove whatever constraint enforced it, regardless of where that exists. In the opposite case, where suddenly teachers are no longer allowed to be students, you potentially have a bunch of data to clean up, again regardless of where the constraint existed previously.

If/when you have a database where any given table can be updated by one or more applications or code paths then placing the appropriate constraints in the database means that your applications won't be duplicating the "same" constraint code. This benefits you by simplifying maintenance (reducing the number of places to change if/when there is a data model change) and ensures that the constraints are consistently applied regardless of the application updating the data.