I'm not sure I see how using NULLs violates data integrity. If your point is that 0 is an appropriate value when there is no other valid data choice, then yes, the field should be set to NOT NULL with a default value of 0. But that is not always the case. I don't need to reiterate the several discussions that have preceded this, other than to say that the use of NULLs does not automatically violate data integrity.

Coincidentally, the same day this was posted, I started an internal article for my company on referential integrity (RI).

Another cause of missing RI is poor database design, including the shared use of one column to represent many relationships. I have witnessed numerous instances of a column, call it "someEntityID", which implicitly references entity IDs in several other tables. The problem is that there is no actual referential integrity: the value in the someEntityID column could represent any one of a dozen primary keys in other tables, and there are no foreign keys to establish the relationship. It is all implicit, and merely coincidental if a value in Table X actually matches a value in another table.

Besides the RI issue, it is simply poor database design. One should be able to look at a column and know what it represents, not have to guess, perform kludgey LEFT OUTER JOINs, or create ad hoc mapping tables. Yes, you could add a second column to provide context and identify what that particular row's value means, but why? It is non-intuitive and, more importantly, it breaks RI. It also does not support Codd's 12 rules for a relational database, specifically Rule 0 (the foundation rule) and Rule 10 (integrity independence).
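To illustrate (all table and column names here are hypothetical), the anti-pattern looks something like this in T-SQL; note that the first design cannot carry a FOREIGN KEY at all:

```sql
-- Anti-pattern (hypothetical names): one column implicitly points at
-- several different parent tables, so no FOREIGN KEY can be declared.
CREATE TABLE AttachmentImplicit (
    AttachmentID   int NOT NULL PRIMARY KEY,
    SomeEntityID   int NOT NULL,          -- an OrderID? a CustomerID? who knows
    SomeEntityType varchar(20) NOT NULL   -- discriminator "providing context"
);

-- Explicit alternative: one clearly named, individually constrained column
-- per relationship, each backed by a real foreign key.
CREATE TABLE AttachmentExplicit (
    AttachmentID int NOT NULL PRIMARY KEY,
    OrderID      int NULL REFERENCES Orders (OrderID),
    CustomerID   int NULL REFERENCES Customers (CustomerID),
    -- optionally require exactly one parent to be populated
    CONSTRAINT CK_Attachment_OneParent CHECK (
        CASE WHEN OrderID    IS NULL THEN 0 ELSE 1 END
      + CASE WHEN CustomerID IS NULL THEN 0 ELSE 1 END = 1
    )
);
```

In the second design, each row states exactly which parent it references, and the DBMS, not the reader's guesswork, enforces that the parent exists.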

Even a simple definition found via Google clarifies what RI really is. All RI is really doing is making sure the relationships between tables hold, to maintain data consistency: "Each non-null value of a foreign key must match the value of some primary key." - Information Modeling and Relational Databases, Halpin
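Halpin's rule can be declared directly in the schema. A minimal sketch, using a hypothetical self-referencing Employee table: ManagerID is allowed to be NULL, but any non-null value must match an existing EmployeeID.

```sql
CREATE TABLE Employee (
    EmployeeID int NOT NULL PRIMARY KEY,
    ManagerID  int NULL,   -- NULL is fine: the CEO has no manager
    CONSTRAINT FK_Employee_Manager
        FOREIGN KEY (ManagerID) REFERENCES Employee (EmployeeID)
);
```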

So being NULL is not itself the issue, but I see his point that some columns that allow NULL may be poorly designed and should be NOT NULL. Wouldn't this only be an RI issue if the application layer were maintaining RI instead of the database? If one used the app layer to do this, I could see how data could become inconsistent, with the database subject to mistakes made by application code. This usually results in duplicate records or orphaned records.
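When RI has been left to the application, queries like the following (against hypothetical Orders/OrderDetail tables) are a quick way to find that damage:

```sql
-- Orphaned records: child rows whose parent no longer exists.
SELECT d.*
FROM   OrderDetail AS d
LEFT OUTER JOIN Orders AS o ON o.OrderID = d.OrderID
WHERE  o.OrderID IS NULL;

-- Duplicate records: more than one row per business key.
SELECT   OrderID, LineNumber, COUNT(*) AS Copies
FROM     OrderDetail
GROUP BY OrderID, LineNumber
HAVING   COUNT(*) > 1;
```

With declared foreign keys and unique constraints, neither query can ever return rows; without them, running checks like these becomes a recurring cleanup chore.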

An excerpt from an article I am writing for developers in my company: "There is no single best answer to how to implement RI, since scenarios dictate the requirements, but there are two primary schools of thought for most databases. The first, and the best practice, is to maintain RI within the database. After all, SQL Server is a relational database management system (RDBMS); why not let it do what it was designed to do? The second school of thought is to let application code enforce it. This may have its benefits in limited implementations, such as cross-system integrity, but it is fraught with peril and relies on very attentive programmers with acute domain knowledge of both the software and the database. It is important to note that a software implementation of referential integrity does not follow Edgar F. Codd's 12 rules for a relational database listed above. A third option is to mix the two approaches."

It's a good writeup. And there's always a big debate about NULL being valid or acceptable in RI.

I personally think it's OK, but I see the other side, and there's a valid argument there. If you start to use NULLs, you are usually defining them to mean something, like "N/A", but over time you might find people assuming they mean something else, like the earliest date or "I don't know". In those cases you can start to get bugs or misconceptions in the application. Not that you will, but you might. There is a good argument to be made for not allowing NULL.
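One concrete way those misconceptions bite is SQL's three-valued logic. A sketch against a hypothetical Customer table:

```sql
SELECT COUNT(*) FROM Customer WHERE Region = NULL;    -- always 0: NULL never equals anything
SELECT COUNT(*) FROM Customer WHERE Region <> 'West'; -- silently excludes the NULL rows too
SELECT COUNT(*) FROM Customer WHERE Region IS NULL;   -- the correct test for NULL
```

If one developer reads NULL as "N/A" and another as "unknown", the second query above is where their report totals quietly start to disagree.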

Data integrity and constraints begin with choosing the appropriate data type. Obviously (to most of us) integers should be stored in an integer data type and monetary values in a decimal with two decimal places. However, what I see happen all the time, even in major ISV applications, is date/time values stored in a varchar instead of a proper date data type. Not only does the reporting process take a performance hit from the data conversion, but inevitably there will be the occasional Feb 30. Even worse is when different users or applications use different coding schemes and you have to sort out (in a WHERE clause at runtime) the context of '3/11/09 6:22'. Is that March 11, 2009? Nov 3, 2009? Nov 9, 2003? Sep 11, 2003?
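In T-SQL the ambiguity is easy to demonstrate: the same varchar literal converts to different dates under different style codes, while a real date type rejects impossible values outright (TRY_CONVERT requires SQL Server 2012 or later):

```sql
SELECT CONVERT(date, '3/11/09', 1);  -- style 1,  mm/dd/yy: March 11, 2009
SELECT CONVERT(date, '3/11/09', 3);  -- style 3,  dd/mm/yy: November 3, 2009
SELECT CONVERT(date, '3/11/09', 11); -- style 11, yy/mm/dd: November 9, 2003

SELECT TRY_CONVERT(date, '2/30/2009', 101); -- NULL: there is no Feb 30
```

Store the value in a date or datetime2 column and the ambiguity, and the Feb 30 problem, never enter the database at all.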

One item that was touched on briefly was the downstream reuse of the data for "other" applications. In this case, it is important for a developer not only to enforce constraints on the initial data but also to realize how the data could be used in a data warehouse environment. This becomes even more relevant for those dealing with "spatial" datasets. For example, in a recent application and data-cleansing assignment I was working on, spatial data was being entered by end users who had no knowledge or capability to check whether the data they were entering actually referenced a valid "roadway network" location. Knowing that the data collected would eventually be used in a GIS environment, it was paramount that the data collection process be spatially constrained as well as domain constrained. This was not an easy task to perform in a spatially "un-aware" database, so considerable data validation had to be included in the front-end application. Not doing so could result in erroneous location information needing to be "reworked" at a later time, and in locations that cannot be referenced against a valid data source.

Moral of the story: constrain and validate your data input, and never expect the end users to do anything more than "key in" data.

Steve Jones - Editor (8/30/2010)It's a good writeup. And there's always a big debate about NULL being valid or acceptable in RI.

I personally think it's OK, but I see the other side, and there's a valid argument there. If you start to use NULLs, you are usually defining them to mean something, like "N/A", but over time you might find people assuming they mean something else, like the earliest date or "I don't know". In those cases you can start to get bugs or misconceptions in the application. Not that you will, but you might. There is a good argument to be made for not allowing NULL.

So, Steve, what is the definition of "N/A"? Is it "Not Applicable" or "Not Available" or "No Answer"?

Ralph D. Wilson II
Development DBA

"Give me 6 hours to chop down a tree and I will spend the first 4 sharpening the ax." - A. Lincoln