Are NULLs Good or Bad (2013)

Probably the most discussed topic in this discussion group is whether the use of NULLs is a good thing, a bad thing or something in between. Every once in a while, someone asks a question and the answers spin off into an active discussion on the pros and cons of using NULL. There was such a question just two days ago, and it has spawned 38 replies ... so far (difference-between-key-and-unique-key-5113211).

The record I can find is from 2005, when the topic of NULLs generated 58 replies in 3 days (null-values-793481). There were similar threads in 2004 (difference-between-and-is-not-null-503539) and 2006 (the-empty-string-and-nulls-1185748).

But this is a topic that bears repeating -- there are always new people coming on board and old people who have new opinions.

So -- let's give it another go. Picking up from the post on "Key and Unique Key", what do you think about NULL?

And I would like to be clearer about my contribution to the last thread. I'm not making a judgement about nulls. I am just pointing out that programmers are unqualified to make a judgement. Their desire is only to support their duties, not to accurately reflect the real world in a database management system. It's one of the most entertaining things in IT, watching syntax jockeys architect a data structure.

Nulls are neither good nor bad. NULL is simply the absence of data. How one deals with NULL depends entirely on the environment one is working in. The various forms of VB and xBase platforms seem to be (in my opinion) the worst at accommodating NULL within code. So, from that perspective, Eileen is correct. However, from a database perspective (excluding MS-Access), a column having a NULL state is actually useful information (the absence of a value).

I find NULL to be a useful data state, in that it clearly indicates that a value has either not been set or has been specifically cleared.

The salient point here is that you can fix the coding error. If the table design had included a NOT NULL constraint, you could have been faced with a worse situation where 0 (or some other value) was used when the actual value was unknown, not relevant, etc., but also when the count, amount, etc. was actually zero.
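The distortion described above is easy to demonstrate. Here is a minimal sketch in Python using SQLite, with a hypothetical "amounts" table in which one value is genuinely zero and another is unknown; storing the unknown as NULL keeps the two cases distinct, whereas a 0 sentinel would silently merge them:

```python
import sqlite3

# One row's amount is genuinely zero; another's is unknown (NULL).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE amounts (label TEXT, amount REAL)")
conn.executemany("INSERT INTO amounts VALUES (?, ?)",
                 [("free sample", 0.0), ("price unknown", None)])

# Aggregates skip NULLs, so the unknown row does not distort the average.
avg_with_null = conn.execute("SELECT AVG(amount) FROM amounts").fetchone()[0]
print(avg_with_null)  # 0.0 -- only the genuine zero is averaged

# Coercing unknowns to 0 (as a NOT NULL design might force) makes
# "unknown" indistinguishable from "actually zero".
zero_rows = conn.execute(
    "SELECT COUNT(*) FROM amounts WHERE COALESCE(amount, 0) = 0").fetchone()[0]
print(zero_rows)  # 2 -- the unknown now looks like a real zero
```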

NULL is neither good nor bad: it is essential. NULL is binary zero, ie. '#0000000000000000'.
What is good or bad is the way in which it is used, and interpreted.

NULL, as has been mentioned before, is a database device to represent 'nothing known', or 'no data available': there is no 'value' of whatever kind that can be used. Another way of looking at it is to consider the adoption of Zero in the arithmetic number line [–integers => +integers]. Zero was added as an afterthought to represent 'zero value' or 'zero quantity', ie. the absence of any value or quantity. The parallel with databases is, as a recent post remarked, NULL is 'the absence of data'.

Thus, no comparison operators can be used against NULL to obtain a sensible result, and so nobody should really try – it just ends up as meaningless nonsense.
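A minimal sketch in Python using SQLite shows why: any comparison involving NULL evaluates to UNKNOWN, never TRUE, so rows with NULLs silently drop out of comparison predicates:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT, price REAL)")
conn.executemany("INSERT INTO items VALUES (?, ?)",
                 [("widget", 9.99), ("gadget", None)])

eq = conn.execute("SELECT name FROM items WHERE price = NULL").fetchall()
ne = conn.execute("SELECT name FROM items WHERE price != 9.99").fetchall()
isnull = conn.execute("SELECT name FROM items WHERE price IS NULL").fetchall()

print(eq)      # [] -- '= NULL' matches nothing, not even the NULL row
print(ne)      # [] -- the NULL row is skipped here too: UNKNOWN is not TRUE
print(isnull)  # [('gadget',)] -- IS NULL is the only reliable test
```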

Exists? Clearly, if NULL is used in a column, no data exists there. Thus a subquery whose predicate compares against that NULL returns no rows, and a query using 'WHERE EXISTS' would return the empty set; unless, of course, the subquery is searching with an 'IS NULL' – in which case the set of those rows where the 'something is NULL' would be found.
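A sketch of this EXISTS behaviour in Python with SQLite: a subquery that compares a column to NULL returns no rows, so EXISTS is false; a subquery testing IS NULL does find the NULL rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, val TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, "a"), (2, None)])

empty = conn.execute(
    "SELECT 1 WHERE EXISTS (SELECT 1 FROM t WHERE val = NULL)").fetchall()
found = conn.execute(
    "SELECT 1 WHERE EXISTS (SELECT 1 FROM t WHERE val IS NULL)").fetchall()

print(empty)  # [] -- the '= NULL' subquery yields no rows, so EXISTS is false
print(found)  # [(1,)] -- the IS NULL subquery does find the NULL row
```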

The important aspect to remember about attributes (columns) which may be Null is that, by their very nature, they are non-critical and can sometimes even be trivial. It follows that NULL cannot be used in critical attributes: a Primary Key cannot be Null, for example, and neither can a Unique Key (Unique Index). And a mandatory Foreign Key, since it must refer to the Primary Key of a related table, cannot be Null either.

Joins. If, in a JOIN between tables, a column tested in the WHERE clause is Null, no data will be returned for the row concerned, even though one of the tables may have valid non-null entries. The device employed, of course, is the LEFT JOIN, whereby, if the right-hand table of the join has no match, the data of the left-hand row is still returned, with Nulls filling the right-hand columns. Similarly for a RIGHT JOIN with the tables reversed: the left-hand side is Null but the right-hand row's data is returned. This device prevents the return of what would otherwise be incomplete sets.
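The inner-versus-outer join difference can be sketched in Python with SQLite (the customers/orders tables here are hypothetical illustrations, not from the thread):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, name TEXT);
    CREATE TABLE orders (customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO orders VALUES (1, 50.0);
""")

# An INNER JOIN drops Bob, who has no orders.
inner = conn.execute("""
    SELECT c.name, o.total FROM customers c
    JOIN orders o ON o.customer_id = c.id
    ORDER BY c.id""").fetchall()
print(inner)  # [('Ann', 50.0)]

# A LEFT JOIN keeps Bob, with NULL (None) filling the missing order columns.
left = conn.execute("""
    SELECT c.name, o.total FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id
    ORDER BY c.id""").fetchall()
print(left)  # [('Ann', 50.0), ('Bob', None)]
```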

The app programmer, as remarked before, is not and cannot be responsible for the design of the DB. However, it's a poor one who does not recognise that some columns in a result set may be NULL. Take the simplest of examples (one I believe has been mentioned before): printing out an address where the table has 5 fields + postcode. If one field returns Null then the print output has to cater for not printing a blank row – not exactly rocket science.

There are undoubtedly many, many other similar situations where NULL has to be catered for in presenting the results to the user via the app. Shame upon a programmer who fails this test.

Whichever way you look at it, NULL is essential: it represents the 'absence of data'.

Brian brings out the most important aspect of NULL -- that it represents the 'absence of data' and that 'it is essential'. I'll not comment on the programmers themselves, as my job has always been to do both -- design/develop the database and write the software that works with it.

There are a couple of corrections, though:
* "NULL is binary zero." -- that is not the case, because if it were true, then you could not distinguish between "I don't know how much it costs" and "It's free". In most DBMS implementations, NULL is actually stored separately from the column value -- as a bit in a bit field appended to the row, for example. (And, in fact, the bit might equal 1, not zero, to indicate "is NULL").

* "by their very nature, they are non-critical and can sometimes even be trivial." The NULL-ness of an attribute may be critical and non-trivial, or not -- that depends on the context, not on "their very nature". Dates of future events, for example, are necessarily null until the events occur, but that doesn't make them trivial -- someone's date of death is a simple example.

It's instructive to revisit the "history of nothing" -- that time in which neither the number nor the concept Zero formally existed -- to understand the inevitability and necessity of representing the absence of value. (My instruction started in philosophy, not in computing, so I learned programming, database and data analysis by reading Plato, Aristotle, St. Augustine, Kant, Hegel and Nietzsche, and some Asian philosophers as well). As data-oriented people, we should be comfortable with the notion of the absence of value -- designing and programming for that absence must be a natural skill.

Over the course of my time writing code (which continues to this day), the techniques of programming have moved to take the difficult parts away from everyday programmers. In my day, that meant using "structured programming", then "block based programming" (functions, procedures and modules), and later "object-oriented programming". Each of these was geared to hide the hard parts -- translating NULL to something more appropriate for the programmer / user would be one of the hard parts. So I don't think the use of NULL should present a programming problem, and I'm always surprised when it does. In a particular context, the meaning of NULL should have been well-defined long before code is crafted, and a translation from "absence of value" to "semantic significance" should have been done early on -- not by the coder, but by the designer, the analyst or the customer. Once that is done, the programming for NULL is neither harder nor easier than programming for anything else.

When I studied Kung-Fu I read the Three Golden Rings which discussed that which does not exist. Null is that which does not exist and that which does not exist is very helpful in understanding that which does.

Nulls are ambiguous. Is gender unknown, or does it not apply? Those two meanings of NULL are not the same. A NULL in End_Dt does not necessarily mean unknown: a NULL End_Dt could mean that the data on the row is still active, in the same way that a flag with the value 'A' would.

In columns such as gender, and in general for codes, natures, types and indicators, a 'U' for unknown and an 'X' for not-apply removes the ambiguity. In hundreds of DB2 tables with END_DT defaulting to '9999-12-31' I've never seen a problem, and have always seen performance improvements and fewer errors in results. In hundreds of DB2 tables with END_DT defaulting to NULL I've seen many problems with both performance and erroneous results. Sure, better SQL skills could avoid the errors. But even then the performance problems persist, especially when these columns appear in indexes and in SQL predicates.
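A sketch of this coded-values approach in Python with SQLite, using a hypothetical person table and the 'U'/'X' convention described above; a CHECK constraint keeps the codes honest:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE person (
        name   TEXT NOT NULL,
        gender TEXT NOT NULL DEFAULT 'U'
               CHECK (gender IN ('M', 'F', 'U', 'X'))
    )""")
conn.execute("INSERT INTO person (name) VALUES ('Pat')")      # unknown -> 'U'
conn.execute("INSERT INTO person VALUES ('Acme Corp', 'X')")  # not applicable

# "Unknown" and "not applicable" are now distinct, queryable states.
unknown = conn.execute(
    "SELECT name FROM person WHERE gender = 'U'").fetchall()
print(unknown)  # [('Pat',)]
```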

Bob Schmidt: are you suggesting that NULLs ought not to be used? That's the impression I get from your post -- that the use of NULL produces performance problems as well as erroneous results (unintended or misunderstood semantics) because of the "4-value logic" (true, false, unknown, and not-apply).

Absent the NULL, what are the suitable universal indicators for the "unknown" and "not-apply" conditions? In the databases you've seen, has it been possible to say, without ambiguity and/or confusion, which stored value means which condition? Or does the query-maker have to know what specific value is used for each column, based on the defined datatype of the column and the convention adopted for this datatype in this table in this database in this dbms in this application in this division of this company?

I accept the NULL is an imperfect and ambiguous solution. Its saving grace is its universality -- the standardized syntax "IS NULL" is applicable regardless of datatype or dbms or anything else in the environment. I would certainly support improvement of the language -- introducing UNKNOWN, NOT_APPLICABLE and NULL (retaining the ambiguity it has today), for example.

Thanks Bryan for pointing out that NULL is not zero; it is not the same thing as a "null string" (an empty string); in fact it is not like anything. Zero = Zero is true, but Null = Null is not, and neither is Not(Null = Null).

I have never seen any good alternative to having Null in a database, but it does complicate query and programming logic because it does not obey two-valued Boolean logic. Therefore, there are places where you can simplify everyone's life by using an agreed-upon substitute for Null. As Bob Schmidt points out, the use of 9999-12-31 for an end date makes it possible to write a simple predicate, "x between start-date and end-date," rather than "x >= start-date and (x <= end-date or end-date is null)." When you have a lot of these, it gets painful to write as well as to read.
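The sentinel end-date technique can be sketched in Python with SQLite (a hypothetical rates table; ISO-format date strings compare correctly as text):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rates (rate REAL, start_dt TEXT, end_dt TEXT)")
conn.executemany("INSERT INTO rates VALUES (?, ?, ?)", [
    (1.0, "2012-01-01", "2012-12-31"),  # closed period
    (1.5, "2013-01-01", "9999-12-31"),  # open period, sentinel end date
])

# With the sentinel, one BETWEEN predicate covers open and closed periods;
# no "OR end_dt IS NULL" branch is needed.
current = conn.execute(
    "SELECT rate FROM rates WHERE ? BETWEEN start_dt AND end_dt",
    ("2013-06-15",)).fetchone()
print(current)  # (1.5,)
```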

Bryan, in table definitions NULL is useful in numeric columns that will feed column functions such as AVG. Because such numeric columns are updated frequently, they are often not in indexes anyway.

NULL is useful in columns such as DEATH_DT, which is quite different in usage from END_DT. NULL is useful in columns of a denormalized table design where the column does not apply to every row. But even in these cases, removing the ambiguity between unknown and not-apply is useful.

Of course, NULL (the "does not exist" case) is useful in the results of subqueries and outer joins.