As mentioned in a previous chapter of this wikibook and in wikipedia sometimes there is no value in a column of a row, or - to say it the other way round - the column stores the NULL marker (a flag to indicate the absence of any data), or - to use the notion of the SQL standard - the column stores the NULL value. This NULL marker is very different from the numeric value zero or a string with a length of zero characters! Typically it occurs when an application yet hasn't stored anything in the column of this row.

(A hint to Oracle users: For Oracle the NULL marker is identical to a string of zero characters.)

The existence of the NULL marker introduces a new fundamental problem. In the usual boolean logic there are the two logical values TRUE and FALSE. Every comparison evaluates to one of the two - and the comparisons negation evaluates to the opposite one. If a comparison evaluates to TRUE, its negation evaluates to FALSE and vice versa. As an example, in the usual boolean logic one of the following two comparisons is TRUE and the other one is FALSE: 'x < 5', 'x >= 5'.

Imagine now the new situation that x holds the NULL marker. It is not feasible that 'NULL < 5' is true (1). But can we say 'NULL < 5' is false (2) and its negation 'NULL >= 5' is true (3)? Is (3) more feasible than (1)? Of course not. (1) and (3) have the same 'degree of truth', so they shall evaluate to the same value. And this value must be different from TRUE and FALSE.

Therefore the usual boolean logic is extended by a third logic value. It is named UNKNOWN. All comparisons to the NULL marker results per definition in this new value. And the well known statement 'if a statement is true, its negation is false' gets lost because there is a third option.

This page proceeds in two stages: First it explains the handling of NULLs concerning comparisons, grouping, etc. . Second it explains the boolean logic for the cases where the new value UNKNOWN interacts with any other boolean value - including itself.

SQL knows the six comparison predicates <, <=, =, >=, > and <> (unequal). Their main purpose is the arithmetic comparison of numeric values. Each of them needs two variables or constants (infix notation). This implies that it is possible that one or even both operants hold the NULL marker. As stated before the common and very simple rule is: "All comparisons to the NULL marker results per definition in this new value (unknown).". Here are some examples:

col_1 = col_2 evaluates to UNKNOWN for rows where col_1 or col_2 holds the NULL marker.

NULL = NULL evaluates to UNKNOWN.

col_1 = col_2 evaluates to UNKNOWN for rows where col_1 and col_2 holds the NULL marker.

The WHERE clause returns such rows where it evaluates to TRUE. It does not return rows where it evaluates to FALSE or to UNKNOWN. In consequence it is not guaranteed that the following SELECT will return the complete table t1:

-- This SELECT will not return such rows where col_1 holds the NULL marker.SELECT*FROMt1WHEREcol_1>5ORcol_1=5ORcol_1<5;

Of course there are use cases where rows with the NULL marker must be retrieved. Because the arithmetic comparisons are not able to do so, another language construct must do the job. It is the IS NULL predicate.

-- This SELECT will return exactly these rows where col_1 holds the NULL marker.SELECT*FROMt1WHEREcol_1ISNULL;

For the other predicates there is no simple rule of thumb. They must be explained one after the other.

The IN predicate is a shortcut for a sequence of OR operations:

Only the two comparisons 'col_1 = 3' and 'col_1 = 18' are able to retrieve rows (possibly many rows). The comparison 'col_1 = NULL' will never evaluate to TRUE. It's always UNKNOWN, even if col_1 holds the NULL marker. To retrieve those rows it's necessary - as shown above - to use the 'IS NULL' predicate.

-- Shortcut for: col_1 = 3 OR col_1 = 18 OR col_1 = NULLSELECT*FROMt1WHEREcol_1IN(3,18,NULL);-- the NULL case will never hit with the IN predicate!

This is a little more complex. This will only return 1, 3, and 4, the items that don't have NULL in t2.col_x or t1.col_1.

SELECT*FROMt1WHEREcol_1IN(SELECTcol_xFROMt2WHEREid<10);

The subselect of an EXISTS predicate evaluates to TRUE if the cardinality of the retrieved rows is greater than 0, and to FALSE if the cardinality is 0. It is not possible that the UNKNOWN value occurs.

-- The subselect to t2 can hit some rows - or not. If there are hits in the subselect, ALL rows of t1-- are returned, else no rows of t1 are returned.SELECT*-- The select to table t1FROMt1WHEREEXISTS(SELECT*FROMt2WHEREid<10);-- The subselect to table t2

The LIKE predicate compares a column with a regular expression. If the column contains the NULL marker, the LIKE predicate returns the UNKNOWN value, what means that the row is not retrieved.

-- The LIKE retrieves only rows where col_2 matches the WHERE statement and col_2 is not NULL.SELECT*FROMt1WHEREcol_2LIKE'Hello %';

The aggregate functionsCOUNT(<column_name>), MIN(<column_name>), MAX(<column_name>), SUM(<column_name>) and AVG(<column_name>) ignores such rows where <column_name> contains the NULL marker. On the other hand COUNT(*) includes all rows.

If a parameter of one of the scalar functions like UPPER(), TRIM(), CONCAT(), ABS(), SQRT(), ... contains the NULL marker the resulting value is - in the most cases - the NULL marker.

There are some situations where column values are compared to each other to answer the question, whether they are distinct. For usual numbers and strings the result of such decisions is obvious. But how shall the DBMS handle NULL markers? Are they distinct from each other, are they equal to each other or is there no answer to this question at all? To get results, which are expected by (nearly) every end user, the standard defines "Two null values are not distinct.", they build a single group.

SELECT DISTINCT col_1 FROM t1; retrieves one and only row for all rows where col_1 holds the NULL marker.

... GROUP BY col_1 ...; builds one and only one group for all rows where col_1 holds the NULL marker.

The next operation is the negation of the new value. Which values evaluate to 'NOT UNKNOWN'? The UNKNOWN value represents the impossibility to decide between TRUE and FALSE. It is not feasible that the negation of this impossibility leads to TRUE or FALSE. Likewise it is UNKNOWN.

-- Which rows will match? (1)SELECT*FROMt1WHERENOTcol_2=NULL;-- 'col_2 = NULL' evaluates to UNKNOWN in all cases, see above.-- Is this SELECT equivalent to the first one? (2)SELECT*FROMt1EXCEPTSELECT*FROMt1WHEREcol_2=NULL;-- No, it's different!! Independent from NULL markers in col_2, (1) retrieves-- absolutely NO row and (2) retrieves ALL rows.

The above SELECT (1) will retrieve no rows as 'NOT col_2 = NULL' evaluates to the same as 'col_2 = NULL', namely UNKNOWN. And the SELECT (2) will retrieve all rows, as the part after EXCEPT will retrieve no rows, hence only the part before EXCEPT is relevant.

In the abstract syntax of logical systems p shall represent any of its truth values and NOT p its negation. Herein the following table applies: