NULL is the
absence of a value. This means it is an unknown value. But it is not an unknown
value like “x” is an unknown value in algebra. This is because “x” is an actual
value and so x = x. NULL is not only unknown; it is the absence of a value.
Consequently, NULL does not equal NULL; yet at the same time, any calculation involving NULL is
equal to NULL.
So…
2 + 2 + x is equal to 4 + x
But…
2 + 2 + NULL
is not equal to 4 + NULL
And T-SQL
functions align to this.
2
+ 2 = 4
2
+ 2 != 5
And
anything with a NULL in it is Not Equal.
All is well in the world so far.
Enter EXCEPT
which outputs distinct rows from a 1st dataset unmatched by any row
in a 2nd dataset. A simple example is below. 2 is in the top dataset
but not in the bottom set and so it is returned as a result of the EXCEPT clause.
Even though the top dataset has two 2s only one 2 is returned because the
EXCEPT clause inherently produces a distinct result.
We will
replace the 2s in the top dataset with NULLs. Notice that although T-SQL functions treat NULLs as not equal to each other, EXCEPT returns a single NULL. Both NULLs are taken to be "equal".
Finally, moving one of the NULLs to the bottom set, the result of the EXCEPT is empty – thereby treating NULL in the top set (which could be 2+2) as matching NULL in the bottom set (which could be 5).
The MSDN article goes so far as to describe the two NULL values as “equal” although the SQL standard uses the language “IS NOT DISTINCT FROM”.
So although T-SQL functions treat NULLs as not equal to each other, the T-SQL EXCEPT clause treats NULLs as equal. In the case below, each SELECT separately conforms to the logic that anything plus a missing value nets a missing value, but then treats those resulting missing values as matching. The result of the EXCEPT clause is empty.
Might 2 + 2 actually equal 5?
Happy trails....