Thursday, February 2, 2017

NULLs and the T-SQL EXCEPT Clause

NULL is the absence of a value. This means it is an unknown value. But it is not an unknown value like “x” is an unknown value in algebra. This is because “x” is an actual value and so x = x. NULL is not only unknown; it is the absence of a value. Consequently, NULL does not equal NULL; yet at the same time, any calculation involving NULL is equal to NULL.

So…
2 + 2 + x is equal to 4 + x

But…
2 + 2 + NULL is not equal to 4 + NULL

And T-SQL functions align to this.
2 + 2 = 4
2 + 2 != 5
And anything with a NULL in it is Not Equal. 


All is well in the world so far.


Enter EXCEPT which outputs distinct rows from a 1st dataset unmatched by any row in a 2nd dataset. A simple example is below. 2 is in the top dataset but not in the bottom set and so it is returned as a result of the EXCEPT clause. Even though the top dataset has two 2s only one 2 is returned because the EXCEPT clause inherently produces a distinct result.


We will replace the 2s in the top dataset with NULLs. Notice that although T-SQL functions treat NULLs as not equal to each other, EXCEPT returns a single NULL. Both NULLs are taken to be "equal". 



This conforms to the general functionality of DISTINCT.



Finally, moving one of the NULLs to the bottom set, the result of the EXCEPT is empty – thereby treating NULL in the top set (which could be 2+2) as matching NULL in the bottom set (which could be 5).


The MSDN article goes so far as to describe the two NULL values as “equal” although the SQL standard uses the language “IS NOT DISTINCT FROM”.


So although T-SQL functions treat NULLs as not equal to each other, the T-SQL EXCEPT clause treats NULLs as equal. In the case below, each SELECT separately conforms to the logic that anything plus a missing value nets a missing value, but then treats those resulting missing values as matching. The result of the EXCEPT clause is empty.



Might 2 + 2 actually equal 5?

Happy trails....