Question

I have the following SQL script:

CREATE temporary table if not EXISTS the_values (
 key SERIAL,
 value INTEGER NULL 
);

insert into the_values(value) values (null),(1),(null),(2),(3),(4),(5),(6),(10),(null),(null);

select * 
from the_values 
where value not in (1,2,3,4,5,6,10); 

And I noticed that the query:

select * 
from the_values 
where value not in (1,2,3,4,5,6,10); 

does not return the rows whose value is NULL, which caught my attention. I would like to know why that happens. I am more interested in the technical reason behind this behavior than in the obvious solution:

select * 
from the_values 
where value not in (1,2,3,4,5,6,10) 
   or value IS NULL; 
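One way to reproduce this without a PostgreSQL server is Python's built-in sqlite3 module. SQLite stands in for PostgreSQL here, which is an assumption, although both follow the SQL standard's NULL rules for NOT IN. `INTEGER PRIMARY KEY` replaces `SERIAL`, which SQLite does not have. The sketch shows that the question's query returns no rows at all, that the predicate itself evaluates to NULL (None) on the NULL rows, and that adding `OR value IS NULL` brings back exactly the four NULL rows.

```python
import sqlite3

# SQLite stands in for PostgreSQL; INTEGER PRIMARY KEY replaces SERIAL (SQLite has no SERIAL type).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TEMP TABLE the_values (key INTEGER PRIMARY KEY, value INTEGER)")
conn.executemany(
    "INSERT INTO the_values (value) VALUES (?)",
    [(v,) for v in [None, 1, None, 2, 3, 4, 5, 6, 10, None, None]],  # None is bound as NULL
)

# The question's query: every non-NULL value is in the list, and the NULL rows are dropped too.
not_in_rows = conn.execute(
    "SELECT * FROM the_values WHERE value NOT IN (1, 2, 3, 4, 5, 6, 10)"
).fetchall()

# What the predicate itself evaluates to on the NULL rows: NULL (None), not TRUE.
predicate_on_nulls = conn.execute(
    "SELECT value NOT IN (1, 2, 3, 4, 5, 6, 10) FROM the_values WHERE value IS NULL"
).fetchall()

# The "obvious" workaround from the question.
or_is_null_rows = conn.execute(
    "SELECT * FROM the_values "
    "WHERE value NOT IN (1, 2, 3, 4, 5, 6, 10) OR value IS NULL ORDER BY key"
).fetchall()

print(not_in_rows)         # []
print(predicate_on_nulls)  # [(None,), (None,), (None,), (None,)]
print(or_is_null_rows)     # [(1, None), (3, None), (10, None), (11, None)]
```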

Solution

If we simplify the insert and query to:

insert into T(x) values (null),(1);

select x from T where x not in (1);

For the NULL row, the predicate evaluates to:

select x from T where null not in (1) <=>
select x from T where not null in (1) <=>
select x from T where not null <=>
select x from T where null

so that row does not satisfy the predicate: WHERE keeps only rows for which the condition is TRUE, and NULL is not TRUE. Comparing anything with NULL (think of it as "unknown") yields NULL.

For the row with 1, the predicate evaluates to:

select x from T where 1 not in (1) <=>
select x from T where False

so that row does not satisfy the predicate either.

That is, you end up with an empty result.
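The derivation above can be replayed with a small Python model of SQL's three-valued logic, using None for NULL. This is a hypothetical sketch (the helper names `sql_eq`, `sql_or`, `sql_not`, `sql_not_in` and `where` are made up for illustration), not how PostgreSQL implements it. It encodes two rules: `x IN (a, b, ...)` means `x = a OR x = b OR ...`, and WHERE keeps a row only when the predicate is TRUE.

```python
from typing import Optional

SqlBool = Optional[bool]  # True, False, or None for SQL NULL ("unknown")


def sql_eq(a, b) -> SqlBool:
    """a = b in SQL: any comparison involving NULL yields NULL."""
    if a is None or b is None:
        return None
    return a == b


def sql_or(p: SqlBool, q: SqlBool) -> SqlBool:
    """Three-valued OR: TRUE wins, otherwise NULL is contagious."""
    if p is True or q is True:
        return True
    if p is None or q is None:
        return None
    return False


def sql_not(p: SqlBool) -> SqlBool:
    """NOT NULL is still NULL."""
    return None if p is None else not p


def sql_in(x, items) -> SqlBool:
    """x IN (a, b, ...) is shorthand for x = a OR x = b OR ..."""
    result: SqlBool = False
    for item in items:
        result = sql_or(result, sql_eq(x, item))
    return result


def sql_not_in(x, items) -> SqlBool:
    """x NOT IN (...) is NOT (x IN (...))."""
    return sql_not(sql_in(x, items))


def where(rows, predicate):
    """WHERE keeps a row only if the predicate is TRUE; FALSE and NULL are both dropped."""
    return [r for r in rows if predicate(r) is True]


rows = [None, 1]  # insert into T(x) values (null),(1);
predicates = {x: sql_not_in(x, [1]) for x in rows}
result = where(rows, lambda x: sql_not_in(x, [1]))

print(predicates)  # {None: None, 1: False}
print(result)      # []
```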

Other tips

Generally, avoid NOT IN when NULL values can be involved on either side, and avoid NOT IN (SELECT ...) in any case. The Postgres Wiki suggests as much and explains why. It also happens to have the perfect answer to your core question:

  1. NOT IN behaves in unexpected ways if there is a null present:
select * from foo where col not in (1,null); -- always returns 0 rows
select * from foo where col not in (select x from bar);   -- returns 0 rows if any value of bar.x is null

This happens because col IN (1,null) returns TRUE if col=1, and NULL otherwise (i.e. it can never return FALSE). Since NOT (TRUE) is FALSE, but NOT (NULL) is still NULL, there is no way that NOT (col IN (1,null)) (which is the same thing as col NOT IN (1,null)) can return TRUE under any circumstances.
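The Wiki's claim can be checked empirically with Python's built-in sqlite3 module, assuming SQLite's NOT IN follows the same standard NULL rules as PostgreSQL. The tables `foo` and `bar` come from the quote, but their contents are hypothetical. SQLite reports FALSE as 0 and NULL as None, so the point is that 1 (TRUE) never shows up for `col NOT IN (1, NULL)`, and that a single NULL in `bar.x` empties the subquery variant.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# col NOT IN (1, NULL) for a few sample values (None is bound as NULL).
samples = [1, 2, 42, None]
literal_results = {
    col: conn.execute("SELECT ? NOT IN (1, NULL)", (col,)).fetchone()[0]
    for col in samples
}

# The subquery variant: hypothetical contents for foo and bar.
conn.execute("CREATE TABLE foo (col INTEGER)")
conn.execute("CREATE TABLE bar (x INTEGER)")
conn.executemany("INSERT INTO foo VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO bar VALUES (?)", [(1,)])

query = "SELECT col FROM foo WHERE col NOT IN (SELECT x FROM bar) ORDER BY col"
before_null = conn.execute(query).fetchall()

# A single NULL in bar.x turns every non-matching comparison into NULL.
conn.execute("INSERT INTO bar VALUES (NULL)")
after_null = conn.execute(query).fetchall()

print(literal_results)  # {1: 0, 2: None, 42: None, None: None}
print(before_null)      # [(2,), (3,)]
print(after_null)       # []
```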

Here is a perhaps less obvious solution:

SELECT * 
FROM   the_values 
WHERE  value IN (1,2,3,4,5,6,10) IS NOT TRUE;

Or:

..
WHERE  value = ANY('{1,2,3,4,5,6,10}') IS NOT TRUE;

It's shorter and typically faster than your "obvious" one.
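The reason `value IN (...) IS NOT TRUE` keeps the same rows as the question's `NOT IN ... OR value IS NULL` is that IS NOT TRUE never returns NULL: it maps both FALSE and NULL to TRUE. The hypothetical Python model below uses None for NULL; `sql_in` and `is_not_true` are illustrative names, and the extra value 7 is added to the question's data only to exercise the "not in the list" path. It filters the values both ways and compares the results. It says nothing about the performance claim; the same three-valued reasoning applies to the `= ANY(...)` form.

```python
from typing import Optional

SqlBool = Optional[bool]  # True, False, or None for SQL NULL


def sql_in(x, items) -> SqlBool:
    """Three-valued x IN (...): TRUE on a match, else NULL if x or any item is NULL, else FALSE."""
    if x is not None and any(item is not None and x == item for item in items):
        return True
    if x is None or any(item is None for item in items):
        return None
    return False


def is_not_true(p: SqlBool) -> bool:
    """p IS NOT TRUE is never NULL: it is TRUE for both FALSE and NULL."""
    return p is not True


in_list = [1, 2, 3, 4, 5, 6, 10]
# The question's data plus a hypothetical 7 that is not in the list.
values = [None, 1, None, 2, 3, 4, 5, 6, 10, None, None, 7]

# WHERE value IN (...) IS NOT TRUE
is_not_true_rows = [v for v in values if is_not_true(sql_in(v, in_list))]

# WHERE value NOT IN (...) OR value IS NULL
# (NOT p is TRUE exactly when p is FALSE, so compare sql_in's result with False.)
or_is_null_rows = [v for v in values if sql_in(v, in_list) is False or v is None]

print(is_not_true_rows)                     # [None, None, None, None, 7]
print(is_not_true_rows == or_is_null_rows)  # True
```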

For long lists, consider switching to a different (faster) technique:

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange