Reason for NOT IN behaviour [duplicate]
05-10-2020
Question
sql> SELECT * FROM runners;
+----+--------------+
| id | name         |
+----+--------------+
|  1 | John Doe     |
|  2 | Jane Doe     |
|  3 | Alice Jones  |
|  4 | Bobby Louis  |
|  5 | Lisa Romero  |
+----+--------------+
sql> SELECT * FROM races;
+----+----------------+-----------+
| id | event          | winner_id |
+----+----------------+-----------+
|  1 | 100 meter dash |         2 |
|  2 | 500 meter dash |         3 |
|  3 | cross-country  |         2 |
|  4 | triathalon     |      NULL |
+----+----------------+-----------+
What will be the result of the query below?
SELECT * FROM runners WHERE id NOT IN (SELECT winner_id FROM races);
The query above returns nothing because the races table contains a NULL winner_id. But what is the reason behind this?
Solution
SELECT * FROM runners WHERE id NOT IN (SELECT winner_id FROM races);
is equivalent to something like this (using the actual values from races):
SELECT * FROM runners WHERE
id <> 2 and
id <> 3 and
id <> null;
id <> null is unknown by definition, so because of the AND operators the whole WHERE clause evaluates to either false (for ids 2 and 3) or unknown (for ids 1, 4 and 5). A WHERE clause keeps only the rows for which the condition is true, so no rows are returned.
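To see the three-valued logic at work, here is a minimal sketch that evaluates the expanded condition for each runner id. It assumes SQLite (through Python's sqlite3 module) as a stand-in for whatever database the question uses; the helper name predicate is made up for this illustration. SQLite reports true as 1, false as 0, and unknown as NULL, which Python shows as None.

```python
import sqlite3

# Assumption: SQLite stands in for the asker's database. The NULL comparison
# rules used here are standard SQL three-valued logic.
conn = sqlite3.connect(":memory:")

def predicate(runner_id):
    """Evaluate id <> 2 AND id <> 3 AND id <> NULL for a single id."""
    row = conn.execute(
        "SELECT ? <> 2 AND ? <> 3 AND ? <> NULL",
        (runner_id, runner_id, runner_id),
    ).fetchone()
    return row[0]  # 1 = true, 0 = false, None = unknown

# Ids 1..5 are the ids in the runners table from the question.
results = {runner_id: predicate(runner_id) for runner_id in range(1, 6)}
for runner_id, value in results.items():
    print(runner_id, value)
```

None of the five ids yields 1 (true), which is why the original query comes back empty.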
If you want to run the query disregarding the nulls in races, you can use:
SELECT * FROM runners
WHERE id NOT IN
(SELECT winner_id FROM races WHERE winner_id IS NOT NULL);
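As a quick check, the sketch below rebuilds the two tables from the question in an in-memory SQLite database (an assumption, the question does not say which database it uses) and runs the query with and without the IS NOT NULL filter. Only the id column is selected to keep the output short.

```python
import sqlite3

# Assumption: SQLite in memory; column types are guessed from the sample output.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE runners (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE races (id INTEGER PRIMARY KEY, event TEXT, winner_id INTEGER);
    INSERT INTO runners VALUES
        (1, 'John Doe'), (2, 'Jane Doe'), (3, 'Alice Jones'),
        (4, 'Bobby Louis'), (5, 'Lisa Romero');
    INSERT INTO races VALUES
        (1, '100 meter dash', 2), (2, '500 meter dash', 3),
        (3, 'cross-country', 2), (4, 'triathalon', NULL);
""")

# Original query: the NULL winner_id makes the condition never true.
unfiltered = conn.execute(
    "SELECT id FROM runners "
    "WHERE id NOT IN (SELECT winner_id FROM races) ORDER BY id"
).fetchall()

# Filtered query: NULLs are removed before NOT IN sees them.
filtered = conn.execute(
    "SELECT id FROM runners "
    "WHERE id NOT IN (SELECT winner_id FROM races WHERE winner_id IS NOT NULL) "
    "ORDER BY id"
).fetchall()

print(unfiltered)  # no rows
print(filtered)    # runners who never won a race
```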
Other tips
I would advise against doing what Balazs is doing. This is what NOT EXISTS is for; it handles nulls. Let's take a look.
Create sample data
CREATE TABLE foo AS
SELECT * FROM generate_series(1,100) AS t(x);
CREATE TABLE bar AS
SELECT * FROM generate_series(1,10) AS t(x)
UNION SELECT null;
Refresher of problem
Now, as per the earlier answer and the docs, NOT IN is equivalent to <> ALL, and:
"Note that if there are no failures but at least one right-hand row yields null for the operator's result, the result of the ALL construct will be null, not true. This is in accordance with SQL's normal rules for Boolean combinations of null values."
SELECT *
FROM foo
WHERE foo.x NOT IN ( SELECT x FROM bar );
If any bar.x is NULL, the condition is never true: it is false for rows of foo that match a value in bar and NULL for every other row, so the query returns 0 rows.
Solutions
The best solution is to skip NOT IN entirely if the set can contain nulls, and use a NOT EXISTS test instead.
SELECT *
FROM foo
WHERE NOT EXISTS (
SELECT 1 FROM bar WHERE bar.x = foo.x
);
This will return 90 rows.
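For anyone who wants to verify the row counts without a PostgreSQL instance, here is a rough sketch using SQLite through Python's sqlite3 module. That is an assumption on my part: generate_series is not reliably available there, so the tables are filled with executemany instead, but foo and bar hold the same values as in the sample data.

```python
import sqlite3

# Assumption: SQLite in memory instead of PostgreSQL; same data as foo and bar.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (x INTEGER)")
conn.execute("CREATE TABLE bar (x INTEGER)")
conn.executemany("INSERT INTO foo VALUES (?)", [(i,) for i in range(1, 101)])
# bar holds 1..10 plus a single NULL row.
conn.executemany(
    "INSERT INTO bar VALUES (?)", [(i,) for i in range(1, 11)] + [(None,)]
)

not_in_count = conn.execute(
    "SELECT count(*) FROM foo WHERE foo.x NOT IN (SELECT x FROM bar)"
).fetchone()[0]

not_exists_count = conn.execute(
    "SELECT count(*) FROM foo "
    "WHERE NOT EXISTS (SELECT 1 FROM bar WHERE bar.x = foo.x)"
).fetchone()[0]

print(not_in_count, not_exists_count)
```

NOT EXISTS only asks whether the subquery produced a row. For the NULL row in bar, bar.x = foo.x is never true, so that row is filtered out inside the subquery and cannot poison the outer condition.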
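Alternatively, this can be written as a LEFT OUTER JOIN that keeps only the unmatched rows (note that SELECT * here also returns the all-NULL bar.x column):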
SELECT *
FROM foo
LEFT OUTER JOIN bar
ON foo.x = bar.x
WHERE bar.x IS NULL;
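A small sanity check that the join form and the NOT EXISTS form pick the same rows, again sketched with SQLite via Python's sqlite3 module as an assumed stand-in for PostgreSQL. It selects foo.x only, so the extra bar.x column does not get in the way of the comparison.

```python
import sqlite3

# Assumption: SQLite in memory; foo = 1..100, bar = 1..10 plus one NULL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (x INTEGER)")
conn.execute("CREATE TABLE bar (x INTEGER)")
conn.executemany("INSERT INTO foo VALUES (?)", [(i,) for i in range(1, 101)])
conn.executemany(
    "INSERT INTO bar VALUES (?)", [(i,) for i in range(1, 11)] + [(None,)]
)

# Anti-join: unmatched foo rows get NULL in bar.x and survive the filter.
left_join_rows = conn.execute(
    "SELECT foo.x FROM foo "
    "LEFT OUTER JOIN bar ON foo.x = bar.x "
    "WHERE bar.x IS NULL ORDER BY foo.x"
).fetchall()

not_exists_rows = conn.execute(
    "SELECT foo.x FROM foo "
    "WHERE NOT EXISTS (SELECT 1 FROM bar WHERE bar.x = foo.x) "
    "ORDER BY foo.x"
).fetchall()

print(len(left_join_rows), left_join_rows == not_exists_rows)
```

The NULL row in bar never satisfies the join condition, so it neither matches anything nor adds rows to the result.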