JOIN vs. IN vs. EXISTS

https://stackoverflow.com/questions/18047961

23-06-2022
|

Pregunta

I was reading an article that explained the difference between join and in and exists clause but I got confused with the explanation of different results when using NOT IN vs. NOT EXISTS clause. Can someone clarify why there is a difference between the output for NOT EXISTS clause vs. NOT IN clause? I tried after deleting the NULL row (t2.id = 8) from the table t2 and still got the same result.

Here's the SQL script from the article:

CREATE TABLE t1 (id INT, title VARCHAR(20), someIntCol INT)
GO
CREATE TABLE t2 (id INT, t1Id INT, someData VARCHAR(20))
GO

INSERT INTO t1
SELECT 1, 'title 1', 5 UNION ALL
SELECT 2, 'title 2', 5 UNION ALL
SELECT 3, 'title 3', 5 UNION ALL
SELECT 4, 'title 4', 5 UNION ALL
SELECT null, 'title 5', 5 UNION ALL
SELECT null, 'title 6', 5

INSERT INTO t2
SELECT 1, 1, 'data 1' UNION ALL
SELECT 2, 1, 'data 2' UNION ALL
SELECT 3, 2, 'data 3' UNION ALL
SELECT 4, 3, 'data 4' UNION ALL
SELECT 5, 3, 'data 5' UNION ALL
SELECT 6, 3, 'data 6' UNION ALL
SELECT 7, 4, 'data 7' UNION ALL
SELECT 8, null, 'data 8' UNION ALL
SELECT 9, 6, 'data 9' UNION ALL
SELECT 10, 6, 'data 10' UNION ALL
SELECT 11, 8, 'data 11'

And here's the SQL queries and their explanation:

-- IN doesn't get correct results. -- That's because of how IN treats NULLs and the Three-valued logic -- NULL is treated as an unknown, so if there's a null in the t2.t1id -- NOT IN will return either NOT TRUE or NOT UNKNOWN. And neither can be TRUE. -- when there's a NULL in the t1id column of the t2 table the NOT IN query will always return an empty set.

SELECT    t1.* 
FROM    t1 
WHERE    t1.id NOT IN (SELECT t1id FROM t2)

-- NOT EXISTS gets correct results

SELECT    t1.* 
FROM    t1 
WHERE    NOT EXISTS (SELECT * FROM t2 WHERE t1.id = t2.t1id)
GO

DROP TABLE t2
DROP TABLE t1

Here's the link to the article: http://weblogs.sqlteam.com/mladenp/archive/2007/05/18/60210.aspx

Thank you!

Solución

As I can see, you can use them as the same thing in a lot of cases, but you can't forget the details behind them.

Probably you can get the same results applying both NOT IN and NOT EXISTS, but you could see differences in query which involve the NULL value. Because NOT EXISTS is the only way to obtain those rows with the NULL value.

You can see it better in this example:

update cars set c_owner = NULL where c_id = BMW03444

Well... Let's try to see if we have any car in stock that has not been sold yet.

select count(*) from cars where c_owner not it (select c_name from customers);

Output:

COUNT(*): 0

Where's the failure? Quite simple. You're not requesting a group of cars whose buyers has not been included in the list. You are simply asking for a car without owner. Anybody, even if he's not in the list. The correct form is:

select count(*) 
from cars c1 
where not exists (
  select c_owner 
  from customers c2
  where c1.c_owner=c2.customer_id
);

COUNT(*): 1

This is because NOT IN needs specific values to check in. So NULL values are set as FALSE and not counted. NOT EXISTS checks the non existence of an element in a set, so NULL values are set as TRUE and are included.

Licenciado bajo: CC-BY-SA con atribución

No afiliado a StackOverflow