Efficient way to select all values from one column not in another column
-
14-04-2021 - |
Pergunta
I need to return all values from colA
that are not in colB
from mytable
. I am using:
SELECT DISTINCT(colA) FROM mytable WHERE colA NOT IN (SELECT colB FROM mytable)
It is working however the query is taking an excessively long time to complete.
Is there a more efficient way to do this?
Solução
In standard SQL there are no parentheses in DISTINCT colA
. DISTINCT
is not a function.
SELECT DISTINCT colA
FROM mytable
WHERE colA NOT IN (SELECT DISTINCT colB FROM mytable);
Added DISTINCT
to the sub-select as well. If you have many duplicates it could speed up the query.
A CTE might be faster, depending on your DBMS. I additionally demonstrate LEFT JOIN
as alternative to exclude the values in valB
, and an alternative way to get distinct values with GROUP BY
:
WITH x AS (SELECT colB FROM mytable GROUP BY colB)
SELECT m.colA
FROM mytable m
LEFT JOIN x ON x.colB = m.colA
WHERE x.colB IS NULL
GROUP BY m.colA;
Or, simplified further, and with a plain subquery (probably fastest):
SELECT DISTINCT m.colA
FROM mytable m
LEFT JOIN mytable x ON x.colB = m.colA
WHERE x.colB IS NULL;
There are basically 4 techniques to exclude rows with keys present in another (or the same) table:
The deciding factor for speed will be indexes. You need to have indexes on colA
and colB
for this query to be fast.
Outras dicas
You can use exists
:
select distinct
colA
from
mytable m1
where
not exists (select 1 from mytable m2 where m2.colB = m1.colA)
exists
does a semi-join to quickly match the values. not in
completes the entire result set and then does an or
on it. exists
is typically faster for values in tables.