Efficient way to select all values from one column not in another column

https://stackoverflow.com/questions/8737093

14-04-2021
|

Pergunta

I need to return all values from colA that are not in colB from mytable. I am using:

SELECT DISTINCT(colA) FROM mytable WHERE colA NOT IN (SELECT colB FROM mytable)

It is working however the query is taking an excessively long time to complete.

Is there a more efficient way to do this?

Solução

In standard SQL there are no parentheses in DISTINCT colA. DISTINCT is not a function.

SELECT DISTINCT colA
FROM   mytable
WHERE  colA NOT IN (SELECT DISTINCT colB FROM mytable);

Added DISTINCT to the sub-select as well. If you have many duplicates it could speed up the query.

A CTE might be faster, depending on your DBMS. I additionally demonstrate LEFT JOIN as alternative to exclude the values in valB, and an alternative way to get distinct values with GROUP BY:

WITH x AS (SELECT colB FROM mytable GROUP BY colB)
SELECT m.colA
FROM   mytable m
LEFT   JOIN x ON x.colB = m.colA
WHERE  x.colB IS NULL
GROUP  BY m.colA;

Or, simplified further, and with a plain subquery (probably fastest):

SELECT DISTINCT m.colA
FROM   mytable m
LEFT   JOIN mytable x ON x.colB = m.colA
WHERE  x.colB IS NULL;

There are basically 4 techniques to exclude rows with keys present in another (or the same) table:

Select rows which are not present in other table

The deciding factor for speed will be indexes. You need to have indexes on colA and colB for this query to be fast.

Outras dicas

You can use exists:

select distinct
    colA
from
    mytable m1
where
    not exists (select 1 from mytable m2 where m2.colB = m1.colA)

exists does a semi-join to quickly match the values. not in completes the entire result set and then does an or on it. exists is typically faster for values in tables.

You can use the EXCEPT operator which effectively diffs two SELECT queries. EXCEPT DISTINCT will return only unique values. Oracle's MINUS operator is equivalent to EXCEPT DISTINCT.

Licenciado em: CC-BY-SA com atribuição

Não afiliado a StackOverflow