Question

Anyone knows how to perform such query in Postgresql?

SELECT * 
FROM tabA 
WHERE NOT EXISTS (
    SELECT * 
    FROM tabB 
    WHERE tabB.id = tabA.id
)

When I execute such query, postgresql complains "ERROR: Greenplum Database does not yet support that query."

EDIT: And how about this one:

SELECT * 
FROM tabA 
WHERE NOT EXISTS (
    SELECT * 
    FROM tabB WHERE tabB.id = tabA.id AND tabB.id2 = tabA.id2
)

EDIT:
I tested in postgresql 8.2.15 for the 4 answers provided by @ypercube. Conclusions are:

1) The first does not work in this version of postgresql, as I said above in the question. The error message can be found there too.

2) For the other three answers, the execution speed is: (3)LEFT JOIN > (4)EXCEPT >> (2)NOT IN.
Specifically, for queries that have the same syntax, (3)LEFT JOIN takes about 5580ms, (4)EXCEPT takes about 13502ms, and (2)NOT IN takes more than 100000 (In fact I did not wait util it finished).
Is there any particular reasons for NOT IN clause to be so slow?
Cheng

Was it helpful?

Solution

There are 3 (main) ways to do this kind of query:

  1. NOT EXISTS correlated subquery

  2. NOT IN subquery

  3. LEFT JOIN with IS NULL check:

You found that the first way does work in Greenplum. @Marco and @juergen provided the 2nd way. Here's the 3rd one, it may bypass Greenplum's limitations:

SELECT tabA.* 
FROM 
    tabA 
  LEFT JOIN 
    tabB 
      ON  tabB.id = tabA.id 
      AND tabB.id2 = tabA.id2
WHERE tabB.id IS NULL ;

This (4th way) also works in Postgres (which supports EXCEPT operator):

SELECT a.*
FROM a
WHERE id IN
      ( SELECT id
        FROM a
      EXCEPT
        SELECT id
        FROM b
      ) ; 

Tested in SQL-Fiddle (that all 4 work in Postgres).

OTHER TIPS

The part of the error you left out could have pointed you in the right direction. I think it said "DETAIL: The query contains a correlated subquery." So you have to rewrite these with joins or non-correlated subqueries.

SELECT * FROM tabA WHERE id NOT IN (SELECT id FROM tabB);

As for the second query, try

SELECT * FROM tabA WHERE (id, id2) NOT IN (SELECT id, id2 FROM tabB);
SELECT * FROM tabA 
WHERE id not in  (SELECT id FROM tabB)
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top