Question

This has probably been asked before, but googling for keywords like "IN" doesn't work very well.

This is my query:

UPDATE tblCustomer SET type = 2 
WHERE idcustomer                                
IN (SELECT fidcustomer1
    FROM tblorder                   
     UNION                      
    SELECT fidcustomer2
    FROM tblorder                   
   )                                

To break it down: I want to set the type (just an int) of all customers to 2 for all customers that appear in the order-table, in one of either column.

On my test data, none of these tables contain more than a few hundred rows, yet the query runs for many minutes (even without the UNION, that doesn't seem to make much of a difference), apparently re-doing the inner query once per row in customer. I could obviously rewrite it into a single SELECT DISTINCT(id), followed by a few hundred single-row updates and do the logic in whatever language I use for my ODBC-access, but that's just a hack.

How can I rewrite this properly?

Addendum: The table I want to update contains a lot of relatively large BYTEA blobs, a few MB per row. They are set to Storage External or Extended, but I wonder if that could make the sequential scans slow. All updates seem to take a long time, not just this one.

Was it helpful?

Solution 2

-------------------------------
-- Use two EXISTS:
-------------------------------
UPDATE tblCustomer tc
SET type = 2
WHERE EXISTS (
    SELECT *
    FROM tblorder ex
    WHERE ex.fidcustomer1 = tc.idcustomer
    )
OR EXISTS (
    SELECT *
    FROM tblorder ex
    WHERE ex.fidcustomer2 = tc.idcustomer
    );

-------------------------------
-- or combine the two EXISTS::
-------------------------------
UPDATE tblCustomer tc
SET type = 2 
WHERE EXISTS (
    SELECT *
    FROM tblorder ex
    WHERE ex.fidcustomer1 = tc.idcustomer
    OR ex.fidcustomer2 = tc.idcustomer
    );

My gut feeling is that the first version (with two separate exists) will perform better, because the executor could short-circuit if one of the existss would yield True. That would avoid the removal-of-duplicates phase (and probably sorting), which is inherent to the UNION construct.

OTHER TIPS

I suggest a simpler approach:

UPDATE tblCustomer c
SET    type = 2 
FROM   tblorder o
WHERE  c.idcustomer IN (o.fidcustomer1, o.fidcustomer2)
AND    c.type IS DISTINCT FROM 2  -- optional, to avoid empty updates

Except, if there are duplicates in tblorder, then a similar approach to what you had is sensible:

UPDATE tblCustomer c
SET    type = 2 
FROM  (
    SELECT fidcustomer1 AS cust FROM tblorder
    UNION
    SELECT fidcustomer2 FROM tblorder
    ) o
WHERE  c.idcustomer = o.cust
AND    c.type IS DISTINCT FROM 2;

Either way, in PostgreSQL, joining in a table regularly performs better than an IN expression.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top