Question

There are some similar questions on StackOverflow, but none of them seems to exactly match my case. I am trying to bulk insert into a PostgreSQL table that has a composite unique constraint. I created a temporary table (temptable) without any constraints and loaded the data (possibly containing some duplicate values) into it. So far, so good.

Now, I am trying to transfer the data to the actual table (realtable), which has the unique index. For this, I used an INSERT statement with a subquery:

INSERT INTO realtable 
SELECT * FROM temptable WHERE NOT EXISTS (
    SELECT 1 FROM realtable WHERE temptable.added_date = realtable.added_date
                              AND temptable.product_name = realtable.product_name
);

However, I am getting duplicate key errors:

ERROR: duplicate key value violates unique constraint "realtable_added_date_product_name_key"
SQL state: 23505
Detail: Key (added_date, product_name)=(20000103, TEST) already exists.

My question is, shouldn't the WHERE NOT EXISTS clause prevent this from happening? How can I fix it?

Solution

The NOT EXISTS clause only prevents rows from temptable from conflicting with rows that already exist in realtable; it does not prevent multiple rows within temptable from conflicting with each other. This is because the subquery is evaluated against the state of realtable as of the start of the statement; it is not re-evaluated against the rows the statement itself has already inserted.
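
To see this concretely, here is a minimal sketch of the failure mode. The table and column names come from the question and the error message, but the column types are assumptions, since the question does not show the actual definitions:

CREATE TABLE realtable (
    added_date   text,
    product_name text,
    UNIQUE (added_date, product_name)
);
CREATE TEMP TABLE temptable (
    added_date   text,
    product_name text
);

INSERT INTO temptable VALUES ('20000103', 'TEST'), ('20000103', 'TEST');

-- Both rows pass the NOT EXISTS check, because realtable contains neither of
-- them when the statement starts; the second row then collides with the
-- first one during the INSERT itself.
INSERT INTO realtable
SELECT * FROM temptable WHERE NOT EXISTS (
    SELECT 1 FROM realtable WHERE temptable.added_date = realtable.added_date
                              AND temptable.product_name = realtable.product_name
);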

One solution would be to use GROUP BY or DISTINCT ON in the SELECT query to omit the duplicates, e.g.:

INSERT INTO realtable
SELECT DISTINCT ON (added_date, product_name) *
FROM temptable WHERE NOT EXISTS (
    SELECT 1 FROM realtable WHERE temptable.added_date = realtable.added_date
                              AND temptable.product_name = realtable.product_name
)
ORDER BY added_date, product_name, ???; -- DISTINCT ON requires the ORDER BY to start with the DISTINCT ON expressions; the expressions after them (???) determine which of each set of duplicates is kept
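
Alternatively, on PostgreSQL 9.5 or later you can let the database skip the conflicting rows itself with ON CONFLICT DO NOTHING. This is a standard alternative rather than part of the original answer, and unlike DISTINCT ON it keeps an arbitrary row from each set of duplicates instead of letting you choose one via ORDER BY:

-- Requires PostgreSQL 9.5+. Each row is checked against the unique
-- constraint as it is inserted, so duplicates within temptable are skipped
-- as well as rows that already exist in realtable.
INSERT INTO realtable
SELECT * FROM temptable
ON CONFLICT (added_date, product_name) DO NOTHING;
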
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow