Question

I currently keep a master CSV file that I frequently update to manage a list of products.

If I try to import the CSV file directly, I get the error "duplicate key value violates unique constraint...". Currently, I update my Products Postgres table by deleting all the items in the table and importing all the data back in.

I realize this isn't a great way to do this. Is there a better way to go about it? I currently use the pgAdmin III and PG Commander clients.

Solution

You can do this by defining a trigger function that first tries to update the existing record and only allows the insertion to go ahead if no matching record is found.

For this to work, you need a primary key or some other criterion for uniquely identifying the rows, of course.

Suppose your table is defined like this:

CREATE TABLE test (
  id     INT PRIMARY KEY,
  name   TEXT,
  amount INT
);

The trigger function might look like this:

CREATE OR REPLACE FUNCTION test_insert_before_func()
RETURNS TRIGGER AS $BODY$
DECLARE
    existing_id INTEGER;
BEGIN
    UPDATE test SET name = new.name, amount = new.amount
    WHERE id = new.id
    RETURNING id INTO existing_id;

    -- If the UPDATE matched a row, existing_id is non-null;
    -- in that case we return NULL so that the triggering INSERT
    -- does not proceed
    IF existing_id IS NOT NULL THEN
        RETURN NULL;
    END IF;

    -- Otherwise, return the new record so that the triggering INSERT
    -- goes ahead
    RETURN new;
END;
$BODY$
LANGUAGE plpgsql SECURITY DEFINER;

CREATE TRIGGER test_insert_before_trigger
   BEFORE INSERT
   ON test
   FOR EACH ROW
   EXECUTE PROCEDURE test_insert_before_func();

Now, if I insert a row which does not already exist, it is inserted:

test=> insert into test(id,name,amount) values (1,'Mary',100);
INSERT 0 1
test=> select * from test;
 id | name | amount
----+------+--------
  1 | Mary |    100
(1 row)

If I try to insert a row with the same ID:

test=> insert into test(id,name,amount) values (1,'Mary',200);
INSERT 0 0
test=> select * from test;
 id | name | amount
----+------+--------
  1 | Mary |    200
(1 row)

this time the row is updated instead of inserted. The INSERT 0 0 output shows that the INSERT itself was suppressed (the trigger returned NULL); the UPDATE inside the trigger function did the work.

It works just as well if I load the rows from a CSV file.
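
For example, if you use psql, its client-side \copy command will load a file from your machine; the file name products.csv and the CSV-with-header format are assumptions on my part:

\copy test FROM 'products.csv' CSV HEADER

Each incoming row fires the trigger, so existing ids are updated and new ids are inserted.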

However, one thing you may not have considered: this will not delete any records that exist in the database but not in the CSV file. If you want that to happen as well, you need a more complex solution - perhaps a sequence like this (a complete sketch follows the list):

  1. Load the CSV file into a temporary table.
  2. Delete all rows from the real table that do not exist in the temporary table:

    DELETE FROM test WHERE id NOT IN (SELECT id FROM temp);
    
  3. Finally, insert the rows from the temporary table into the real table (the trigger turns inserts of existing ids into updates):

    INSERT INTO test(id, name, amount) SELECT id, name, amount FROM temp;
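
Putting the three steps together, here is a minimal sketch as a psql session. The file name products.csv and the assumption that the file has a header row are mine, for illustration:

-- temporary table with the same columns (and NOT NULL constraints) as test
CREATE TEMP TABLE temp (LIKE test);

-- \copy runs client-side in psql, so it reads a file on your machine
\copy temp FROM 'products.csv' CSV HEADER

BEGIN;
-- remove rows that are no longer present in the CSV
DELETE FROM test WHERE id NOT IN (SELECT id FROM temp);
-- insert the rest; the trigger turns inserts of existing ids into updates
INSERT INTO test(id, name, amount) SELECT id, name, amount FROM temp;
COMMIT;

DROP TABLE temp;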
    

This answer does not consider concurrency issues, in case the table might be updated by other users at the same time. However, if you only ever load it from the CSV file, that is not likely to be an issue.
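
As an aside: if your server is PostgreSQL 9.5 or newer (an assumption on my part; you may well be on an older version given the pgAdmin III client), the built-in INSERT ... ON CONFLICT clause performs the same upsert atomically, without a trigger:

INSERT INTO test(id, name, amount) VALUES (1, 'Mary', 200)
ON CONFLICT (id) DO UPDATE
    SET name = EXCLUDED.name, amount = EXCLUDED.amount;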

Licensed under: CC-BY-SA with attribution