Solutions for insertion of duplicate keys

https://stackoverflow.com/questions/1155895

18-09-2019

Question

NO MySQL answers please!

The basic query is as follows (assume A is the key):

INSERT INTO destination (A,B,C)
SELECT a1,b1,c1 
FROM source
WHERE (selectconditions) ;

Source contains many records that may or may not already be in destination, which means that the insert will fail as soon as a duplicate record is encountered.

Desired Behaviour: INSERT or IGNORE

This is the desired scenario for the given problem. Insert if you can, otherwise continue.

Pseudo c#/java:

foreach(record in selectQuery) 
{  
   try { destination.insert(record) } 
   catch (insertionException) { /* squelch */ }
}
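For reference, the row-by-row loop can be sketched in runnable form. This is only an illustration, assuming Python's built-in sqlite3 module and made-up table contents; insert_or_ignore is a hypothetical helper name, not something from the question.

```python
import sqlite3

# Hypothetical in-memory tables standing in for "source" and "destination".
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE destination (A INTEGER PRIMARY KEY, B TEXT, C TEXT);
    CREATE TABLE source (a1 INTEGER, b1 TEXT, c1 TEXT);
    INSERT INTO destination VALUES (1, 'old', 'old');
    INSERT INTO source VALUES (1, 'dup', 'dup'), (2, 'new', 'new');
""")

def insert_or_ignore(conn):
    """Insert each source row; skip rows whose key already exists."""
    inserted = 0
    rows = conn.execute("SELECT a1, b1, c1 FROM source").fetchall()
    for row in rows:
        try:
            conn.execute(
                "INSERT INTO destination (A, B, C) VALUES (?, ?, ?)", row
            )
            inserted += 1
        except sqlite3.IntegrityError:
            pass  # duplicate key: squelch and continue with the next row
    conn.commit()
    return inserted

inserted = insert_or_ignore(conn)
final_rows = conn.execute("SELECT A, B FROM destination ORDER BY A").fetchall()
```

The obvious cost is one statement per row, which is why a set-based query is usually preferred.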

This can be handled in SQL by adding

AND NOT EXISTS (SELECT 1 FROM destination WHERE destination.A = source.a1)

to the end of the query. In other words, check before you insert.

What are some other alternatives for handling this common situation? What are the pros and cons of these techniques?

Solution

Some databases provide an explicit syntax for operations that involve a conditional insert/update/ignore.

Oracle and SQL Server, for example, have the MERGE statement, which can insert, update, delete, or ignore a record based on a set of predicates.

Ignoring database-specific syntax, you can perform the insert using a predicate that excludes records that already exist:

INSERT INTO target (A, B, C)
SELECT SA, SB, SC FROM source
WHERE NOT EXISTS (SELECT 1 FROM target WHERE A = SA AND B = SB AND C = SC);
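To see the NOT EXISTS predicate in action, here is a small sketch that runs it against SQLite through Python's sqlite3 module. The engine choice and the sample rows are assumptions made for illustration; the column names follow the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE target (A INTEGER, B TEXT, C TEXT);
    CREATE TABLE source (SA INTEGER, SB TEXT, SC TEXT);
    INSERT INTO target VALUES (1, 'x', 'y');
    INSERT INTO source VALUES (1, 'x', 'y'), (2, 'p', 'q');
""")

# Only source rows with no matching (A, B, C) row in target are inserted.
conn.execute("""
    INSERT INTO target (A, B, C)
    SELECT SA, SB, SC FROM source
    WHERE NOT EXISTS (
        SELECT 1 FROM target
        WHERE target.A = source.SA
          AND target.B = source.SB
          AND target.C = source.SC
    )
""")
conn.commit()

rows = conn.execute("SELECT A, B, C FROM target ORDER BY A").fetchall()
```

Note that the predicate only compares against rows already in target, so duplicates within source itself are not filtered out by it.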

OTHER TIPS

If the two tables share a common primary key:

INSERT INTO destination 
( A, B, C)
SELECT a1, b1, c1 FROM source
WHERE source.pk not in ( SELECT pk FROM destination );

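If you don't, you can compare a concatenation of the columns instead, though distinct rows can concatenate to the same value: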

INSERT INTO destination 
( A, B, C)
SELECT a1, b1, c1 FROM source
WHERE a1 + b1 + c1 not in ( SELECT a+b+c FROM destination );
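One thing to watch with the concatenation approach is that different rows can produce the same combined value. The sketch below, assuming SQLite (where string concatenation is written || rather than +) and invented sample data, contrasts it with a column-by-column comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE destination (a TEXT, b TEXT, c TEXT);
    CREATE TABLE source (a1 TEXT, b1 TEXT, c1 TEXT);
    INSERT INTO destination VALUES ('ab', 'c', 'd');
    -- A different row whose concatenation is also 'abcd'.
    INSERT INTO source VALUES ('a', 'bc', 'd');
""")

# Concatenation check: the source row is wrongly treated as a duplicate.
by_concat = conn.execute("""
    SELECT a1, b1, c1 FROM source
    WHERE a1 || b1 || c1 NOT IN (SELECT a || b || c FROM destination)
""").fetchall()

# Column-by-column check: the source row is correctly seen as new.
by_columns = conn.execute("""
    SELECT a1, b1, c1 FROM source
    WHERE NOT EXISTS (
        SELECT 1 FROM destination d
        WHERE d.a = source.a1 AND d.b = source.b1 AND d.c = source.c1
    )
""").fetchall()
```

Here by_concat comes back empty while by_columns returns the row, so comparing the columns individually (or concatenating with a separator that cannot occur in the data) is the safer choice.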

I would probably do the following:

INSERT INTO Target (A, B, C)
SELECT
     S.A, S.B, S.C
FROM
     Source S
LEFT OUTER JOIN Target T ON
     T.A = S.A AND
     T.B = S.B AND
     T.C = S.C
WHERE
     T.A IS NULL
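A quick way to try the LEFT OUTER JOIN version is the following sketch, which assumes SQLite via Python's sqlite3 module and some made-up rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Target (A INTEGER NOT NULL, B TEXT, C TEXT);
    CREATE TABLE Source (A INTEGER, B TEXT, C TEXT);
    INSERT INTO Target VALUES (1, 'x', 'y');
    INSERT INTO Source VALUES (1, 'x', 'y'), (2, 'p', 'q'), (3, 'r', 's');
""")

# Rows with no match on the Target side come back with T.A as NULL;
# those are exactly the rows that still need inserting.
conn.execute("""
    INSERT INTO Target (A, B, C)
    SELECT S.A, S.B, S.C
    FROM Source S
    LEFT OUTER JOIN Target T
        ON T.A = S.A AND T.B = S.B AND T.C = S.C
    WHERE T.A IS NULL
""")
conn.commit()

total = conn.execute("SELECT COUNT(*) FROM Target").fetchone()[0]
keys = [r[0] for r in conn.execute("SELECT A FROM Target ORDER BY A")]
```

The T.A IS NULL test relies on A never being NULL in Target, which holds when A is a key. Rows containing NULL in B or C never satisfy the equality join, so they would always be treated as new.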

If you're using MySQL and have the luxury of forcing non-duplicate keys using a UNIQUE index, you can use INSERT ... ON DUPLICATE KEY UPDATE with an idempotent (no-op) update for duplicates.

INSERT INTO Target (A, B, C) (SELECT a1, b1, c1 FROM Source) ON DUPLICATE KEY UPDATE A=A;

This has the advantage of being very fast and not requiring an extra SELECT.
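Since the question asks for non-MySQL options, it may help to note that other engines spell the same ignore-on-conflict idea differently. As one example, SQLite accepts INSERT OR IGNORE; the sketch below assumes SQLite through Python's sqlite3 module, with invented sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Target (A INTEGER PRIMARY KEY, B TEXT, C TEXT);
    CREATE TABLE Source (a1 INTEGER, b1 TEXT, c1 TEXT);
    INSERT INTO Target VALUES (1, 'old', 'old');
    INSERT INTO Source VALUES (1, 'dup', 'dup'), (2, 'new', 'new');
""")

# Rows that would violate the primary key are skipped instead of
# aborting the whole statement.
conn.execute(
    "INSERT OR IGNORE INTO Target (A, B, C) SELECT a1, b1, c1 FROM Source"
)
conn.commit()

rows = conn.execute("SELECT A, B FROM Target ORDER BY A").fetchall()
```

As with the MySQL form, this depends on a primary key or UNIQUE constraint being defined on the target table.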

Licensed under: CC-BY-SA with attribution
