Question

I have 2 equal databases (A and B) with one table each running in separate offline machines.

Every day I export their data (as CSV) and "merge" it into a third database (C). I process A first, then B (I insert the contents of A into C, then the contents of B into C).

Now, it could happen that I get duplicate rows. I consider a row a duplicate if some field, for example "mail", already exists. I don't care whether the rest of the fields are the same.

How can I insert A and B into C excluding those rows that are duplicates?

Thanks in advance!


Solution

The easiest solution is to create a unique index on the column(s) in question and run the inserts as INSERT IGNORE.
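A minimal sketch of that approach, assuming MySQL, a target table `users` in C, staging tables `staging_a` and `staging_b` holding the imported CSV data, and duplicates keyed on `mail` (all of these names are illustrative):

```sql
-- One-time setup: make `mail` unique so duplicate inserts are rejected.
ALTER TABLE users ADD UNIQUE INDEX uq_users_mail (mail);

-- Load A first, then B; rows whose mail already exists in C are
-- silently skipped instead of raising a duplicate-key error.
INSERT IGNORE INTO users (mail, name)
SELECT mail, name FROM staging_a;

INSERT IGNORE INTO users (mail, name)
SELECT mail, name FROM staging_b;
```

The order of the two inserts decides which copy wins: since A is loaded first, a mail present in both databases keeps A's row.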

OTHER TIPS

Personally I use ON DUPLICATE KEY UPDATE, because INSERT IGNORE downgrades any errors to warnings.

That can have side effects and lead to behavior you don't expect. See this post for details on some of them.

If you end up using the ON DUPLICATE KEY UPDATE syntax, it also gives you a way to update specific fields with new data should business requirements change.

For instance, you can tally how many times a duplicate record was inserted with ON DUPLICATE KEY UPDATE quantity = quantity + 1.
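A sketch of that counting variant, again assuming MySQL, a unique index on `mail`, and a hypothetical `quantity` column on the target table (all names are illustrative):

```sql
-- New mails are inserted with quantity = 1; when the unique index on
-- `mail` is hit, the existing row's counter is incremented instead.
INSERT INTO users (mail, name, quantity)
SELECT mail, name, 1 FROM staging_a
ON DUPLICATE KEY UPDATE quantity = quantity + 1;
```

Here the unqualified `quantity` on the right-hand side refers to the value already stored in the existing row, so each re-import of the same mail bumps the counter by one.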

The post referenced above has a ton more information.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow