Question

I have data that I need to insert into a table. Before inserting, I need to check for duplicate records and report a list of those records.

Table:

CREATE TABLE `test` (
  `A` varchar(19) NOT NULL,
  `B` varchar(9) NOT NULL,
  KEY `A` (`A`),
  KEY `B` (`B`)
) ENGINE=InnoDB;

I need to check both columns (A and B) for duplicates.

Number of records to insert: ~1000

Rows in table: ~1,000,000

What is an efficient way of doing this?

Thanks in advance.


Solution

This would depend on the table's layout.

Suppose you have the following table:

CREATE TABLE `mydata` ( 
    `A` varchar(19) NOT NULL, 
    `B` varchar(9) NOT NULL, 
    KEY `A` (`A`), 
    KEY `B` (`B`) 
) ENGINE=InnoDB; 

Before you insert the 1000 rows into mydata, you could preload them into another table called mynewdata, and keep a copy in a table called mynewdups, like this:

CREATE TABLE mynewdata LIKE mydata;
CREATE TABLE mynewdups LIKE mydata;
INSERT INTO mynewdata ... ;
INSERT INTO mynewdups SELECT * FROM mynewdata;

Next, delete all rows in mynewdata that match A or B in mydata:

DELETE T1.* FROM mynewdata T1 INNER JOIN mydata T2 ON T1.A=T2.A OR T1.B=T2.B;
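A minimal sketch of this step, assuming Python's built-in sqlite3 module as a stand-in for MySQL. SQLite does not support MySQL's multi-table `DELETE T1.* FROM ... JOIN` syntax, so the sketch swaps in an equivalent `DELETE ... WHERE EXISTS` using the same A-or-B condition. The helper name `remove_existing` and the sample rows are hypothetical.

```python
import sqlite3


def remove_existing(conn: sqlite3.Connection) -> None:
    """Hypothetical helper: delete staged rows whose A or B already exists in mydata."""
    # Same A-or-B condition as the MySQL join, written as a correlated EXISTS
    # because SQLite has no multi-table DELETE.
    conn.execute(
        """
        DELETE FROM mynewdata
        WHERE EXISTS (
            SELECT 1 FROM mydata
            WHERE mydata.A = mynewdata.A OR mydata.B = mynewdata.B
        )
        """
    )


conn = sqlite3.connect(":memory:")
for name in ("mydata", "mynewdata"):
    conn.execute(f"CREATE TABLE {name} (A VARCHAR(19) NOT NULL, B VARCHAR(9) NOT NULL)")
conn.execute("CREATE INDEX mydata_A ON mydata (A)")
conn.execute("CREATE INDEX mydata_B ON mydata (B)")

# Existing rows (made-up sample values).
conn.executemany("INSERT INTO mydata VALUES (?, ?)", [("a1", "b1"), ("a2", "b2")])
# Staged rows: the first matches on A, the second on B, the third is new.
conn.executemany(
    "INSERT INTO mynewdata VALUES (?, ?)",
    [("a1", "bX"), ("aX", "b2"), ("a3", "b3")],
)

remove_existing(conn)
print(conn.execute("SELECT A, B FROM mynewdata").fetchall())
```

Only the row that matches neither column survives, which is the same result the MySQL `DELETE ... INNER JOIN` is meant to produce.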

What's left in mynewdata are rows whose A and B values do not match any row in mydata.

What about the rows that matched? Run this against mynewdups to delete every row that has no match in mydata:

DELETE T1.* FROM mynewdups T1 LEFT JOIN mydata T2
ON T1.A=T2.A OR T1.B=T2.B
WHERE T2.A IS NULL;

What's left in mynewdata is the data to import.

What's left in mynewdups is the data that had a duplicate A or B value in mydata, i.e. the list of duplicates to report.
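The whole procedure can be sketched end to end, again assuming Python's sqlite3 module as a stand-in for MySQL. Because SQLite has no `CREATE TABLE ... LIKE` and no multi-table `DELETE`, the sketch repeats the column list and uses `EXISTS` / `NOT EXISTS`. mynewdups has to end up holding only the rows that do have a match in mydata, so the sketch deletes the rows with no match. The final `INSERT INTO mydata SELECT * FROM mynewdata` is an added step the answer implies but does not show, and the helper `stage_and_split` is hypothetical.

```python
import sqlite3

SCHEMA = "(A VARCHAR(19) NOT NULL, B VARCHAR(9) NOT NULL)"
# A-or-B match between mydata and a staging table named {t}.
MATCH = "mydata.A = {t}.A OR mydata.B = {t}.B"


def stage_and_split(conn: sqlite3.Connection, new_rows: list) -> list:
    """Hypothetical helper: import non-duplicate rows and return the duplicates."""
    # SQLite has no CREATE TABLE ... LIKE, so the column list is repeated.
    conn.execute(f"CREATE TABLE mynewdata {SCHEMA}")
    conn.execute(f"CREATE TABLE mynewdups {SCHEMA}")
    conn.executemany("INSERT INTO mynewdata VALUES (?, ?)", new_rows)
    conn.execute("INSERT INTO mynewdups SELECT * FROM mynewdata")

    # mynewdata keeps only rows with no A or B match in mydata.
    conn.execute(
        "DELETE FROM mynewdata WHERE EXISTS "
        f"(SELECT 1 FROM mydata WHERE {MATCH.format(t='mynewdata')})"
    )
    # mynewdups keeps only rows that DO match (delete the unmatched ones).
    conn.execute(
        "DELETE FROM mynewdups WHERE NOT EXISTS "
        f"(SELECT 1 FROM mydata WHERE {MATCH.format(t='mynewdups')})"
    )

    dups = conn.execute("SELECT A, B FROM mynewdups ORDER BY A, B").fetchall()
    # Added step: import the clean rows, then drop the staging tables.
    conn.execute("INSERT INTO mydata SELECT * FROM mynewdata")
    conn.execute("DROP TABLE mynewdata")
    conn.execute("DROP TABLE mynewdups")
    conn.commit()
    return dups


conn = sqlite3.connect(":memory:")
conn.execute(f"CREATE TABLE mydata {SCHEMA}")
conn.executemany("INSERT INTO mydata VALUES (?, ?)", [("a1", "b1"), ("a2", "b2")])

dups = stage_and_split(conn, [("a1", "bX"), ("aX", "b2"), ("a3", "b3")])
print(dups)  # rows to report as duplicates
print(conn.execute("SELECT A, B FROM mydata ORDER BY A").fetchall())
```

Like the original answer, this only compares the new rows against mydata; duplicates inside the incoming batch itself are not detected.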

Give it a try!

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange