Question

I need to find the rows from table t1 that have a unique (TRAN_ID,CMTE_ID) pair, where TRAN_ID and CMTE_ID are two of the columns. Then I'd like to insert these rows into the table uniques.

The problem is that the table uniques seems to end up containing duplicate pairs.

Note: table t1 was created using the InnoDB engine and then converted to the MyISAM engine in order to speed up GROUP BY and join operations. t1 has 130 million rows.

Here's my create query:

DROP TABLE IF EXISTS uniques;
CREATE TABLE `uniques` (
   `CMTE_ID` varchar(9) DEFAULT '',
   `TRAN_ID` varchar(32) DEFAULT '',
   KEY `TRAN_INDEX` (`TRAN_ID`,`CMTE_ID`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

Then I run the query and insert into uniques:

LOCK TABLES  uniques write, t1 write;

INSERT INTO uniques
     SELECT TRAN_ID,CMTE_ID
           FROM t1
           GROUP BY TRAN_ID,CMTE_ID
           HAVING count(*) = 1;
UNLOCK TABLES;

At this point, I expect uniques to be populated with rows with unique (TRAN_ID,CMTE_ID) pairs. However, when I run

SELECT * FROM uniques
    GROUP BY TRAN_ID,CMTE_ID
    HAVING count(*) > 1;

I still get a long list of rows. What's going on?

Solution

You might want to put a unique constraint on the pair to prevent duplicates.
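A minimal sketch of such a constraint, assuming the table definition from the question. The index name UNIQ_PAIR is made up for illustration:

```sql
-- Hypothetical index name; the columns come from the question's CREATE TABLE.
-- Add this while the table is still empty: the ALTER fails with a
-- duplicate-entry error if duplicate pairs are already stored.
ALTER TABLE uniques
    ADD UNIQUE KEY UNIQ_PAIR (TRAN_ID, CMTE_ID);
```

With the constraint in place, an insert that would store the same pair twice fails with a duplicate-key error instead of quietly producing duplicates, so the problem shows up at load time rather than in a later check.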

First guesses are operator error or the table already had data. Discounting those, there is another possibility. The types of the fields are:

`CMTE_ID` varchar(9) DEFAULT '',
`TRAN_ID` varchar(32) DEFAULT '',

Perhaps these are not big enough, so the data is actually being truncated when loaded into the table. This is just an idea. Your process seems sound.
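One way to test the truncation idea is to compare the longest values in t1 against the declared column sizes. This is only a sketch; the aliases max_tran_len and max_cmte_len are illustrative names, and on a table of 130 million rows the query is a full scan:

```sql
-- Longest values actually present in the source table.
-- Compare max_tran_len against varchar(32) and max_cmte_len against varchar(9).
SELECT MAX(CHAR_LENGTH(TRAN_ID)) AS max_tran_len,
       MAX(CHAR_LENGTH(CMTE_ID)) AS max_cmte_len
FROM t1;
```

If either maximum exceeds the size of the column that the value ends up in, truncation on insert is a plausible source of the duplicates.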

EDIT:

Actually, I think truncation is what is happening. Your insert query is equivalent to:

INSERT INTO uniques(CMTE_ID, TRAN_ID)
     SELECT TRAN_ID,CMTE_ID
     FROM t1
     GROUP BY TRAN_ID,CMTE_ID
     HAVING count(*) = 1;

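Note that the column orders are different, so TRAN_ID is being loaded into CMTE_ID and vice versa. Because CMTE_ID is only varchar(9) while TRAN_ID can hold up to 32 characters, the TRAN_ID values are probably being truncated to 9 characters as they are stored (in non-strict SQL mode MySQL truncates with a warning rather than raising an error). Distinct pairs in t1 can then collapse into the same stored pair in uniques.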

This is a good lesson in why you should always include column lists in insert statements.
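For reference, a version of the insert with an explicit column list that matches the SELECT order might look like the following. It is a sketch based on the tables in the question, not a tested fix:

```sql
-- The column list names the targets explicitly, so TRAN_ID lands in TRAN_ID
-- and CMTE_ID lands in CMTE_ID regardless of the order in CREATE TABLE.
INSERT INTO uniques (TRAN_ID, CMTE_ID)
    SELECT TRAN_ID, CMTE_ID
    FROM t1
    GROUP BY TRAN_ID, CMTE_ID
    HAVING COUNT(*) = 1;

-- Optional: list any warnings (such as truncation) raised by the INSERT above.
SHOW WARNINGS;
```

SHOW WARNINGS reports on the most recent statement only, so it has to run directly after the INSERT to be useful.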

Licensed under: CC-BY-SA with attribution