Fix duplicate entries in table
-
14-03-2021 - |
Question
I just found out that a table on my production server (which holds approx. 35K records) contains 588 duplicate entries in an INT(11) column which has AUTO_INCREMENT. The UNIQUE key is missing on that column so that's probably the cause.
Any ideas on how to give all duplicate entries a unique ID and then adding the UNIQUE key to the column so this will never happen again?
Table schema:
CREATE TABLE `items` (
`item_ID` int(11) unsigned NOT NULL auto_increment,
`u_ID` int(10) NOT NULL default '0',
`user_ID` int(11) NOT NULL default '0',
`p_ID` tinyint(4) NOT NULL default '0',
`url` varchar(255) NOT NULL,
`used` int(10) unsigned NOT NULL,
`sort` tinyint(4) NOT NULL,
`last_checked` int(11) NOT NULL,
`unixtime` int(11) NOT NULL,
`switched` int(11) NOT NULL,
`active` tinyint(1) NOT NULL default '0',
UNIQUE KEY `unique` (`p_ID`,`url`),
KEY `index` (`u_ID`,`item_ID`,`sort`,`active`),
KEY `index2` (`u_ID`,`switched`,`active`),
KEY `item_ID` (`item_ID`),
KEY `p_ID` (`p_ID`),
KEY `u_ID` (`u_ID`)
) ENGINE=MyISAM AUTO_INCREMENT=42755 DEFAULT CHARSET=utf8
Solution
How about something like this? Again test it on a backup first.
# Copy duplicate records
CREATE TABLE newitem SELECT * FROM items WHERE item_ID IN
(SELECT item_ID FROM itemd GROUP BY item_ID HAVING COUNT(*) > 1);
# remove auto increment from id in new table
ALTER TABLE newitem DROP INDEX Item_ID, MODIFY item_ID int;
# delete duplicates from original
DELETE FROM item WHERE item_ID IN (SELECT DISTINCT item_ID FROM newitem);
#Update column to be primary key
ALTER TABLE items DROP INDEX Item_ID, ADD PRIMARY KEY (Item_ID);
# Set new duplicate ID's to null
UPDATE newitem SET item_ID=NULL;
# Insert records back into old table
INSERT INTO item SELECT * FROM newitem;
# Get rid of work table
DROP newitem;
OTHER TIPS
Since you already have a UNIQUE
key on the table, you could use this to make an UPDATE
statement that re-assigns unique ids to item_id
:
UPDATE
items AS it
JOIN
( SELECT
i.p_ID, i.url, @id:= @id+1 AS id
FROM
items AS i
CROSS JOIN
( SELECT @id:=0 ) AS dummy
ORDER BY
i.p_ID, i.url
) AS unq
ON
(unq.p_ID, unq.url) = (it.p_ID, it.url)
SET
it.item_id = unq.id ;
Then you can add a unique index on item_id
Interesting. You have an auto_increment without a Primary Key reference, just an index, this is why you have dupes in the first place. If you try to update and assign primary key (item_ID)
MySQL will complain because of the dupes in the item_ID column.
Your engine is MyISAM which means you don't have any FK constraints, so you could do a mysqldump of the table, truncate the table, update the schema, then re-import the data. Upon re-import MySQL should correctly insert all rows with truly unique Item_Ids.
I'll outline the steps here, but I strongly suggest you do this in a dev environment to confirm the steps work correctly, before applying to your production environment. I accept no responsibility for borked production data :)
$ mysqldump -u <user_name> -h <db_host> --opt <database_name> --single-transaction > backup.sql
mysql> truncate table `items`;
mysql> ALTER TABLE `items` DROP INDEX `Item_ID`, ADD PRIMARY KEY (`item_ID`), AUTO_INCREMENT = 1;
$ vi backup.sql # Remove the AUTO_INCREMENT reference from the Create Table syntax
$ mysql -h <host_name> <db_name> -u <username> -p < backup.sql
Give that a shot, these steps are un-tested but should set you down the right path.