Need to find duplicate entries

https://dba.stackexchange.com/questions/13399

16-10-2019
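Question

I have been handed a database with a few million records, and it may contain duplicate records.

Users enter data into the database and a primary key is generated for each entry. If a user enters the same data again, a new primary key is generated for it even though the data was already entered. Nothing checks for this.

I need to hunt down these duplicates, but I don't really know where to start. My first idea was to concatenate all cells except the primary key in a subquery, then count those rows and see which ones have a count higher than 1.

For example: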

pkey    recipe    fkey    comment
1       toast     3       tasty
2       curry     2       spicy
3       curry     2       spicy
4       bread     1       crumbly
5       orios     2       cookies

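Here the two curry entries are identical, and I have to delete one of them.

However, I have read that concatenation is unpredictable in MySQL, and the approach also seems a bit crude to me.

Any tips?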

Solution

Assuming your table is called ingredients, try the following:

Step 01) Create an empty delete-keys table called ingredients_delete_keys

CREATE TABLE ingredients_delete_keys
SELECT fkey,recipe,pkey FROM ingredients WHERE 1=2;

Step 02) Create a primary key on ingredients_delete_keys

ALTER TABLE ingredients_delete_keys ADD PRIMARY KEY (fkey,recipe,pkey);

Step 03) Index the ingredients table on fkey, recipe, pkey

ALTER TABLE ingredients ADD INDEX fk_recipe_pkey_ndx (fkey,recipe,pkey);

Step 04) Populate the ingredients_delete_keys table

INSERT INTO ingredients_delete_keys
SELECT fkey,recipe,MIN(pkey)
FROM ingredients GROUP BY fkey,recipe;

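Step 05) Perform a DELETE JOIN on the ingredients table, removing the rows whose keys have no match in ingredients_delete_keys

-- ingredients_delete_keys holds the one pkey to keep per (fkey,recipe),
-- so every ingredients row without a match there is a duplicate.
DELETE B.*
FROM ingredients B
LEFT JOIN ingredients_delete_keys A
USING (fkey,recipe,pkey)
WHERE A.pkey IS NULL;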

Step 06) Drop the delete-keys table

DROP TABLE ingredients_delete_keys;

Step 07) Drop the fk_recipe_pkey_ndx index

ALTER TABLE ingredients DROP INDEX fk_recipe_pkey_ndx;
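
Before running the DELETE in Step 05, it can be worth listing the duplicate groups first. The query below is only a sketch, assuming the column names from the question's sample table (pkey, recipe, fkey, comment). It groups on every non-key column, so no concatenation is needed. The aliases copies, first_pkey and last_pkey are made up for illustration.

```sql
-- List each set of identical rows (ignoring pkey) that occurs more than once.
-- Assumes the columns shown in the question's sample table.
SELECT fkey,
       recipe,
       comment,
       COUNT(*)  AS copies,      -- number of rows sharing these values
       MIN(pkey) AS first_pkey,  -- lowest pkey in the group
       MAX(pkey) AS last_pkey    -- highest pkey in the group
FROM ingredients
GROUP BY fkey, recipe, comment
HAVING COUNT(*) > 1;
```

For the sample data this returns one row, for curry, with copies = 2. The steps above compare only the foreign key and recipe columns; if comment should also count towards a duplicate, it would have to be added to the key table, the index and the GROUP BY.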

OK, here are all the lines in one block...

CREATE TABLE ingredients_delete_keys
SELECT fkey,recipe,pkey FROM ingredients WHERE 1=2;
ALTER TABLE ingredients_delete_keys ADD PRIMARY KEY (fkey,recipe,pkey);
ALTER TABLE ingredients ADD INDEX fk_recipe_pkey_ndx (fkey,recipe,pkey);
INSERT INTO ingredients_delete_keys
SELECT fkey,recipe,MIN(pkey)
FROM ingredients GROUP BY fkey,recipe;
-- Delete every row whose pkey is not the one kept for its (fkey,recipe)
DELETE B.*
FROM ingredients B
LEFT JOIN ingredients_delete_keys A
USING (fkey,recipe,pkey)
WHERE A.pkey IS NULL;
DROP TABLE ingredients_delete_keys;
ALTER TABLE ingredients DROP INDEX fk_recipe_pkey_ndx;
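
To see exactly which rows such a cleanup would remove without changing anything, a read-only query along these lines may help. It is a sketch under the assumption that the foreign key column is named fkey, as in the question's sample table, and it does not depend on the helper table. The names keep_pkey, i and k are hypothetical.

```sql
-- Show every row that is NOT the lowest pkey of its (fkey, recipe) group,
-- i.e. the rows a MIN(pkey)-based cleanup is meant to delete.
SELECT i.*
FROM ingredients AS i
JOIN (
    SELECT fkey, recipe, MIN(pkey) AS keep_pkey
    FROM ingredients
    GROUP BY fkey, recipe
) AS k
  ON k.fkey = i.fkey
 AND k.recipe = i.recipe
WHERE i.pkey <> k.keep_pkey
ORDER BY i.fkey, i.recipe, i.pkey;
```

With the sample rows this lists only the row with pkey 3. Rows where fkey or recipe is NULL never satisfy the equality join, so they would not show up here and are worth checking separately.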

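Give it a try!!!

Caveat

Note that using the MIN function keeps the first pkey entered for each fkey and recipe combination. If you switch it to the MAX function, the last pkey entered is kept instead.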
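
As a sketch of that switch, Step 04 with MAX in place of MIN might look like the following, again assuming the column is named fkey as in the question's sample table. The rest of the procedure would stay the same.

```sql
-- Variant of Step 04: keep the highest pkey for each (fkey, recipe)
-- combination instead of the lowest one.
INSERT INTO ingredients_delete_keys
SELECT fkey, recipe, MAX(pkey)
FROM ingredients
GROUP BY fkey, recipe;
```

For the sample data this would keep the curry row with pkey 3 and mark the one with pkey 2 for deletion.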

Licensed under: CC-BY-SA with attribution

Not affiliated with dba.stackexchange