How to delete duplicate rows in SQL

https://stackoverflow.com/questions/23633602

21-07-2023

Question

I have the following data:

Id: 1       Name:   apple       ForeignKey: 10
Id: 2       Name:   apple       ForeignKey: 10
Id: 3       Name:   apple       ForeignKey: 15
Id: 4       Name:   peach       ForeignKey: 11
Id: 5       Name:   peach       ForeignKey: 12

Rows with the same Name and ForeignKey are duplicates in my case. I want to remove all the duplicates from the table, keeping exactly one instance of each.

In other words, for every group of rows where Name and ForeignKey are equal, I want to remove all rows but one.

With the data above, only the row with Id 1 or Id 2 should be removed.

With

select count(Name), Name, ForeignKey from yourtable group by Name, ForeignKey having count(Name) > 1

I am able to find the Name and ForeignKey combinations that occur in more than one row, but how do I get the IDs of those rows? And how do I get the IDs of all of them except the first or last occurrence of each combination?

Solution

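The answer is database specific, but here is one way to do it by joining the table to itself. A row is deleted when another row with the same name and foreignkey has a smaller id, so the row with the lowest id in each group is kept: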

delete t1
from yourtable t1
    join yourtable t2 on t1.id > t2.id
        and t1.name = t2.name 
        and t1.foreignkey = t2.foreignkey
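
Not every database accepts a join in a DELETE statement. As a rough sketch of the same logic in a form that can be run locally, the example below uses Python's built-in sqlite3 module and rewrites the self-join as a correlated EXISTS subquery. The in-memory table and its column types are assumptions made for illustration; only the sample rows come from the question.

```python
import sqlite3

# Assumed setup: an in-memory table holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourtable (id INTEGER PRIMARY KEY, name TEXT, foreignkey INTEGER)"
)
conn.executemany(
    "INSERT INTO yourtable (id, name, foreignkey) VALUES (?, ?, ?)",
    [
        (1, "apple", 10),
        (2, "apple", 10),
        (3, "apple", 15),
        (4, "peach", 11),
        (5, "peach", 12),
    ],
)

# SQLite has no DELETE ... JOIN, so the self-join becomes a correlated EXISTS:
# a row is deleted if a row with a smaller id has the same name and foreignkey.
conn.execute(
    """
    DELETE FROM yourtable
    WHERE EXISTS (
        SELECT 1
        FROM yourtable AS t2
        WHERE t2.id < yourtable.id
          AND t2.name = yourtable.name
          AND t2.foreignkey = yourtable.foreignkey
    )
    """
)
conn.commit()

remaining_ids = [row[0] for row in conn.execute("SELECT id FROM yourtable ORDER BY id")]
print(remaining_ids)  # [1, 3, 4, 5]
```

Only the row with Id 2 is removed, because Id 1 is the earlier row with the same name and foreignkey.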


OTHER TIPS

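You can also do it with a CTE and a window function. ROW_NUMBER() numbers the rows within each (name, foreignkey) group, starting at 1 for the lowest id, so every row numbered above 1 is a duplicate and can be deleted. The syntax below (bracketed identifiers, deleting through the CTE) is SQL Server's.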


;WITH myvals
AS (
    SELECT [id]
        ,[name]
        ,[foreignkey]
        ,ROW_NUMBER() OVER (
            PARTITION BY [name], [foreignkey]
            ORDER BY [id]
            ) AS inst_count
    FROM yourtable
    )
DELETE
FROM myvals
WHERE inst_count > 1;
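
To try the window-function approach without a SQL Server instance, here is a minimal sketch using Python's built-in sqlite3 module. It assumes the bundled SQLite is version 3.25 or newer (the first release with window functions) and uses a hypothetical in-memory table loaded with the question's sample rows. SQLite cannot delete through a CTE, so the numbered rows are used in an IN subquery instead. The first query also shows how to list the IDs of the duplicates before deleting anything.

```python
import sqlite3

# Assumed setup: an in-memory table holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourtable (id INTEGER PRIMARY KEY, name TEXT, foreignkey INTEGER)"
)
conn.executemany(
    "INSERT INTO yourtable (id, name, foreignkey) VALUES (?, ?, ?)",
    [
        (1, "apple", 10),
        (2, "apple", 10),
        (3, "apple", 15),
        (4, "peach", 11),
        (5, "peach", 12),
    ],
)

# Number the rows inside each (name, foreignkey) group, lowest id first.
numbered = conn.execute(
    """
    SELECT id,
           ROW_NUMBER() OVER (PARTITION BY name, foreignkey ORDER BY id) AS inst_count
    FROM yourtable
    ORDER BY id
    """
).fetchall()

# Any row numbered above 1 is a duplicate of an earlier row.
duplicate_ids = [row_id for row_id, inst_count in numbered if inst_count > 1]
print(duplicate_ids)  # [2]

# Delete those rows; the numbering is recomputed inside the subquery.
conn.execute(
    """
    DELETE FROM yourtable
    WHERE id IN (
        SELECT id
        FROM (
            SELECT id,
                   ROW_NUMBER() OVER (PARTITION BY name, foreignkey ORDER BY id) AS inst_count
            FROM yourtable
        )
        WHERE inst_count > 1
    )
    """
)
conn.commit()

ids_after_delete = [row[0] for row in conn.execute("SELECT id FROM yourtable ORDER BY id")]
print(ids_after_delete)  # [1, 3, 4, 5]
```

Ordering by id inside the window makes the choice deterministic: the lowest id in each group gets number 1 and is kept.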

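The same idea works with a derived table instead of a CTE (again SQL Server syntax). Because the window is ordered by name, which is constant within each partition, which row of a duplicate group survives is arbitrary:

delete x
from (
    select *, rn = row_number() over (partition by name, foreignkey order by name)
    from yourtable
) x
where rn > 1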

Licensed under: CC-BY-SA with attribution
