SQL delete duplicate rows when matching on more than one column

https://stackoverflow.com/questions/23123921

sql-server-2005

05-07-2023

Question

I have to clean up an old table that never had a primary key or any foreign keys, and it contains many hundreds of duplicate rows. I've seen plenty of examples of how to delete duplicates when matching on just one column, but I don't understand how to extend them to cover the two, possibly three, columns I need.

The table data basically looks like this:

 Id     Person    Date
 1      12        3/12/2014
 1      12        3/12/2014

I thought the following would be a good way to achieve my goal, but it's not returning any results. How can I do this most effectively? I'd rather not recreate the table if I can help it.

 WITH cte AS (
     SELECT Id, Person, Date,
         row_number() OVER(PARTITION BY Id,Person,Date ORDER BY Id) AS rn
     FROM dbo.mytable
 )
 DELETE cte WHERE rn > 1

Solution

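Partition by and order by the same list of columns. The PARTITION BY clause makes the row number restart at 1 for each unique combination of Id, Person and Date, so any row with rn > 1 is a duplicate of the first row in its group.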

 WITH cte AS (
     SELECT Id, Person, Date,
         row_number() OVER(PARTITION BY Id,Person,Date ORDER BY Id,Person,Date) AS rn
     FROM dbo.mytable
 )
 DELETE cte WHERE rn > 1
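
To try the approach without touching the real table, something like the following sketch may help. It builds a scratch temp table, previews the duplicated combinations, deletes the extra copies and then shows what is left. The temp table name #mytable and the sample rows are made up for illustration, and the column types are assumptions, since the question does not give them. Keep in mind that a DELETE does not return a result set, only a rows-affected count, so a statement that "returns nothing" may still have removed rows.

```sql
-- Hypothetical scratch table standing in for dbo.mytable.
-- Column types are assumed; DATETIME is used because the DATE type
-- is not available on SQL Server 2005.
CREATE TABLE #mytable (Id INT, Person INT, [Date] DATETIME);

-- INSERT ... SELECT ... UNION ALL is used because multi-row VALUES
-- lists are not available on SQL Server 2005.
INSERT INTO #mytable (Id, Person, [Date])
SELECT 1, 12, '20140312' UNION ALL
SELECT 1, 12, '20140312' UNION ALL
SELECT 1, 12, '20140312' UNION ALL
SELECT 2, 12, '20140312' UNION ALL
SELECT 2, 15, '20140313';

-- Preview: each duplicated combination and how many copies it has.
SELECT Id, Person, [Date], COUNT(*) AS copies
FROM #mytable
GROUP BY Id, Person, [Date]
HAVING COUNT(*) > 1;

-- Delete every copy after the first within each combination.
-- The statement before WITH must end with a semicolon.
WITH cte AS (
    SELECT ROW_NUMBER() OVER (PARTITION BY Id, Person, [Date]
                              ORDER BY Id, Person, [Date]) AS rn
    FROM #mytable
)
DELETE FROM cte WHERE rn > 1;

-- Check: one row should remain per (Id, Person, Date) combination.
SELECT Id, Person, [Date]
FROM #mytable
ORDER BY Id, Person, [Date];

DROP TABLE #mytable;
```

To match on only two columns, drop the third from the GROUP BY, PARTITION BY and ORDER BY lists. On the real table, wrapping the DELETE in a transaction and checking the rows-affected count before committing is a cautious way to run it.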

Licensed under: CC-BY-SA with attribution