SQL Delete Duplicates with greater difference between two columns

https://stackoverflow.com/questions/22102179

18-10-2022

Question

I have a table similar to this:

ID    Value1      Value2
122   800         1600
122   800         1800
133   700         1500
154   800         1800
133   700         1500
188   700         1400
176   900         1500

From this table I want to delete the duplicates (IDs 122 and 133), removing whichever row has the greater difference between Value2 and Value1.

For ID 122 this means keeping the first row, because 1800 - 800 > 1600 - 800. For ID 133 I can keep either row, because both have the same difference.

ID    Value1      Value2
122   800         1600
122   800         1800  <------delete this row
133   700         1500  <------delete either this row or the other identical row
154   800         1800
133   700         1500  <------delete either this row or the other identical row
188   700         1400
176   900         1500

The real table is on a much larger scale than this, so I can't just delete records individually.

Is there a way to write a statement that will delete all duplicates from my table where Value2 - Value1 is greater than Value2 - Value1 for its duplicate?

Solution

SQL Server has this great feature of updatable CTEs and subqueries. So, you can do this as:

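with todelete as (
      select t.*,
             -- number the rows within each id, smallest difference first
             row_number() over (partition by id order by value2 - value1) as diff_seqnum
      from your_table t  -- placeholder: substitute the real table name
     )
delete from todelete
    where diff_seqnum > 1;  -- everything after row 1 of each id is removed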

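That is, enumerate the rows for each id in order of the difference between the two values, smallest first. The row with the smallest difference gets sequence number 1 and is kept; every row with a higher sequence number is deleted. When two rows tie on the difference (as with ID 133), the order between them is arbitrary, so exactly one of them survives.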
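To see this on the sample data from the question, here is a minimal sketch. The temp table name `#sample` is made up for illustration (the question never names the table), and the integer column types are an assumption. It previews the numbering with a plain SELECT first, which is a cautious habit rather than something the approach requires, and then runs the same delete.

```sql
-- Hypothetical temp table holding the rows from the question.
CREATE TABLE #sample (ID int, Value1 int, Value2 int);

INSERT INTO #sample (ID, Value1, Value2) VALUES
    (122, 800, 1600),
    (122, 800, 1800),
    (133, 700, 1500),
    (154, 800, 1800),
    (133, 700, 1500),
    (188, 700, 1400),
    (176, 900, 1500);

-- Preview: rows with diff_seqnum > 1 are the ones the delete would remove.
WITH todelete AS (
    SELECT s.*,
           ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Value2 - Value1) AS diff_seqnum
    FROM #sample s
)
SELECT ID, Value1, Value2, diff_seqnum
FROM todelete
ORDER BY ID, diff_seqnum;

-- Same CTE, but deleting through it removes the rows from #sample itself.
WITH todelete AS (
    SELECT s.*,
           ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Value2 - Value1) AS diff_seqnum
    FROM #sample s
)
DELETE FROM todelete
WHERE diff_seqnum > 1;

-- One row per ID remains; for 122 it is the (800, 1600) row.
SELECT ID, Value1, Value2
FROM #sample
ORDER BY ID;

DROP TABLE #sample;
```

Because the two rows for ID 133 are identical, it does not matter which one is removed. If tied rows differed in some other column and you cared which one survived, you would need to add that column to the ORDER BY as a tie-breaker.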

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow