Question

I have a table like the one below, and I need to remove duplicates so that only the most recent results remain. These are not standard duplicates: there is no primary key, and no single column in which I could simply count repeated values. The table lists registered players, with the dates they joined and left a team. If the EndDate column is null, the player is still playing for that team.

PlayerID | RegID | RegDate  | EndDate  | Team  | LastUpdate
1        | 1     | 10/12/13 | 10/16/13 | Red   | 10/16/13
1        | 2     | 10/17/13 | null     | Blue  | 10/23/13
1        | 3     | 10/17/13 | null     | Green | 10/23/13

What is a duplicate? A player ID has duplicates when it has more than one record with a null EndDate. In that case we want to retrieve only the null-EndDate record with the most recent LastUpdate value, and if several records share the same LastUpdate value, the one with the highest RegID.

For the sample data above, the result should contain rows 1 and 3: row 1 because its EndDate is not null, and row 3 because its LastUpdate is the same as row 2's but its RegID is higher.

I have tried using a CTE with PARTITION BY, ordering by LastUpdate DESC and RegDate DESC, but I am not getting the right results.

Could this be done using a CTE, if so how, or should it be done by creating another table, and if so how?

Thank you very much for any help you can provide me. Take care!


Solution

You can do this with row_number():

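select t.*
from (select t.*,
             -- number the rows within each (PlayerId, EndDate) group, newest first
             row_number() over (partition by PlayerId, EndDate
                                order by LastUpdate desc, RegID desc
                               ) as seqnum
      from table t   -- "table" is a placeholder; substitute your actual table name
     ) t
where EndDate is not NULL or seqnum = 1;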

This enumerates the rows within each group. In this case, a group is defined by the PlayerId, EndDate combination. PARTITION BY treats NULLs as equal, so all the rows with a NULL EndDate for a player fall into one group. Within a group, the first row is the one with the latest LastUpdate and, among ties, the highest RegID. The outer WHERE keeps every record that either has a non-NULL EndDate or is first in its group.
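To make the numbering concrete, here is a minimal sketch that loads the three sample rows from the question into a table variable and shows the seqnum assigned to each one. The name @Registrations and the column types are assumptions made for illustration, since the question does not give the real table definition, and the dates are read as month/day/year.

```sql
-- Hypothetical table variable holding the question's sample rows.
-- Column types are assumed; the question does not show the real schema.
DECLARE @Registrations TABLE (
    PlayerID   int,
    RegID      int,
    RegDate    date,
    EndDate    date NULL,
    Team       varchar(20),
    LastUpdate date
);

INSERT INTO @Registrations (PlayerID, RegID, RegDate, EndDate, Team, LastUpdate)
VALUES (1, 1, '20131012', '20131016', 'Red',   '20131016'),
       (1, 2, '20131017', NULL,       'Blue',  '20131023'),
       (1, 3, '20131017', NULL,       'Green', '20131023');

-- Show every row with its sequence number, without filtering yet.
SELECT r.PlayerID, r.RegID, r.EndDate, r.Team, r.LastUpdate,
       ROW_NUMBER() OVER (PARTITION BY r.PlayerID, r.EndDate
                          ORDER BY r.LastUpdate DESC, r.RegID DESC) AS seqnum
FROM @Registrations r
ORDER BY r.RegID;
```

With this data, RegID 1 is alone in its group and gets seqnum 1. RegID 2 and RegID 3 share the NULL group and the same LastUpdate, so the RegID tie-break gives RegID 3 seqnum 1 and RegID 2 seqnum 2. Filtering on "EndDate is not NULL or seqnum = 1" therefore keeps RegID 1 and RegID 3, which matches the result the question asks for.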

Your question is a little ambiguous on whether you just want to return these values or if you want to actually delete the others. Fortunately, SQL Server has updatable CTEs, so you can use very similar logic to delete the records from the table:

with todelete as (
      select t.*,
             row_number() over (partition by PlayerId, EndDate
                                order by LastUpdate desc, RegID desc
                               ) as seqnum
      from table t   -- "table" is a placeholder; substitute your actual table name
     )
delete from todelete
    where EndDate is NULL and seqnum > 1;
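
Before deleting for real, it can help to try the statement inside a transaction and look at what it removes. The following is only a sketch under assumptions: #Registrations is a made-up temp table loaded with the question's sample rows, and the column types are guessed. The OUTPUT clause lists the rows as they are deleted, and the final ROLLBACK undoes the delete so nothing is lost while experimenting.

```sql
-- Hypothetical temp table with the question's sample rows (types assumed).
CREATE TABLE #Registrations (
    PlayerID   int,
    RegID      int,
    RegDate    date,
    EndDate    date NULL,
    Team       varchar(20),
    LastUpdate date
);

INSERT INTO #Registrations (PlayerID, RegID, RegDate, EndDate, Team, LastUpdate)
VALUES (1, 1, '20131012', '20131016', 'Red',   '20131016'),
       (1, 2, '20131017', NULL,       'Blue',  '20131023'),
       (1, 3, '20131017', NULL,       'Green', '20131023');

BEGIN TRANSACTION;

WITH todelete AS (
    SELECT r.*,
           ROW_NUMBER() OVER (PARTITION BY r.PlayerID, r.EndDate
                              ORDER BY r.LastUpdate DESC, r.RegID DESC) AS seqnum
    FROM #Registrations r
)
DELETE FROM todelete
OUTPUT deleted.PlayerID, deleted.RegID, deleted.Team   -- rows being removed
WHERE EndDate IS NULL AND seqnum > 1;

-- Rows that would remain after the delete.
SELECT * FROM #Registrations ORDER BY RegID;

-- Undo the delete while testing; use COMMIT TRANSACTION once the result looks right.
ROLLBACK TRANSACTION;

DROP TABLE #Registrations;
```

On the sample data, the only row removed is RegID 2 (Blue), leaving RegID 1 and RegID 3.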
Licensed under: CC-BY-SA with attribution