Question

I have a database table in the format:

DataA | DataB | DataC | TimeStamp | UniqueID

Data may look like:

5 | 4 | 11 | 1/1/2014 | 1
5 | 4 | 2  | 2/1/2014 | 2
5 | 4 | 11 | 3/1/2014 | 3
3 | 6 | 7  | 4/1/2014 | 4

The problem is that I have duplicate entries where DataA-C are all the same (rows 1 and 3), but the TimeStamp (the date the data was recorded) and UniqueID are always different. The way that I am recording data (I do not have the option to change recording procedures) always leaves open the possibility of recording the same data twice.

How can I run a query that compares all of the Data columns to check for a duplicate row and removes the entry with the later TimeStamp? For instance, row 1 was recorded first, so I would want to remove row 3 and keep row 1.

Thanks in advance for your help.

Here is an option I have tried:

Select Line
    , DataA
    , DataB
    , DataC
FROM [Database].[dbo].[tbl_Data]
Where Line = 5
Group BY Line
    , DataA
    , DataB
    , DataC
Having COUNT(*) > 1
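A runnable sketch of the duplicate-finding part of this query, assuming SQLite (through Python's standard sqlite3 module) as a stand-in for SQL Server. The Line column is left out because it is not in the table shown above, and the dates are stored as ISO text (an assumption) so that they sort chronologically:

```python
import sqlite3

# SQLite in memory stands in for SQL Server here (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl_Data ("
    "DataA INTEGER, DataB INTEGER, DataC INTEGER, "
    "TimeStamp TEXT, UniqueID INTEGER PRIMARY KEY)"
)
# The question's sample rows, with dates as ISO text so they sort by date.
conn.executemany(
    "INSERT INTO tbl_Data VALUES (?, ?, ?, ?, ?)",
    [
        (5, 4, 11, "2014-01-01", 1),
        (5, 4, 2, "2014-02-01", 2),
        (5, 4, 11, "2014-03-01", 3),
        (3, 6, 7, "2014-04-01", 4),
    ],
)

# Each combination of data columns that occurs more than once, with its count.
duplicate_groups = conn.execute(
    """
    SELECT DataA, DataB, DataC, COUNT(*) AS copies
    FROM tbl_Data
    GROUP BY DataA, DataB, DataC
    HAVING COUNT(*) > 1
    """
).fetchall()
print(duplicate_groups)  # [(5, 4, 11, 2)]
```

Only the 5 | 4 | 11 combination is reported, which matches rows 1 and 3. This finds the duplicated groups but does not yet say which row in each group to delete.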

Solution

DELETE FROM the_table
WHERE id IN (the select to return duplicated records' ids)

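The select should be similar to this one. It returns every id except the smallest one in each group of identical data, so it assumes ids grow with TimeStamp (as they do in the sample):

SELECT
    T.id
FROM
    the_table T JOIN
(
    SELECT DataA, DataB, DataC, MIN(id) AS keep_id
    FROM the_table
    GROUP BY DataA, DataB, DataC
) sub ON T.DataA = sub.DataA AND T.DataB = sub.DataB AND T.DataC = sub.DataC
WHERE T.id <> sub.keep_id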
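The same idea can also be written as a single DELETE that keeps the smallest UniqueID per group and removes everything else. This is a minimal sketch on the question's sample rows, again using SQLite via Python's sqlite3 as a stand-in for SQL Server, and it assumes UniqueID grows with TimeStamp, as it does in the sample:

```python
import sqlite3

# SQLite in memory stands in for SQL Server here (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl_Data ("
    "DataA INTEGER, DataB INTEGER, DataC INTEGER, "
    "TimeStamp TEXT, UniqueID INTEGER PRIMARY KEY)"
)
# The question's sample rows, with dates as ISO text so they sort by date.
conn.executemany(
    "INSERT INTO tbl_Data VALUES (?, ?, ?, ?, ?)",
    [
        (5, 4, 11, "2014-01-01", 1),
        (5, 4, 2, "2014-02-01", 2),
        (5, 4, 11, "2014-03-01", 3),
        (3, 6, 7, "2014-04-01", 4),
    ],
)

# Keep the smallest UniqueID of every (DataA, DataB, DataC) group and delete
# every other row. UniqueID is the primary key, so MIN() never yields NULL,
# which keeps NOT IN safe.
conn.execute(
    """
    DELETE FROM tbl_Data
    WHERE UniqueID NOT IN (
        SELECT MIN(UniqueID)
        FROM tbl_Data
        GROUP BY DataA, DataB, DataC
    )
    """
)

remaining = [
    row[0] for row in conn.execute("SELECT UniqueID FROM tbl_Data ORDER BY UniqueID")
]
print(remaining)  # [1, 2, 4]
```

Row 3 is removed and row 1 is kept. Rows that have no duplicate are their own group minimum, so they are never touched.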

Other tips

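Get the MAX id instead by changing your query like below, then delete the ids it returns. Each run removes only the newest row of every duplicated group: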

Select MAX(UniqueID)
FROM [Database].[dbo].[tbl_Data]
Where Line = 5
Group BY Line
    , DataA
    , DataB
    , DataC
Having COUNT(*) > 1

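Because MAX returns only the newest row per group, a group with three copies needs the delete to run until nothing more is removed. A sketch of that loop, assuming SQLite through Python's sqlite3 and adding a hypothetical fifth row (a third copy of 5 | 4 | 11) that is not in the question's data:

```python
import sqlite3

# SQLite in memory stands in for SQL Server here (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl_Data ("
    "DataA INTEGER, DataB INTEGER, DataC INTEGER, "
    "TimeStamp TEXT, UniqueID INTEGER PRIMARY KEY)"
)
conn.executemany(
    "INSERT INTO tbl_Data VALUES (?, ?, ?, ?, ?)",
    [
        (5, 4, 11, "2014-01-01", 1),
        (5, 4, 2, "2014-02-01", 2),
        (5, 4, 11, "2014-03-01", 3),
        (3, 6, 7, "2014-04-01", 4),
        (5, 4, 11, "2014-05-01", 5),  # hypothetical third copy, for illustration
    ],
)

# Delete the newest (MAX id) row of each group that still has duplicates.
delete_newest_sql = """
    DELETE FROM tbl_Data
    WHERE UniqueID IN (
        SELECT MAX(UniqueID)
        FROM tbl_Data
        GROUP BY DataA, DataB, DataC
        HAVING COUNT(*) > 1
    )
"""

# Repeat until a pass deletes nothing; rowcount is the number of rows removed.
passes = 0
while conn.execute(delete_newest_sql).rowcount > 0:
    passes += 1

remaining = [
    row[0] for row in conn.execute("SELECT UniqueID FROM tbl_Data ORDER BY UniqueID")
]
print(passes, remaining)  # 2 [1, 2, 4]
```

The first pass removes row 5, the second removes row 3, and the third finds no group with more than one row, so the loop stops.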
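DELETE FROM test
WHERE UniqueId NOT IN (
  SELECT UniqueId FROM (
    SELECT t.UniqueId
    FROM test t
    WHERE NOT EXISTS (
      SELECT 1 FROM test e
      WHERE e.DataA = t.DataA AND e.DataB = t.DataB AND e.DataC = t.DataC
        AND (e.TimeStamp < t.TimeStamp
             OR (e.TimeStamp = t.TimeStamp AND e.UniqueId < t.UniqueId))
    )
  ) T1
)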
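Another option, not used in the answers above, is the ROW_NUMBER() window function: number the rows within each group of identical data by TimeStamp and delete everything after the first. A sketch in SQLite (window functions need SQLite 3.25 or later) through Python's sqlite3. UniqueID is added to the ORDER BY as a tie-breaker in case two copies share a TimeStamp:

```python
import sqlite3

# SQLite in memory stands in for SQL Server here (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl_Data ("
    "DataA INTEGER, DataB INTEGER, DataC INTEGER, "
    "TimeStamp TEXT, UniqueID INTEGER PRIMARY KEY)"
)
# The question's sample rows, with dates as ISO text so they sort by date.
conn.executemany(
    "INSERT INTO tbl_Data VALUES (?, ?, ?, ?, ?)",
    [
        (5, 4, 11, "2014-01-01", 1),
        (5, 4, 2, "2014-02-01", 2),
        (5, 4, 11, "2014-03-01", 3),
        (3, 6, 7, "2014-04-01", 4),
    ],
)

# rn = 1 is the earliest row of each (DataA, DataB, DataC) group; every row
# with rn > 1 is a later duplicate and gets deleted.
conn.execute(
    """
    DELETE FROM tbl_Data
    WHERE UniqueID IN (
        SELECT UniqueID
        FROM (
            SELECT UniqueID,
                   ROW_NUMBER() OVER (
                       PARTITION BY DataA, DataB, DataC
                       ORDER BY TimeStamp, UniqueID
                   ) AS rn
            FROM tbl_Data
        ) AS numbered
        WHERE rn > 1
    )
    """
)

remaining = [
    row[0] for row in conn.execute("SELECT UniqueID FROM tbl_Data ORDER BY UniqueID")
]
print(remaining)  # [1, 2, 4]
```

On SQL Server the same numbering is commonly written as a CTE and deleted through directly: `WITH numbered AS (SELECT ROW_NUMBER() OVER (PARTITION BY DataA, DataB, DataC ORDER BY TimeStamp, UniqueID) AS rn FROM tbl_Data) DELETE FROM numbered WHERE rn > 1;`. Unlike the MIN/MAX id approaches, this orders by TimeStamp itself, so it does not depend on ids growing with time, and it removes any number of copies in one statement.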

Fiddle

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow