Question

I have a SQL Server database that I pre-loaded with a large number of rows.

Unfortunately, the table has no primary key, and it now contains duplicate rows. I'm not concerned about the missing primary key, but I am concerned about the duplicates.

Any thoughts? (Forgive me for being a SQL Server newbie.)

Solution

Well, this is one reason why you should have a primary key on the table. What version of SQL Server? For SQL Server 2005 and above:

;WITH r AS
(
    SELECT col1, col2, col3, -- whatever columns make a "unique" row
    rn = ROW_NUMBER() OVER (PARTITION BY col1, col2, col3 ORDER BY col1)
    FROM dbo.SomeTable
)
DELETE r WHERE rn > 1;

Then, so you don't have to do this again tomorrow, and the next day, and the day after that, declare a primary key on the table.
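
As a rough sketch of that last step, assuming the same placeholder names as above (dbo.SomeTable, col1, col2, col3) and assuming those columns are declared NOT NULL, the key could be added once the duplicates are gone. The constraint name PK_SomeTable is made up for illustration.

```sql
-- Assumes duplicates were already removed and col1, col2, col3 are NOT NULL.
-- PK_SomeTable is a hypothetical constraint name.
ALTER TABLE dbo.SomeTable
    ADD CONSTRAINT PK_SomeTable PRIMARY KEY (col1, col2, col3);
```

If the columns allow NULLs, or the table already needs a different primary key, a UNIQUE constraint on the same columns is another way to block future duplicates.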

OTHER TIPS

Take a look at this article, which notes:

"It is not hard to delete data that is duplicated across all columns of a table. What is harder to do is to delete data that you consider duplicate based on your business rules while SQL Server considers it unique data."

http://www.sql-server-performance.com/articles/dba/delete_duplicates_p1.aspx

Let's say the rows in your table should be unique by COL1 and COL2.
Here is a way to find the duplicates:

SELECT *
FROM (SELECT COL1, COL2, ROW_NUMBER() OVER (PARTITION BY COL1, COL2 ORDER BY COL1, COL2 ASC) AS ROWID
      FROM TABLE_NAME )T
WHERE T.ROWID > 1

The ROWID > 1 filter returns only the extra copies: within each (COL1, COL2) group the first row gets ROWID = 1, and every additional row gets a higher number.
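
Before deleting anything, it can help to see how many copies each key has. A minimal sketch, reusing the placeholder names TABLE_NAME, COL1, and COL2 from the query above (the alias COPIES is made up for illustration):

```sql
-- Lists each (COL1, COL2) pair that occurs more than once,
-- together with the number of rows sharing that pair.
SELECT COL1, COL2, COUNT(*) AS COPIES
FROM TABLE_NAME
GROUP BY COL1, COL2
HAVING COUNT(*) > 1;
```

For each pair returned, COPIES minus one is the number of rows the ROWID > 1 filter would pick up.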

This article about deleting duplicate records in SQL Server for a table without a primary key can help you solve the problem.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow