Question

How do I merge entries that duplicate each other? My table players has two columns (name, wowp_id) that contain duplicates. How can I merge them?

I have been looking at related questions, and the code below is built on the answers I found. It runs without errors, but the duplicates remain. I would like to have no duplicate names. If a name has several different wowp_id values, I would prefer to remove the wowp_id and keep only one entry for that name.

import sqlite3

def sql_removeduplicates():
    con = sqlite3.connect('WOWT.sql')
    with con:
        cur = con.cursor()
        # Find (name, team, wowp_id) groups that occur more than once
        cur.execute("SELECT name, COUNT(*) FROM players GROUP BY name, team, wowp_id HAVING COUNT(*) > 1")
        rows = cur.fetchall()
        con.commit()
        for row in rows:
            print(row)

My rows in players:

  (id, wowp_id, name, team)
(108, 501078041, u'prazluges', None)
(109, 507894244, u'Aidis', None)
(110, 500742127, u'Aidis', None)
(111, u'Aidis', u'Aidis', None)
(112, u'Aidis', u'Aidis', None)
(113, 500864543, u'prazluges', None)
(114, u'Aidis', u'Aidis', None)
(115, u'Aidis', u'Aidis', None)
(116, u'Aidis', u'Aidis', None)
(117, 501078041, u'satih', None)
(118, u'Aidis', u'Aidis', None)

Solution 2

You can define a unique index on the column, or define a unique constraint on the table using indexed columns.

CREATE UNIQUE INDEX IF NOT EXISTS ...

CREATE TABLE IF NOT EXISTS ... (..., UNIQUE(col1, col2, col3), ...)

These help prevent repetition before it arises. Links are to the SQLite documentation.
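
As a rough illustration of how a unique index blocks repeats at insert time, here is a minimal sketch. It assumes uniqueness is wanted on name alone (as the question asks); the index name idx_players_name and the helper add_player are made up for the example, and an in-memory database stands in for WOWT.sql.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway database for the demonstration
conn.execute("CREATE TABLE players (id INTEGER PRIMARY KEY, wowp_id, name, team)")
# Hypothetical index name; uniqueness on name alone is an assumption
conn.execute("CREATE UNIQUE INDEX IF NOT EXISTS idx_players_name ON players(name)")

def add_player(conn, wowp_id, name):
    """Insert a player; return True if stored, False if the name already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO players (wowp_id, name) VALUES (?, ?)",
                (wowp_id, name),
            )
        return True
    except sqlite3.IntegrityError:
        # Raised by SQLite when the unique index would be violated
        return False

first = add_player(conn, 507894244, "Aidis")
second = add_player(conn, 500742127, "Aidis")
stored = conn.execute("SELECT COUNT(*) FROM players").fetchone()[0]
```

The second insert is rejected because the name is already present, so only one row is stored. Note that the index can only be created once the existing duplicates are gone.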

OTHER TIPS

You can do

DELETE FROM players
 WHERE id NOT IN
(
  SELECT MIN(id) id
    FROM players
   GROUP BY wowp_id, name
);

Note: before proceeding with DELETE make sure that you have a solid backup of your data.

After deleting the duplicates, create a UNIQUE index so that they cannot come back:

CREATE UNIQUE INDEX idx_wowp_id_name ON players(wowp_id, name);

Outcome after deduping:

|  id |   wowp_id |      name | team |
|-----|-----------|-----------|------|
| 108 | 501078041 | prazluges | None |
| 109 | 507894244 |     Aidis | None |
| 110 | 500742127 |     Aidis | None |
| 111 |     Aidis |     Aidis | None |
| 113 | 500864543 | prazluges | None |
| 117 | 501078041 |     satih | None |

Here is SQLFiddle demo
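
For anyone who wants to try the statement from Python, the following is a small sketch that loads the rows from the question into an in-memory database and runs the same DELETE. The function name remove_duplicates and the untyped column declarations are assumptions made for the example.

```python
import sqlite3

# Rows copied from the question: (id, wowp_id, name, team)
SAMPLE_ROWS = [
    (108, 501078041, "prazluges", None),
    (109, 507894244, "Aidis", None),
    (110, 500742127, "Aidis", None),
    (111, "Aidis", "Aidis", None),
    (112, "Aidis", "Aidis", None),
    (113, 500864543, "prazluges", None),
    (114, "Aidis", "Aidis", None),
    (115, "Aidis", "Aidis", None),
    (116, "Aidis", "Aidis", None),
    (117, 501078041, "satih", None),
    (118, "Aidis", "Aidis", None),
]

def remove_duplicates(conn):
    """Keep the lowest id of every (wowp_id, name) pair and delete the rest."""
    with conn:
        conn.execute(
            "DELETE FROM players WHERE id NOT IN "
            "(SELECT MIN(id) FROM players GROUP BY wowp_id, name)"
        )
        # Stops the same pair from being inserted again
        conn.execute("CREATE UNIQUE INDEX idx_wowp_id_name ON players(wowp_id, name)")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (id INTEGER PRIMARY KEY, wowp_id, name, team)")
conn.executemany("INSERT INTO players VALUES (?, ?, ?, ?)", SAMPLE_ROWS)
remove_duplicates(conn)
remaining_ids = [row[0] for row in conn.execute("SELECT id FROM players ORDER BY id")]
```

The ids left in remaining_ids match the table above. Names still repeat where the wowp_id differs, because the grouping is on the pair (wowp_id, name).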

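Import your results into a custom container class whose __contains__ method is defined to keep what you want. Example follows:

class Players:
    def __init__(self):
        self.rows = []  # rows kept so far
    def __contains__(self, item):
        # True when a row with the same name has already been kept
        return any(row.name == item.name for row in self.rows)
    # Your other methods go here

Then import your rows into an instance of Players, skipping any row that is already contained, and write them back out.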
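
One way this container idea might look end to end is sketched below. The Player namedtuple, the add method and the rows attribute are all assumptions introduced for the example; only a few of the question's rows are used.

```python
from collections import namedtuple

# Hypothetical record type mirroring the table's columns
Player = namedtuple("Player", "id wowp_id name team")

class Players:
    """Container that treats two rows as the same entry when the names match."""

    def __init__(self):
        self.rows = []  # rows kept so far, in insertion order

    def __contains__(self, item):
        return any(row.name == item.name for row in self.rows)

    def add(self, item):
        # Keep only the first row seen for each name
        if item not in self:
            self.rows.append(item)

raw = [
    (108, 501078041, "prazluges", None),
    (109, 507894244, "Aidis", None),
    (110, 500742127, "Aidis", None),
    (113, 500864543, "prazluges", None),
    (117, 501078041, "satih", None),
]

kept = Players()
for record in raw:
    kept.add(Player(*record))

kept_ids = [p.id for p in kept.rows]
kept_names = [p.name for p in kept.rows]
```

The membership check scans the list each time, which is fine for small tables; for large ones, tracking the names seen so far in a set would be faster.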

Sounds like you are trying to remove duplicated WOWP_ID for the same NAME. I assume you are keeping the largest WOWP_ID for each NAME. If you have a reliable unique key such as a primary key in your table, the answer is quite simple. If you don't have such a key, you may try something like this:

import unittest
import sqlite3

class DaoTest(unittest.TestCase):
    def testDeleteDuplicates(self):
        with sqlite3.connect("WOWT.sql") as conn:
            # For each NAME with several distinct WOWP_IDs, select every row
            # whose WOWP_ID is not the largest one for that NAME.
            rowsToDelete = conn.execute('''
                SELECT PLAYERS.NAME, PLAYERS.TEAM, PLAYERS.WOWP_ID FROM PLAYERS INNER JOIN 
                (
                    SELECT PLAYERS.NAME, MAX(WOWP_ID) AS MAX_ID FROM PLAYERS INNER JOIN
                    (
                        SELECT NAME, COUNT(DISTINCT WOWP_ID) AS DUP FROM PLAYERS
                        GROUP BY NAME
                        HAVING DUP > 1
                    ) DUPTABLE
                    ON PLAYERS.NAME = DUPTABLE.NAME
                    GROUP BY PLAYERS.NAME
                ) RowsToKeep
                ON PLAYERS.NAME = RowsToKeep.NAME AND PLAYERS.WOWP_ID <> MAX_ID
            ''').fetchall()
            # IS rather than = so that rows whose TEAM is NULL still match.
            conn.executemany("DELETE FROM PLAYERS WHERE NAME = ? AND TEAM IS ? AND WOWP_ID = ?", rowsToDelete)
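
For the simpler case mentioned above, where the table does have a primary key (the question's id column), a sketch of what the asker describes could look like this: one row per name, with wowp_id cleared when a name had conflicting values. The function name dedupe_by_name is made up for the example, and clearing wowp_id to NULL is one reading of "remove wowp_id".

```python
import sqlite3

def dedupe_by_name(conn):
    """Keep one row per name; clear wowp_id where a name had several values."""
    with conn:
        # Names that occur with more than one distinct wowp_id
        conflicted = conn.execute(
            "SELECT name FROM players GROUP BY name "
            "HAVING COUNT(DISTINCT wowp_id) > 1"
        ).fetchall()
        conn.executemany("UPDATE players SET wowp_id = NULL WHERE name = ?", conflicted)
        # Keep the lowest id for each name and delete every other row
        conn.execute(
            "DELETE FROM players WHERE id NOT IN "
            "(SELECT MIN(id) FROM players GROUP BY name)"
        )

conn = sqlite3.connect(":memory:")  # stand-in for WOWT.sql
conn.execute("CREATE TABLE players (id INTEGER PRIMARY KEY, wowp_id, name, team)")
conn.executemany(
    "INSERT INTO players VALUES (?, ?, ?, ?)",
    [
        (108, 501078041, "prazluges", None),
        (109, 507894244, "Aidis", None),
        (110, 500742127, "Aidis", None),
        (111, "Aidis", "Aidis", None),
        (113, 500864543, "prazluges", None),
        (117, 501078041, "satih", None),
    ],
)
dedupe_by_name(conn)
result = conn.execute("SELECT id, wowp_id, name FROM players ORDER BY id").fetchall()
```

As with the other DELETE statements here, it is worth running this against a copy of the database first.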
Licensed under: CC-BY-SA with attribution