Question

CREATE TABLE join_table (
  id1 integer,
  id2 integer
);

I want to create a UNIQUE constraint on (id1, id2); however, I am seeing some bad data such as this:

id1   | id2
------------
1     | 1
1     | 1
1     | 2

So the record (1,1) is clearly a duplicate and will violate the unique constraint. How do I write a SQL query that removes all duplicate records from the table?

Note: I want to delete one of the duplicates so that I can create the unique constraint.

Solution

This will keep one row from each group of duplicates and delete the rest:

delete from join_table
where ctid not in (select min(ctid)
                   from join_table
                   group by id1, id2);

Your table doesn't have a unique identifier that could be used to "pick one survivor". That's where Postgres' ctid comes in handy: it is an internal identifier that is unique for each row, because it records the physical location of the row version within the table. Note that you should never rely on the ctid for more than a single statement. It is not a stable or universally unique value (it changes when the row is updated or moved by VACUUM FULL), but for the runtime of a single statement it's just fine.
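
Since the goal is the unique constraint, the delete and the constraint can go into one transaction, so that no new duplicates slip in between the two steps. A minimal sketch, assuming the constraint name join_table_id1_id2_key (a hypothetical name, not from the question) and that briefly blocking writers with an explicit lock is acceptable:

```sql
begin;

-- Block concurrent writes so no new duplicates appear before the constraint exists.
lock table join_table in share row exclusive mode;

-- Keep the row with the smallest ctid in each (id1, id2) group, delete the rest.
delete from join_table
where ctid not in (select min(ctid)
                   from join_table
                   group by id1, id2);

-- The constraint name is an assumption; use whatever fits your naming scheme.
alter table join_table
  add constraint join_table_id1_id2_key unique (id1, id2);

commit;
```

If adding the constraint fails, the transaction is aborted and the delete is rolled back with it, so the table stays as it was.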

SQLFiddle example: http://sqlfiddle.com/#!15/dabfc/1

If you instead want to get rid of every row that has a duplicate, leaving no survivor (with the sample data, only (1,2) would remain):

delete from join_table
where (id1, id2) in (select id1, id2
                     from join_table
                     group by id1, id2
                     having count(*) > 1);
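
Before running either delete, it may help to look at what actually counts as a duplicate. A small sketch using only the table from the question; the column alias copies is made up for illustration:

```sql
-- List each (id1, id2) pair that occurs more than once, with its row count.
select id1, id2, count(*) as copies
from join_table
group by id1, id2
having count(*) > 1
order by id1, id2;
```

With the sample data from the question, this lists only the pair (1, 1) with two copies.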

Neither solution will be fast on a large table. Creating a new table without duplicates, as jjanes has shown, will be much faster if you need to delete a substantial number of rows from a large table.
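
A commonly used alternative to the "not in" form is a self-join on ctid, which the planner can sometimes handle more efficiently. This is a different technique from the two statements above and is only a sketch; it is worth comparing plans with explain on your own data before relying on it:

```sql
-- Delete every row for which another row with the same (id1, id2)
-- and a smaller ctid exists; the row with the smallest ctid survives.
delete from join_table a
using join_table b
where a.id1 = b.id1
  and a.id2 = b.id2
  and a.ctid > b.ctid;
```

Rows with a null in id1 or id2 never match the join condition, so they are left alone. That is harmless for this purpose, because by default a unique constraint treats nulls as distinct.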

OTHER TIPS

Without a primary key, that will be hard to do.

Is the existing table referenced by foreign key constraints or similar? If not, just remake it.

begin;
create table new_table as select distinct * from join_table;
drop table join_table;
alter table new_table rename to join_table;
commit;
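
One thing to keep in mind with the remake approach: create table ... as copies only the data, not indexes, constraints, defaults, or privileges, so those have to be re-created on the new table. A sketch that also adds the unique constraint before the swap; the constraint name is hypothetical:

```sql
begin;

-- Take the strongest lock up front; drop table needs it anyway.
lock table join_table in access exclusive mode;

create table new_table as
select distinct * from join_table;

-- The constraint name is an assumption, not from the original post.
alter table new_table
  add constraint join_table_id1_id2_key unique (id1, id2);

drop table join_table;
alter table new_table rename to join_table;

commit;
```
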
Licensed under: CC-BY-SA with attribution