Question

I have a table with many duplicated rows and no primary key.
I want to remove only the extra copies and keep one row from each group, but every DELETE I have tried removes all of the matching rows.

How can I get the equivalent of a ROWID for the rows of a table in PostgreSQL?

Solution

In PostgreSQL, the physical location of a row is exposed through the system column ctid.

To view it, select it explicitly:

SELECT CTID FROM table_name
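
As a quick illustration, here is a minimal sketch on a hypothetical scratch table named dup_demo (the name and data are assumptions, not from the question). A ctid value is a pair (block number, tuple index within the block). It can change when a row is updated or when the table is rewritten by VACUUM FULL, so it is only suitable as a short-lived row identifier.

```sql
-- Hypothetical scratch table, for illustration only.
CREATE TEMP TABLE dup_demo (user_id int, department_id int);

INSERT INTO dup_demo VALUES (1, 10), (1, 10), (2, 20);

-- ctid is a system column: it must be named explicitly,
-- because SELECT * does not include it.
SELECT ctid, user_id, department_id
FROM   dup_demo
ORDER  BY ctid;
```

The two (1, 10) rows are identical in every user column but have different ctid values, which is what makes it possible to delete one and keep the other.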

To remove the duplicated records, use it in a DELETE statement like this:

DELETE FROM table_name WHERE CTID NOT IN (
  SELECT RECID FROM 
    (SELECT MIN(CTID) AS RECID, other_columns 
      FROM table_name GROUP BY other_columns) 
  a);

Here table_name is the target table and other_columns is the list of columns whose combined values define a duplicate.

For example:

DELETE FROM user_department WHERE CTID NOT IN (
  SELECT RECID FROM 
    (SELECT MIN(CTID) AS RECID, ud.user_id, ud.department_id
      FROM user_department ud GROUP BY ud.user_id, ud.department_id) 
  a);
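
Because a DELETE like this is destructive, it may help to try it inside a transaction first. The following is only a sketch of one way to do that, reusing the user_department example above: it previews the rows that would be removed, and after the DELETE it checks that no duplicate groups remain.

```sql
BEGIN;

-- Preview: the rows that the DELETE above would remove.
SELECT ctid, user_id, department_id
FROM   user_department
WHERE  ctid NOT IN (
   SELECT min(ctid)
   FROM   user_department
   GROUP  BY user_id, department_id);

-- ... run the DELETE statement here ...

-- Verify: this should return no rows once the duplicates are gone.
SELECT user_id, department_id, count(*)
FROM   user_department
GROUP  BY user_id, department_id
HAVING count(*) > 1;

-- Use COMMIT instead if the result looks right.
ROLLBACK;
```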

Other tips

Simplify this by one query level:

DELETE FROM table_name
WHERE  ctid NOT IN (
   SELECT min(ctid)
   FROM   table_name
   GROUP  BY $other_columns);

Here duplicates are defined by equality in $other_columns.
Columns from the GROUP BY clause do not have to appear in the SELECT list, so the extra subquery level is unnecessary.
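
A commonly used alternative, which is not from the answers above, is a self-join with DELETE ... USING that compares ctid values directly. The sketch below assumes the user_department table from the earlier example and keeps the row with the smallest ctid in each group.

```sql
-- Delete every row for which an identical row with a smaller ctid exists.
DELETE FROM user_department a
USING  user_department b
WHERE  a.user_id       = b.user_id
AND    a.department_id = b.department_id
AND    a.ctid > b.ctid;
```

One difference to keep in mind: GROUP BY treats NULL values as equal, while = does not, so rows with NULL in a compared column are not deduplicated by this form unless the comparison is written with IS NOT DISTINCT FROM.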

The ctid column is described in the current PostgreSQL manual under System Columns.

Consider using row_number() if you want to choose which row to keep based on a unique id column or a timestamp. ctid only reflects the physical position of a row, so on its own it is not a reliable way to keep, for example, the most recent record.

WITH d AS (
   SELECT ctid AS c,
          row_number() OVER (PARTITION BY s ORDER BY id) AS rn
   FROM   t
)
DELETE FROM t
WHERE  ctid IN (SELECT c FROM d WHERE rn > 1);
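
To keep the most recent row instead of the one with the lowest id, the ORDER BY inside the window can sort by a timestamp in descending order. This is a sketch under assumed names: the table events and the column created_at are hypothetical, and s is the column that defines a duplicate, as above.

```sql
-- Hypothetical table: events(s text, created_at timestamptz), no primary key.
WITH ranked AS (
   SELECT ctid AS row_ctid,
          row_number() OVER (PARTITION BY s
                             ORDER BY created_at DESC) AS rn
   FROM   events
)
-- rn = 1 is the newest row in each group; everything else is deleted.
DELETE FROM events
WHERE  ctid IN (SELECT row_ctid FROM ranked WHERE rn > 1);
```

If two rows in a group share the same created_at value, which of them gets rn = 1 is not deterministic, so add another column to the ORDER BY if that matters.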

Licensed under: CC-BY-SA with attribution