Question

I have a data table in Oracle 8.1 with about a million rows, but many rows are duplicated on the same columns. I need to know the fastest way to clean up this data. For example, I have:

id name surname date
21 'john' 'smith' '2012 12 12'; 
21 'john' 'smith' '2012 12 13';
21 'john' 'smith' '2012 12 14';
....

Now I need to delete the first two rows, since they are duplicates on the first three columns, and keep the row with the latest date.

Solution

If there really are lots of duplicates, I'd recommend recreating the table with only the clean data:

CREATE TABLE tmp_table AS
SELECT id, name, surname, MAX(d) AS d
  FROM your_table
 GROUP BY id, name, surname;

and then replace the original table with the new one:

RENAME your_table TO old_table;
RENAME tmp_table TO your_table;

Don't forget to move indexes, constraints and privileges...
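
As a rough sketch of that last step, assuming the original table had one index on (id, name, surname) and a single read-only grantee, the follow-up statements might look like the ones below. The names your_table_idx and report_user are hypothetical placeholders, not taken from the question.

```sql
-- Hypothetical names: your_table_idx and report_user are placeholders.
-- Indexes that belonged to the original table stay attached to it after
-- RENAME and keep their names, so the new index needs a name that is
-- not already used in the schema.
CREATE INDEX your_table_idx ON your_table (id, name, surname);

-- Grants also stay with the renamed original, so reissue them on the new table.
GRANT SELECT ON your_table TO report_user;

-- Only after the new table has been checked: drop the old copy.
DROP TABLE old_table;
```

Dropping old_table last keeps a fallback available until the new table has been verified.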

Other suggestions

DELETE FROM mytab t
 WHERE EXISTS (SELECT 1
                 FROM mytab t2
                WHERE t2.id = t.id
                  AND t2.name = t.name
                  AND t2.surname = t.surname
                  AND t2.date1 > t.date1);

How fast this is depends on your Oracle parameters. An index on (id, name, surname) might help.
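
A minimal sketch of such an index, assuming the table is called mytab and the date column date1 (the index name mytab_dup_idx is made up for illustration). Appending the date column is an optional extra, not something this answer asks for; it would let the existence check be resolved from the index alone.

```sql
-- Hypothetical index name; mytab and date1 are assumed table/column names.
-- The first three columns match the duplicate key; date1 is appended so
-- the "is there a newer row?" probe does not have to visit the table.
CREATE INDEX mytab_dup_idx ON mytab (id, name, surname, date1);
```

Whether the optimizer actually uses the index depends on statistics and settings, so it is worth checking the execution plan before running the delete on the full table.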

If possible, I'd go for a CTAS (create table as select), truncate the original table, and copy the data back:

-- create the temp table (it contains only the latest row for each (id, name, surname) triple)
CREATE TABLE tmp as 
SELECT id, name, surname, date1 from 
(select 
  t1.*, 
  row_number() over (partition by id, name, surname order by date1 desc) rn
from mytab t1)
where rn = 1;

-- clear the original table
TRUNCATE TABLE mytab;

-- copy the data back
INSERT /*+ APPEND */ INTO mytab (id, name, surname, date1)
  SELECT id, name, surname, date1 FROM tmp;
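
Before dropping the temporary table, it may be worth confirming that no duplicates remain. The check below is only a sketch and reuses the mytab and tmp names from the statements above.

```sql
-- End the transaction first: after a direct-path (APPEND) insert, the
-- session cannot query the table until it commits.
COMMIT;

-- Lists every (id, name, surname) triple that still occurs more than once.
-- An empty result means the cleanup worked.
SELECT id, name, surname, COUNT(*) AS cnt
  FROM mytab
 GROUP BY id, name, surname
HAVING COUNT(*) > 1;

-- Once the result is empty, the temporary copy is no longer needed.
DROP TABLE tmp;
```
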
Licensed under: CC-BY-SA with attribution