Question

I have a table in Oracle 8.1 with about a million rows. Many of the rows are duplicates on the same columns. I need to know the fastest way to clean up this data. For example, I have:

id  name    surname  date
21  'john'  'smith'  '2012 12 12'
21  'john'  'smith'  '2012 12 13'
21  'john'  'smith'  '2012 12 14'
...

Now I need to delete the first two rows, since they are duplicates on the first three columns, and keep the row with the latest date.


Solution

If there really are a lot of duplicates, I'd recommend recreating the table with only the clean data:

CREATE TABLE tmp_table AS
SELECT id, name, surname, MAX(d) AS d  -- d is the date column
  FROM your_table
 GROUP BY id, name, surname;

and then replace the original table with the new one:

RENAME your_table TO old_table;
RENAME tmp_table TO your_table;

Don't forget to recreate the indexes, constraints and privileges on the new table; after the rename they stay attached to old_table.
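
As a rough sketch of that last step: the queries below list what was defined on the original table, and the two statements after them recreate one index and one grant on the new table. The index name your_table_key_idx, the grantee app_user and the indexed columns are assumptions made for illustration; the real list has to come from your own schema.

```sql
-- List what was defined on the original table (now named old_table).
-- Dictionary views store unquoted object names in upper case.
SELECT index_name FROM user_indexes WHERE table_name = 'OLD_TABLE';
SELECT constraint_name, constraint_type FROM user_constraints WHERE table_name = 'OLD_TABLE';
SELECT grantee, privilege FROM user_tab_privs WHERE table_name = 'OLD_TABLE';

-- Recreate them on the new table (hypothetical names, adjust to your schema).
-- Index names must be unique within the schema, so reuse an old name only
-- after the old index has been dropped.
CREATE INDEX your_table_key_idx ON your_table (id, name, surname);
GRANT SELECT ON your_table TO app_user;
```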

OTHER TIPS

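-- mytab and date1 stand in for the real table and date column
-- (TABLE and DATE are reserved words in Oracle)
delete from mytab t
where exists (select 1 from mytab t2
              where t2.id = t.id and t2.name = t.name and t2.surname = t.surname
                and t2.date1 > t.date1)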

How fast this is depends on your Oracle parameters. An index on (id, name, surname) might help.
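
A minimal sketch of such an index, assuming the table is called mytab. Both mytab and the index name mytab_dup_idx are placeholders, not names from the question.

```sql
-- Hypothetical names: replace mytab and mytab_dup_idx with your own
CREATE INDEX mytab_dup_idx ON mytab (id, name, surname);
```

Adding the date column as a fourth key might let the correlated subquery be answered from the index alone, at the cost of a larger index. Whether that pays off is something to measure on the real data.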

If possible, I'd go for a CTAS (create table as select), truncate the original table, and copy the data back:

-- create the temp table (it contains only the latest row for a given (id, name, surname) triple)
CREATE TABLE tmp as 
SELECT id, name, surname, date1 from 
(select 
  t1.*, 
  row_number() over (partition by id, name, surname order by date1 desc) rn
from mytab t1)
where rn = 1;

-- clear the original table
TRUNCATE TABLE mytab;

-- copy the data back
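-- the hint must start with /*+ (no space before the +), otherwise it is treated as a plain comment
INSERT /*+ APPEND */ INTO mytab (id, name, surname, date1)
  (SELECT id, name, surname, date1 FROM tmp);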
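
Before dropping the temporary table it may be worth checking the result. The sketch below assumes the same mytab and tmp names as the statements above. Keep in mind that TRUNCATE is DDL and cannot be rolled back, so tmp is the only copy of the data until the insert has been committed.

```sql
-- Make the insert permanent (after a direct-path insert the table
-- cannot be queried in the same session until the transaction ends)
COMMIT;

-- Should return no rows if every (id, name, surname) triple is now unique
SELECT id, name, surname, COUNT(*)
  FROM mytab
 GROUP BY id, name, surname
HAVING COUNT(*) > 1;

-- Once the result looks right, remove the temporary table
DROP TABLE tmp;
```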
Licensed under: CC-BY-SA with attribution