Вопрос

I have the following code to detect duplicates in a single table:

      UPDATE tab
        SET dup = 'Y'
      WHERE ROWID IN
        (SELECT tab_o.ROWID
          FROM tab tab_o,
            (SELECT *
             FROM tab tab_i
             WHERE ROWID IN
              (SELECT ROWID
               FROM
                (SELECT ROWID,
                  ROW_NUMBER() OVER(PARTITION BY a, b, c ORDER BY a, b, c) dupl
                 FROM tab
                 WHERE a IS NOT NULL
                   AND a    = 1
                   AND b    = 1
                   AND c    = 3
                )
               WHERE dupl > 1
              )
          ) res
         WHERE tab_o.a  = res.a
          AND tab_o.b   = res.b
          AND tab_o.c   = res.c
        );

I googled so many websites, and found that most of the people follow this efficient way. But no where i found the proper explanation on how these nested queries works.

Это было полезно?

Решение

Make it simplier. No analytics needed.

Sample table:

12:57:37 SYSTEM@dwal> create table dupe_test
                   2  (a number, b number, c number, is_dupe char);

Table created.                                                                           

12:57:50 SYSTEM@dwal> insert all                    
12:57:50   2    into dupe_test values (1, 1, 1, 'n')
12:57:50   3    into dupe_test values (1, 1, 1, 'n')
12:57:50   4    into dupe_test values (1, 1, 1, 'n')
12:57:50   5    into dupe_test values (1, 2, 1, 'n')
12:57:50   6    into dupe_test values (1, 2, 1, 'n')
12:57:50   7    into dupe_test values (1, 2, 1, 'n')
12:57:50   8  select * from dual;                   

6 rows created.                                     

There it is:

12:58:17 SYSTEM@dwal> select * from dupe_test;

         A          B          C I            
---------- ---------- ---------- -            
         1          1          1 n            
         1          1          1 n            
         1          1          1 n            
         1          2          1 n            
         1          2          1 n            
         1          2          1 n            

6 rows selected.                              

Unique values:

12:59:35 SYSTEM@dwal> select rowid,  t.* 
                   2  from dupe_test t 
                   3  where rowid in (select min(rowid) 
                   4                  from dupe_test 
                   5                  group by a, b, c);

ROWID                       A          B          C I                                                                        
------------------ ---------- ---------- ---------- -                                                                        
AAARN1AABAAAO9JAAD          1          2          1 n                                                                        
AAARN1AABAAAO9JAAA          1          1          1 n                                                                        

Update and result:

12:59:51 SYSTEM@dwal> update dupe_test t 
                   2  set is_dupe = 'y' 
                   3  where rowid not in (select min(rowid) 
                   4                      from dupe_test 
                   5                      group by a, b, c); 

4 rows updated.                                                                                                                    

13:00:45 SYSTEM@dwal> select * from dupe_test;                                                                                     

         A          B          C I                                                                                                 
---------- ---------- ---------- -                                                                                                 
         1          1          1 n                                                                                                 
         1          1          1 y                                                                                                 
         1          1          1 y                                                                                                 
         1          2          1 n                                                                                                 
         1          2          1 y                                                                                                 
         1          2          1 y                                                                                                 

6 rows selected.                                                                                                                   

UPDATE:

What I am trying to do is I found that one entry in table is repeated in the same table, all such entries will be marked with dupl flag, including the original entry

Still no analytics needed. Just add having count(*) = 1 in your subquery, so you will update only non-unique rows. Having clause is basically a where-condition for aggregate functions without the need to wrap your query in a subquery. It is executed last.

11:03:00 SYSTEM@dwal> insert into dupe_test values (1,3,1,'n') -- add some unique row
11:03:09   2  /                                                                      

1 row created.                                                                       

11:03:10 SYSTEM@dwal> update dupe_test set is_dupe = 'y'                             
11:03:27   2  where rowid not in                                                     
11:03:34   3  (select min(rowid) from dupe_test                                      
11:03:51   4  group by a,b,c                                                         
11:04:00   5  having count(*) = 1);                                                  

6 rows updated.                                                                      

11:04:06 SYSTEM@dwal> select * from dupe_test;                                       

         A          B          C I                                                   
---------- ---------- ---------- -                                                   
         1          1          1 y                                                   
         1          1          1 y                                                   
         1          1          1 y                                                   
         1          2          1 y                                                   
         1          2          1 y                                                   
         1          2          1 y                                                   
         1          3          1 n                                                   

7 rows selected.                                                                     
Лицензировано под: CC-BY-SA с атрибуция
Не связан с StackOverflow
scroll top