Remove duplicate in SAS similar like ROW_NUMBER () function in Oracle

https://stackoverflow.com/questions/21325335

02-10-2022
|

Domanda

I am using SCOTT schema in oracle and want to delete the duplicate values like this...

SELECT   EMPNO, JOB, SAL
  FROM   (  SELECT   EMPNO,
                     JOB,
                     SAL,
                     ROW_NUMBER () OVER (PARTITION BY EMPNO ORDER BY SAL) R_ID
              FROM   EMP_TEST
          ORDER BY   EMPNO)
 WHERE   R_ID = 1

Right now I want to perform this in SAS ... and following code performs fine....

PROC SQL;
connect to oracle (PATH="database_name" USER=SCOTT PASSWORD=tiger); 
CREATE TABLE WORK.EMP_DEDUPL AS SELECT * from connection to oracle(
SELECT EMPNO, JOB, SAL
      FROM   (  SELECT   EMPNO,
                         JOB,
                         SAL,
                         ROW_NUMBER () OVER (PARTITION BY EMPNO ORDER BY SAL) R_ID
                  FROM   SCOTT.EMP_TEST
              ORDER BY   EMPNO)
     WHERE   R_ID = 1
);
DISCONNECT FROM oracle;
QUIT;

But I got error when I have to perform this type of query in dataset level.

Suppose my EMP_TEST is not oracle table, its actually SAS dataset in a particular path and I want to create another dataset named EMP_DEDUP from EMP_TEST dataset after remove the duplicate values. This is my approach.

PROC SQL; 
CREATE TABLE WORK.EMP_DEDUPL AS SELECT * from connection to oracle(
SELECT EMPNO, JOB, SAL
      FROM   (  SELECT   EMPNO,
                         JOB,
                         SAL,
                         ROW_NUMBER () OVER (PARTITION BY EMPNO ORDER BY SAL) R_ID
                  FROM   path.EMP_TEST
              ORDER BY   EMPNO)
     WHERE   R_ID = 1
;
QUIT;

Here I found syntax error..... might be SAS does not support ROW_NUMBER () function. Any possible solution.. can anybody suggest...

Soluzione

Sounds like you want the lowest SAL for each EMPNO, is that correct?

proc sort data=path.emp_test;
by empno sal;
run;

data work.emp_dedupl;
set path.emp_test;
by empno sal;
if first.empno;
run;

SAS doesn't support PARTITION OVER, so it wouldn't really allow you to do this in SQL.

Altri suggerimenti

If analytical functions are not supported then you can write something like this

PROC SQL; 
CREATE TABLE WORK.EMP_DEDUPL AS SELECT * from connection to oracle(
SELECT EMPNO, JOB, SAL
      FROM  path.EMP_TEST
     WHERE rowid in (select min(rowid) 
                      from path.EMP_TEST
                     group by empno)
);
QUIT;

Autorizzato sotto: CC-BY-SA insieme a attribuzione

Non affiliato a StackOverflow