문제

I am using SCOTT schema in oracle and want to delete the duplicate values like this...

SELECT   EMPNO, JOB, SAL
  FROM   (  SELECT   EMPNO,
                     JOB,
                     SAL,
                     ROW_NUMBER () OVER (PARTITION BY EMPNO ORDER BY SAL) R_ID
              FROM   EMP_TEST
          ORDER BY   EMPNO)
 WHERE   R_ID = 1

Right now I want to perform this in SAS ... and following code performs fine....

PROC SQL;
connect to oracle (PATH="database_name" USER=SCOTT PASSWORD=tiger); 
CREATE TABLE WORK.EMP_DEDUPL AS SELECT * from connection to oracle(
SELECT EMPNO, JOB, SAL
      FROM   (  SELECT   EMPNO,
                         JOB,
                         SAL,
                         ROW_NUMBER () OVER (PARTITION BY EMPNO ORDER BY SAL) R_ID
                  FROM   SCOTT.EMP_TEST
              ORDER BY   EMPNO)
     WHERE   R_ID = 1
);
DISCONNECT FROM oracle;
QUIT;

But I got error when I have to perform this type of query in dataset level.

Suppose my EMP_TEST is not oracle table, its actually SAS dataset in a particular path and I want to create another dataset named EMP_DEDUP from EMP_TEST dataset after remove the duplicate values. This is my approach.

PROC SQL; 
CREATE TABLE WORK.EMP_DEDUPL AS SELECT * from connection to oracle(
SELECT EMPNO, JOB, SAL
      FROM   (  SELECT   EMPNO,
                         JOB,
                         SAL,
                         ROW_NUMBER () OVER (PARTITION BY EMPNO ORDER BY SAL) R_ID
                  FROM   path.EMP_TEST
              ORDER BY   EMPNO)
     WHERE   R_ID = 1
;
QUIT;

Here I found syntax error..... might be SAS does not support ROW_NUMBER () function. Any possible solution.. can anybody suggest...

도움이 되었습니까?

해결책

Sounds like you want the lowest SAL for each EMPNO, is that correct?

proc sort data=path.emp_test;
by empno sal;
run;

data work.emp_dedupl;
set path.emp_test;
by empno sal;
if first.empno;
run;

SAS doesn't support PARTITION OVER, so it wouldn't really allow you to do this in SQL.

다른 팁

If analytical functions are not supported then you can write something like this

PROC SQL; 
CREATE TABLE WORK.EMP_DEDUPL AS SELECT * from connection to oracle(
SELECT EMPNO, JOB, SAL
      FROM  path.EMP_TEST
     WHERE rowid in (select min(rowid) 
                      from path.EMP_TEST
                     group by empno)
);
QUIT;   
라이센스 : CC-BY-SA ~와 함께 속성
제휴하지 않습니다 StackOverflow
scroll top