SQL: aggregate function and group by

https://stackoverflow.com/questions/51092

09-06-2019
|

Question

Consider the Oracle emp table. I'd like to get the employees with the top salary with department = 20 and job = clerk. Also assume that there is no "empno" column, and that the primary key involves a number of columns. You can do this with:

select * from scott.emp
where deptno = 20 and job = 'CLERK'
and sal =  (select max(sal) from scott.emp
            where deptno = 20 and job = 'CLERK')

This works, but I have to duplicate the test deptno = 20 and job = 'CLERK', which I would like to avoid. Is there a more elegant way to write this, maybe using a group by? BTW, if this matters, I am using Oracle.

Solution

The following is slightly over-engineered, but is a good SQL pattern for "top x" queries.

SELECT 
 * 
FROM 
 scott.emp
WHERE 
 (deptno,job,sal) IN
 (SELECT 
   deptno,
   job,
   max(sal) 
  FROM 
   scott.emp
  WHERE 
   deptno = 20 
   and job = 'CLERK'
  GROUP BY 
   deptno,
   job
  )

Also note that this will work in Oracle and Postgress (i think) but not MS SQL. For something similar in MS SQL see question SQL Query to get latest price

OTHER TIPS

If I was certain of the targeted database I'd go with Mark Nold's solution, but if you ever want some dialect agnostic SQL*, try

SELECT * 
FROM scott.emp e
WHERE e.deptno = 20 
AND e.job = 'CLERK'
AND e.sal = (
  SELECT MAX(e2.sal) 
  FROM scott.emp e2
  WHERE e.deptno = e2.deptno 
  AND e.job = e2.job
)

*I believe this should work everywhere, but I don't have the environments to test it.

In Oracle I'd do it with an analytical function, so you'd only query the emp table once :

SELECT *
  FROM (SELECT e.*, MAX (sal) OVER () AS max_sal
          FROM scott.emp e
         WHERE deptno = 20 
           AND job = 'CLERK')
 WHERE sal = max_sal

It's simpler, easier to read and more efficient.

If you want to modify it to list list this information for all departments, then you'll need to use the "PARTITION BY" clause in OVER:

SELECT *
  FROM (SELECT e.*, MAX (sal) OVER (PARTITION BY deptno) AS max_sal
          FROM scott.emp e
         WHERE job = 'CLERK')
 WHERE sal = max_sal
ORDER BY deptno

That's great! I didn't know you could do a comparison of (x, y, z) with the result of a SELECT statement. This works great with Oracle.

As a side-note for other readers, the above query is missing a "=" after "(deptno,job,sal)". Maybe the Stack Overflow formatter ate it (?).

Again, thanks Mark.

In Oracle you can also use the EXISTS statement, which in some cases is faster.

For example... SELECT name, number FROM cust WHERE cust IN ( SELECT cust_id FROM big_table ) AND entered > SYSDATE -1 would be slow.

but SELECT name, number FROM cust c WHERE EXISTS ( SELECT cust_id FROM big_table WHERE cust_id=c.cust_id ) AND entered > SYSDATE -1 would be very fast with proper indexing. You can also use this with multiple parameters.

There are many solutions. You could also keep your original query layout by simply adding table aliases and joining on the column names, you would still only have DEPTNO = 20 and JOB = 'CLERK' in the query once.

SELECT 
  * 
FROM 
  scott.emp emptbl
WHERE
  emptbl.DEPTNO = 20 
  AND emptbl.JOB = 'CLERK'
  AND emptbl.SAL =  
    (
      select 
        max(salmax.SAL) 
      from 
        scott.emp salmax
      where 
        salmax.DEPTNO = emptbl.DEPTNO
        AND salmax.JOB = emptbl.JOB
    )

It could also be noted that the key word "ALL" can be used for these types of queries which would allow you to remove the "MAX" function.

SELECT 
  * 
FROM 
  scott.emp emptbl
WHERE
  emptbl.DEPTNO = 20 
  AND emptbl.JOB = 'CLERK'
  AND emptbl.SAL >= ALL  
    (
      select 
        salmax.SAL
      from 
        scott.emp salmax
      where 
        salmax.DEPTNO = emptbl.DEPTNO
        AND salmax.JOB = emptbl.JOB
    )

I hope that helps and makes sense.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow