SQL: aggregate function and group by
Question
Consider the Oracle emp
table. I'd like to get the employees with the top salary with department = 20
and job = clerk
. Also assume that there is no "empno" column, and that the primary key involves a number of columns. You can do this with:
select * from scott.emp
where deptno = 20 and job = 'CLERK'
and sal = (select max(sal) from scott.emp
where deptno = 20 and job = 'CLERK')
This works, but I have to duplicate the test deptno = 20 and job = 'CLERK', which I would like to avoid. Is there a more elegant way to write this, maybe using a group by
? BTW, if this matters, I am using Oracle.
Solution
The following is slightly over-engineered, but is a good SQL pattern for "top x" queries.
SELECT
*
FROM
scott.emp
WHERE
(deptno,job,sal) IN
(SELECT
deptno,
job,
max(sal)
FROM
scott.emp
WHERE
deptno = 20
and job = 'CLERK'
GROUP BY
deptno,
job
)
Also note that this will work in Oracle and Postgress (i think) but not MS SQL. For something similar in MS SQL see question SQL Query to get latest price
OTHER TIPS
If I was certain of the targeted database I'd go with Mark Nold's solution, but if you ever want some dialect agnostic SQL*, try
SELECT *
FROM scott.emp e
WHERE e.deptno = 20
AND e.job = 'CLERK'
AND e.sal = (
SELECT MAX(e2.sal)
FROM scott.emp e2
WHERE e.deptno = e2.deptno
AND e.job = e2.job
)
*I believe this should work everywhere, but I don't have the environments to test it.
In Oracle I'd do it with an analytical function, so you'd only query the emp table once :
SELECT *
FROM (SELECT e.*, MAX (sal) OVER () AS max_sal
FROM scott.emp e
WHERE deptno = 20
AND job = 'CLERK')
WHERE sal = max_sal
It's simpler, easier to read and more efficient.
If you want to modify it to list list this information for all departments, then you'll need to use the "PARTITION BY" clause in OVER:
SELECT *
FROM (SELECT e.*, MAX (sal) OVER (PARTITION BY deptno) AS max_sal
FROM scott.emp e
WHERE job = 'CLERK')
WHERE sal = max_sal
ORDER BY deptno
That's great! I didn't know you could do a comparison of (x, y, z) with the result of a SELECT statement. This works great with Oracle.
As a side-note for other readers, the above query is missing a "=" after "(deptno,job,sal)". Maybe the Stack Overflow formatter ate it (?).
Again, thanks Mark.
In Oracle you can also use the EXISTS statement, which in some cases is faster.
For example... SELECT name, number FROM cust WHERE cust IN ( SELECT cust_id FROM big_table ) AND entered > SYSDATE -1 would be slow.
but SELECT name, number FROM cust c WHERE EXISTS ( SELECT cust_id FROM big_table WHERE cust_id=c.cust_id ) AND entered > SYSDATE -1 would be very fast with proper indexing. You can also use this with multiple parameters.
There are many solutions. You could also keep your original query layout by simply adding table aliases and joining on the column names, you would still only have DEPTNO = 20 and JOB = 'CLERK' in the query once.
SELECT
*
FROM
scott.emp emptbl
WHERE
emptbl.DEPTNO = 20
AND emptbl.JOB = 'CLERK'
AND emptbl.SAL =
(
select
max(salmax.SAL)
from
scott.emp salmax
where
salmax.DEPTNO = emptbl.DEPTNO
AND salmax.JOB = emptbl.JOB
)
It could also be noted that the key word "ALL" can be used for these types of queries which would allow you to remove the "MAX" function.
SELECT
*
FROM
scott.emp emptbl
WHERE
emptbl.DEPTNO = 20
AND emptbl.JOB = 'CLERK'
AND emptbl.SAL >= ALL
(
select
salmax.SAL
from
scott.emp salmax
where
salmax.DEPTNO = emptbl.DEPTNO
AND salmax.JOB = emptbl.JOB
)
I hope that helps and makes sense.