Obtaining one value per person from table (“latest-n-per-group before cutoff date”)

https://stackoverflow.com/questions/11906720

25-06-2021
|

Question

In a database of mine there is a table called 'Budget' defined as follows

 CREATE TABLE BUDGET (
  ID INTEGER NOT NULL,
  THERAPIST INTEGER,
  CURDATE DATE,
  ILLNESS SMALLINT,
  HOLIDAY SMALLINT);

  ALTER TABLE BUDGET ADD PRIMARY KEY (ID);

The purpose of the table is to store how many hours a month a therapist is entitled to sick leave and regular holiday.

The table has the following values

ID: 1, THERAPIST: 36, CURDATE: 01/01/2012, ILLNESS:8, HOLIDAY: 8
ID: 2, THERAPIST: 36, CURDATE: 01/07/2012, ILLNESS:8, HOLIDAY: 10
ID: 3, THERAPIST: 74, CURDATE: 01/01/2012, ILLNESS:8, HOLIDAY: 8
ID: 4, THERAPIST: 74, CURDATE: 01/03/2012, ILLNESS:8, HOLIDAY: 10

I want to write a query which returns one row per therapist, with the most recent data (the above data shows that therapist 36 was entitled to 8 hours holiday per month in the months 1-6/2012, and from 7/2012 she is entitled to 10 hours per month). In other words, if I issue the query for the date 31/01/2012, I expect to get back rows 1 and 3; if I issue the query for the date 31/05/2012, I expect to get back rows 1 and 4 and if I issue the query for the date 31/08/2012, I expect to get back rows 2 and 4.

The following query gives me one row per therapist, but it gives me the maximum values of illness and holiday, which is not necessarily what I want.

select therapist, max (illness), max (holiday)  
from budget
where curdate <= '2012-08-31'
group by therapist

Should there be a fifth row in the table

ID: 5, THERAPIST: 74, CURDATE: 01/07/2012, ILLNESS: 6, HOLIDAY: 6

querying on 31/08/12 would return illness=8 and holiday=10, even though they should both be 6.

I tried the following query but this returns only one row

select therapist, illness, holiday
from budget
where curdate =
(select max (curdate) from budget b
 where b.therapist = budget.therapist
 and b.curdate <= '2012-08-31')

There are similar questions on SO, but none of them seem applicable to my situation. Of course, life would be easier if I could use 'group by' without an aggregate function but Firebird (and most DBMS) doesn't support this.

Solution

If I understood your problem correctly, I think you want:

select
    b.*
from
    budget b
    join (
        select
            therapist,
            max(curdate) as maxdate
        from
            budget
        where 
            curdate <= '2012-08-31'
        group by
            therapist
    ) grouped on grouped.therapist = b.therapist and grouped.maxdate = b.curdate

OTHER TIPS

This is essentially a de-duplication problem, as in you would have the same problem if you were trying to clean up your data such that there was only one row per therapist for the criteria you described. This is rarely simple.

That said, here is something you could use in your current situation:

select b.id, b.therapist, t.[curdate], b.illness, b.holiday
from budget b
inner join
(
    select therapist, MAX([curdate]) as [curdate]
    from BUDGET
    where [CURDATE] <= '2012-08-31'
    group by THERAPIST
) t on b.therapist = t.therapist and b.[CURDATE] = t.[curdate]

I think it would be something like this:

select * from therapist t
join budget a
on t.therapist=a.therapist
where id in (
select first 1 ID from BUDGET b
where 
b.theraphist=t.therapist and
b.curdate <= '2012-08-31'
order by curdate desc)

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow