Efficiently finding rows where a < (max(a) for a given b)

https://stackoverflow.com/questions/23669347

postgresql

23-07-2023

Question

I have a table structure like:

a | b
2014-04-12| 3
2014-03-12| 3
2014-02-12| 3
2014-05-12| 4
2014-03-12| 4
2014-04-12| 4

I need the rows where a is less than max(a) for that row's b, and a is also less than now().

What I have done so far is a self join on b, with where a < now() and having table.a < max(table1.a).

The output is correct, but the query is very expensive because the table has a large number of rows. Is there an alternative way of doing this?

My query is:

select a1.a, a1.b 
from tab a1 
JOIN tab b1 
    on a1.b=b1.b 
where a1.a < now() 
group by a1.a, a1.b 
having a1.a < max(b1.a);

Solution

I think this should be faster than the self join as only a single scan over the table is required:

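select a, b
from (
  select a,
         b,
         -- largest a within each b, computed over the rows that pass the where clause
         max(a) over (partition by b) as max_a
  from the_table
  where a < now()
) t  -- a derived table needs an alias in PostgreSQL before version 16
where a < max_a;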
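
One difference from the self join is worth checking. The window function is evaluated after the where clause, so max_a in the query above is the largest a among the rows that are already earlier than now(). The self join in the question takes the maximum over all rows for a given b, including future ones. If the table can contain values of a later than now(), a variant along these lines should match the original semantics more closely. It is a sketch that assumes the same the_table placeholder and columns a and b:

```sql
-- Variant: take the per-b maximum over ALL rows, then filter on now().
-- the_table is a placeholder name; replace it with the real table.
select a, b
from (
  select a,
         b,
         max(a) over (partition by b) as max_a
  from the_table
) t
where a < now()
  and a < max_a;
```

The self join also collapses duplicate (a, b) pairs through its group by, while both window versions return one output row per input row. Adding distinct to the outer select would reproduce that behaviour if duplicates can occur. Moving the now() filter to the outer query means every row is read before filtering, so the index trade-offs may differ from the version above.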

If the condition a < now() filters out many rows, then an index on (a,b) will help. If that still leaves many rows, an index on (b,a) might be the better choice, because it speeds up finding the max a for a given b. But only an execution plan on your real data will show which one works better.
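
The two index options and the plan check could be tried roughly like this. The index names are made up for illustration and the_table is the same placeholder as before. Note that explain with the analyze option actually executes the query, so it takes as long as the query itself:

```sql
-- Candidate 1: helps when a < now() removes most of the rows.
create index the_table_a_b_idx on the_table (a, b);

-- Candidate 2: keeps the rows for each b together, ordered by a.
create index the_table_b_a_idx on the_table (b, a);

-- Show the plan the planner actually chooses, with timings and buffer usage.
explain (analyze, buffers)
select a, b
from (
  select a,
         b,
         max(a) over (partition by b) as max_a
  from the_table
  where a < now()
) t
where a < max_a;
```

Creating one index at a time and comparing the plans makes it easier to see which one the planner picks. An index that is not used can be dropped again with drop index.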

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow