Question

I am working on PostgreSQL 9.3.2 and I have this table:

id  startdate   enddate    no_of_days_between

1   2010-12-22  2010-12-23  1
1   2010-12-23  2010-12-24  1
1   2010-12-24  2010-12-25  1
1   2010-12-25  2010-12-26  1
1   2010-12-26  2010-12-27  1
1   2010-12-27  2010-12-28  1
1   2010-12-28  2010-12-29  1
1   2010-12-29  2011-03-06  67
1   2011-03-06  2011-03-07  1
1   2011-03-07  2011-03-08  1
1   2011-03-08  2011-03-09  1

What I want to do is find the streaks of consecutive days. For this, I am using the row_number() window function in this query:

select t.*,
       row_number() over (partition by no_of_days_between order by enddate) as no_of_consecutive_days
from t

What I want back is something like this:

id  startdate   enddate     no_of_days_between  no_of_consecutive_days
1   2010-12-22  2010-12-23  1                   1
1   2010-12-23  2010-12-24  1                   2
1   2010-12-24  2010-12-25  1                   3
1   2010-12-25  2010-12-26  1                   4
1   2010-12-26  2010-12-27  1                   5
1   2010-12-27  2010-12-28  1                   6
1   2010-12-28  2010-12-29  1                   7
1   2010-12-29  2011-03-06  67                  1
1   2011-03-06  2011-03-07  1                   1
1   2011-03-07  2011-03-08  1                   2
1   2011-03-08  2011-03-09  1                   3

However, the query behaves as if it had first ordered by no_of_days_between and then by enddate, so I get back:

id  startdate   enddate     no_of_days_between  no_of_consecutive_days
1   2010-12-22  2010-12-23  1                   1
1   2010-12-23  2010-12-24  1                   2
1   2010-12-24  2010-12-25  1                   3
1   2010-12-25  2010-12-26  1                   4
1   2010-12-26  2010-12-27  1                   5
1   2010-12-27  2010-12-28  1                   6
1   2010-12-28  2010-12-29  1                   7
1   2011-03-06  2011-03-07  1                   8
1   2011-03-07  2011-03-08  1                   9
1   2011-03-08  2011-03-09  1                   10
1   2010-12-29  2011-03-06  67                  1

Has anyone run into this problem before? How can I force it to order first and then partition?

Thanks


Solution 2

It seems that this is the expected behaviour as per the SQL standard: PARTITION BY puts all rows with the same value into one partition, regardless of where they fall in date order. I had to write a function to get what I wanted, which is to order by date first and then partition. This means that the counter is reset every time no_of_days_between differs from the value being counted.

This was very useful for a consecutive day streak.

Code to my function:

CREATE TYPE consecutive_length_type AS
   (daystreak integer,
    streakstart date,
    streakend date);

CREATE OR REPLACE FUNCTION get_maxconsecutive_day_streak() RETURNS consecutive_length_type AS
$BODY$
declare 
    max_length integer := 0;
    end_date date;
    cons_days integer :=0;
    return_rec consecutive_length_type;
    rec record;
begin

for rec in 
    select * from t order by enddate --table as above, read in date order
loop
    if rec.no_of_days_between = 1 then 
        cons_days := cons_days + 1 ;
        if cons_days > max_length then
            max_length := cons_days;
            end_date := rec.enddate ; 
                          --this way I can see when the streak ended
        end if;
    else 
        cons_days := 0;
    end if;
end loop;


return_rec.daystreak := max_length;
return_rec.streakend := end_date;
return_rec.streakstart := end_date - max_length; 
                       --I am calculating the day start so I can use it further on

return return_rec;
end;
$BODY$
  LANGUAGE plpgsql ;
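
As a usage sketch, the function can be called in the FROM clause so that the fields of the returned composite value come back as separate columns. This assumes the type and function above were created successfully and that the loop reads the rows in date order.

```sql
-- Assumes consecutive_length_type and get_maxconsecutive_day_streak()
-- exist in the current schema.
SELECT daystreak, streakstart, streakend
FROM get_maxconsecutive_day_streak();
```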

What I originally have is a table that maps an id to a date. To compute the number of days between two consecutive dates, I run this query:

select id,
       lag(date_logged, 1, date_logged) over
               (partition by id order by id, date_logged) as startdate,
       date_logged as enddate,
       date_logged - lag(date_logged, 1, date_logged) over
               (partition by id order by id, date_logged) as no_of_days_between
from table_name;
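
The per-row numbering asked for in the question can probably also be produced without a function. The following is a minimal sketch of the common "difference of row numbers" (gaps-and-islands) technique, not the method used in this answer. It assumes the table is named t as in the question and that enddate is unique per id; the alias grp and the subquery alias s are made up for illustration.

```sql
-- Sketch only: assumes table t from the question and unique enddate per id.
SELECT id, startdate, enddate, no_of_days_between,
       -- number the rows inside each unbroken run of equal values
       row_number() OVER (PARTITION BY id, no_of_days_between, grp
                          ORDER BY enddate) AS no_of_consecutive_days
FROM (
    SELECT t.*,
           -- position in date order minus position among rows with the
           -- same value: constant within a run, different between runs
           row_number() OVER (PARTITION BY id ORDER BY enddate)
         - row_number() OVER (PARTITION BY id, no_of_days_between
                              ORDER BY enddate) AS grp
    FROM t
) AS s
ORDER BY id, enddate;
```

The counter restarts whenever no_of_days_between changes between neighbouring rows in date order, which is the behaviour the function above was written to get.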

OTHER TIPS

You still need to have "ORDER BY enddate" at the end of your query, otherwise the order of the returned rows is whatever PostgreSQL feels like giving you.

The ORDER BY in your OVER clause only controls how row_number() sees the data, not how the data is ultimately returned.
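
As a sketch of this tip, here is the query from the question with a final ORDER BY added (the table name t is taken from the question):

```sql
-- Assumes the question's table is named t.
SELECT t.*,
       row_number() OVER (PARTITION BY no_of_days_between
                          ORDER BY enddate) AS no_of_consecutive_days
FROM t
ORDER BY enddate;  -- controls the order of the returned rows only
```

Note that this only changes the order in which rows are displayed. All rows with no_of_days_between = 1 still share one partition, so the numbering continues with 8, 9, 10 after the 67-day gap instead of restarting at 1.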

Licensed under: CC-BY-SA with attribution