Question

I'm having trouble putting together a query to pull the aggregate values for a given timestamp and for the timestamp before it. Given the following schema:

name TEXT, 
ts TIMESTAMP, 
X NUMERIC, 
Y NUMERIC

where there are gaps in the ts column due to gaps in data, I'm trying to construct a query to produce

name, 
date_trunc('day', q1.ts), 
avg(q1.X), 
sum(q1.Y), 
date_trunc('day', q2.ts), 
avg(q2.X), 
sum(q2.Y)

The first half is straightforward:

SELECT q1.name, date_trunc('day', q1.ts), avg(q1.X), sum(q1.Y)
FROM data as q1
GROUP BY 1, 2
ORDER BY 1, 2;

But I'm not sure how to build the relation that finds the "day" before for each row. I'm trying to work out an inner join like this:

SELECT q1.name, q1.day, q1.avg, q1.sum, q2.day, q2.avg, q2.sum
FROM (
    SELECT name, date_trunc('day', ts) AS day, avg(X) AS avg, sum(Y) as sum
    FROM data
    GROUP BY 1,2
    ORDER BY 1,2
) q1 INNER JOIN (
    SELECT name, date_trunc('day', ts) AS day, avg(X) AS avg, sum(Y) as sum
    FROM data
    GROUP BY 1,2
    ORDER BY 1,2
) q2 ON (
    q1.name = q2.name 
    AND  q2.day = q1.day - interval '1 day'
);

The problem with this is that it doesn't cover the cases where the previous "day" with data is more than 1 day before the current day.


Solution

The special difficulty here is that you need to number days after aggregating rows. You can do this in a single query level with the window function row_number(), since window functions are applied after aggregation by GROUP BY.

Also, use a CTE to avoid executing the same subquery multiple times:

WITH q AS (
    SELECT name, ts::date AS day
          ,avg(x) AS avg_x, sum(y) AS sum_y
          ,row_number() OVER (PARTITION BY name ORDER BY ts::date) AS rn
    FROM   data
    GROUP  BY 1,2
   )
SELECT q1.name, q1.day, q1.avg_x, q1.sum_y
      ,q2.day AS day2, q2.avg_x AS avg_x2, q2.sum_y AS sum_y2
FROM   q      q1
LEFT   JOIN q q2 ON q1.name = q2.name 
                AND q1.rn   = q2.rn + 1
ORDER  BY 1,2;

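This uses the simpler cast to date (ts::date) instead of date_trunc('day', ts) to get "days". The cast returns a date, while date_trunc() returns a timestamp.
LEFT [OUTER] JOIN (as opposed to [INNER] JOIN) is instrumental to preserve the corner case of the first row per name, where there is no previous day.
ORDER BY should be applied to the outer query, because the row order of a subquery is not guaranteed to survive the join.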
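
To try the query above, a few rows with a gap in ts are enough. The following is a minimal sketch with hypothetical sample data (the values are invented for illustration and are not from the original post). It creates a temporary table named data, which hides any existing table of that name for the current session:

```sql
-- Hypothetical sample rows: name 'a' has no data on Dec 3 and Dec 4.
CREATE TEMP TABLE data (name text, ts timestamp, x numeric, y numeric);

INSERT INTO data (name, ts, x, y) VALUES
  ('a', '2013-12-01 08:00', 1, 10),
  ('a', '2013-12-01 20:00', 3, 20),
  ('a', '2013-12-02 09:00', 5, 30),
  ('a', '2013-12-05 10:00', 7, 40),
  ('b', '2013-12-03 11:00', 2,  5);
```

With these rows, the CTE numbers the days of 'a' as 1 (Dec 1), 2 (Dec 2) and 3 (Dec 5), so the row for Dec 5 is joined to Dec 2 despite the gap. The first day of each name comes back with NULL in day2, avg_x2 and sum_y2.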

OTHER TIPS

The question isn't crystal clear, but it sounds like you're actually trying to fill gaps while keeping track of leading/lagging rows.

To fill the gaps, look into generate_series() and left join it with your table:

select d
from generate_series(timestamp '2013-12-01', timestamp '2013-12-31', interval '1 day') d;

http://www.postgresql.org/docs/current/static/functions-srf.html
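
The left join mentioned above could look roughly like the following. This is a sketch under assumptions: it uses the question's table data(name, ts, x, y), the date range is illustrative only, and the aliases d, n and t are made up for this example. Note that it answers a slightly different question than the accepted solution: days without data appear as rows with NULL aggregates instead of being skipped.

```sql
-- One row per (name, calendar day) in the range, whether or not data exists.
SELECT n.name
      ,d.day::date AS day
      ,avg(t.x)    AS avg_x   -- NULL for days without rows
      ,sum(t.y)    AS sum_y   -- NULL for days without rows
FROM   generate_series(timestamp '2013-12-01'
                      ,timestamp '2013-12-31'
                      ,interval  '1 day') AS d(day)
CROSS  JOIN (SELECT DISTINCT name FROM data) AS n
LEFT   JOIN data t ON  t.name = n.name
                   AND t.ts  >= d.day
                   AND t.ts  <  d.day + interval '1 day'
GROUP  BY n.name, d.day
ORDER  BY n.name, d.day;
```

The range condition on t.ts (rather than a cast on the column) keeps the join predicate friendly to a plain index on ts, if one exists.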

For previous and next row values, look into lead() and lag() window functions:

select date_trunc('day', ts) as curr_row_day,
       lag(date_trunc('day', ts)) over w as prev_row_day
from data
window w as (order by ts)

http://www.postgresql.org/docs/current/static/tutorial-window.html
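
The snippet above looks at raw rows and does not separate names. Since window functions are evaluated after GROUP BY, lag() can also be applied per name to rows that are already aggregated by day. A minimal sketch, assuming the question's table data(name, ts, x, y); the output aliases day2, avg_x2 and sum_y2 are illustrative:

```sql
-- Aggregate per name and day, then fetch the previous available day's values.
SELECT name
      ,ts::date          AS day
      ,avg(x)            AS avg_x
      ,sum(y)            AS sum_y
      ,lag(ts::date) OVER w AS day2     -- previous day that has data
      ,lag(avg(x))   OVER w AS avg_x2   -- window function over an aggregate
      ,lag(sum(y))   OVER w AS sum_y2
FROM   data
GROUP  BY name, ts::date
WINDOW w AS (PARTITION BY name ORDER BY ts::date)
ORDER  BY name, day;
```

Because lag() returns the previous row in the partition rather than the previous calendar day, gaps in ts are skipped, and the first day of each name gets NULL in the three lagged columns.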

Licensed under: CC-BY-SA with attribution