The fact that you use generate_series()
to create a full range of days, including those days with no changes, and thus no rows in table issues
, does not rule out the use of window functions.
In fact, this query runs 50 times faster than the query in the Q in my local test:
SELECT t.day
, COALESCE(sum(a.created) OVER (ORDER BY t.day DESC), 0)
- COALESCE(sum(b.closed) OVER (ORDER BY t.day DESC), 0) AS open_tickets
FROM generate_series(0, 364) t(day)
LEFT JOIN (SELECT created_days_ago AS day, count(*) AS created
FROM issues GROUP BY 1) a USING (day)
LEFT JOIN (SELECT closed_days_ago AS day, count(*) AS closed
FROM issues GROUP BY 1) b USING (day)
ORDER BY 1;
It is also correct, as opposed to the query in the question, which results in 17 open tickets on day 0, although all of them have been closed.
The error is due to BETWEEN
in your join condition, which includes upper and lower border. This way tickets are still counted as "open" on the day they are closed.
Each row in the result reflects the number of open tickets at the end of the day.
Explain
The query combines window functions with aggregate functions.
Subquery
a
counts the number of created tickets per day. This results in a single row per day, making the rest easier.
Subqueryb
does the same for closed tickets.Use LEFT JOINs to join to the generated list of days in subquery
t
.
Be wary of joining to multiple unaggregated tables! That could trigger aCROSS JOIN
among the joined tables for multiple matches per row, generating incorrect results. Compare:
Two SQL LEFT JOINS produce incorrect resultFinally use two window functions to compute the running total of created versus closed tickets.
An alternative would be to use this in the outerSELECT
sum(COALESCE(a.created, 0) - COALESCE(b.closed, 0)) OVER (ORDER BY t.day DESC) AS open_tickets
Performs the same in my tests.
Aside: I would never store "days_ago" in a table, but the absolute date / timestamp. Looks like a simplification for the purpose of this question.