Question

When I started tackling this problem, I thought it would be a great query for learning about window functions. I couldn't get it to work with window functions, but I did get the result I wanted using a join.

How would you adapt this query to use window functions:

SELECT
    day,
    COUNT(i.project) as num_open
FROM generate_series(0, 364) as t(day)
    LEFT JOIN issues i on (day BETWEEN i.closed_days_ago AND i.created_days_ago)
GROUP BY day
ORDER BY day;

The query above takes a list of issues, each with a range given by created_days_ago and closed_days_ago, and counts, for each of the last 365 days, the number of issues that had been created but not yet closed on that day.

http://sqlfiddle.com/#!15/663f6/2

The issues table looks like:

CREATE TABLE issues (
  id SERIAL,
  project VARCHAR(255),
  created_days_ago INTEGER,
  closed_days_ago INTEGER);

What I was thinking was that the partition for a given day should include all the rows in issues where day is between the created and closed days ago. Something like SELECT day, COUNT(i.project) OVER (PARTITION day BETWEEN created_days_ago AND closed_days_ago) ...

I've never used window functions before, so I might be missing something basic, but it seemed like this was just the type of query that makes window functions so awesome.

Solution

The fact that you use generate_series() to create a full range of days, including those days with no changes, and thus no rows in table issues, does not rule out the use of window functions.

In fact, this query runs 50 times faster than the query in the question in my local test:

SELECT t.day
      ,  COALESCE(sum(a.created) OVER (ORDER BY t.day DESC), 0)
       - COALESCE(sum(b.closed)  OVER (ORDER BY t.day DESC), 0) AS open_tickets
FROM   generate_series(0, 364) t(day)
LEFT   JOIN (SELECT created_days_ago AS day, count(*) AS created
             FROM   issues GROUP BY 1) a USING (day)
LEFT   JOIN (SELECT closed_days_ago AS day, count(*) AS closed
             FROM   issues GROUP BY 1) b USING (day)
ORDER  BY 1;

It is also correct, as opposed to the query in the question, which reports 17 open tickets on day 0 even though all of them have been closed.
The error is due to BETWEEN in your join condition, which includes both the upper and the lower bound. This way, tickets are still counted as "open" on the day they are closed.

Each row in the result reflects the number of open tickets at the end of the day.
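
To make the boundary effect concrete, here is a minimal sketch that compares the inclusive BETWEEN condition with an "end of day" condition that excludes the closing day. The three-row sample CTE and the column names between_count and end_of_day_count are made up for illustration; they are not part of the original fiddle.

```sql
-- Hypothetical three-row sample standing in for the issues table.
WITH sample(created_days_ago, closed_days_ago) AS (
   VALUES (5, 2), (3, 0), (4, 4)
)
SELECT t.day
       -- Inclusive on both ends: an issue still counts on the day it is closed.
     , count(CASE WHEN t.day BETWEEN s.closed_days_ago
                                 AND s.created_days_ago THEN 1 END) AS between_count
       -- Excludes the closing day: open "at the end of the day".
     , count(CASE WHEN t.day >  s.closed_days_ago
                   AND t.day <= s.created_days_ago THEN 1 END)      AS end_of_day_count
FROM   generate_series(0, 6) t(day)
CROSS  JOIN sample s
GROUP  BY t.day
ORDER  BY t.day;
```

On day 0 the first column reports 1 and the second reports 0, because the only issue touching day 0 was closed on that very day.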

Explanation

The query combines window functions with aggregate functions.

  • Subquery a counts the number of created tickets per day. This results in a single row per day, making the rest easier.
    Subquery b does the same for closed tickets.

  • Use LEFT JOINs to join to the generated list of days in subquery t.
    Be wary of joining to multiple unaggregated tables! That could trigger a CROSS JOIN among the joined tables for multiple matches per row, generating incorrect results. Compare:
    Two SQL LEFT JOINS produce incorrect result

  • Finally use two window functions to compute the running total of created versus closed tickets.
    An alternative would be to use this in the outer SELECT:

    sum(COALESCE(a.created, 0)
      - COALESCE(b.closed,  0)) OVER (ORDER BY t.day DESC) AS open_tickets
    

    Performs the same in my tests.
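
The warning about joining to multiple unaggregated tables can be illustrated with a small sketch. The sample CTE and the *_naive column names are hypothetical. The query joins the raw rows twice instead of joining pre-aggregated subqueries:

```sql
-- Hypothetical sample: one issue created on day 1, two issues closed on day 1.
WITH sample(created_days_ago, closed_days_ago) AS (
   VALUES (3, 1), (3, 1), (1, 0)
)
SELECT t.day
     , count(a.created_days_ago) AS created_naive  -- inflated by matches in b
     , count(b.closed_days_ago)  AS closed_naive   -- inflated by matches in a
FROM   generate_series(0, 3) t(day)
LEFT   JOIN sample a ON a.created_days_ago = t.day
LEFT   JOIN sample b ON b.closed_days_ago  = t.day
GROUP  BY t.day
ORDER  BY t.day;
```

For day 1, the single "created" row is paired with each of the two "closed" rows, so created_naive comes out as 2 although only one issue was created that day. Aggregating to one row per day before joining, as subqueries a and b do, avoids this.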

-> SQLfiddle demo.
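
For a quick check without the fiddle, the following sketch runs both variants of the running total side by side on a hypothetical three-row sample. The CTE and the column aliases two_windows and one_window are assumptions made for this example only.

```sql
-- Hypothetical three-row sample standing in for the issues table.
WITH sample(created_days_ago, closed_days_ago) AS (
   VALUES (5, 2), (3, 0), (4, 4)
)
SELECT t.day
     , a.created
     , b.closed
       -- Variant 1: two separate running totals, subtracted afterwards.
     ,   COALESCE(sum(a.created) OVER (ORDER BY t.day DESC), 0)
       - COALESCE(sum(b.closed)  OVER (ORDER BY t.day DESC), 0) AS two_windows
       -- Variant 2: one running total over the per-day difference.
     , sum(COALESCE(a.created, 0)
         - COALESCE(b.closed,  0)) OVER (ORDER BY t.day DESC)   AS one_window
FROM   generate_series(0, 6) t(day)
LEFT   JOIN (SELECT created_days_ago AS day, count(*) AS created
             FROM   sample GROUP BY 1) a USING (day)
LEFT   JOIN (SELECT closed_days_ago AS day, count(*) AS closed
             FROM   sample GROUP BY 1) b USING (day)
ORDER  BY t.day;
```

Both columns should agree on every row. For this sample the open counts for days 0 through 6 work out to 0, 1, 1, 2, 1, 1, 0.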

Aside: I would never store "days_ago" in a table, but the absolute date / timestamp. Looks like a simplification for the purpose of this question.
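
As a rough sketch of that alternative, assuming a hypothetical table issues_dated with date columns created_at and closed_at (none of these names appear in the question), the relative values can be derived at query time:

```sql
-- Hypothetical variant of the issues table storing absolute dates.
CREATE TABLE issues_dated (
   id         serial PRIMARY KEY
 , project    varchar(255)
 , created_at date NOT NULL
 , closed_at  date              -- NULL while the issue is still open
);

-- Subtracting two dates yields an integer number of days in PostgreSQL.
SELECT id
     , project
     , current_date - created_at AS created_days_ago
     , current_date - closed_at  AS closed_days_ago
FROM   issues_dated;
```

This way the stored data does not go stale from one day to the next.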

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow