Domanda

Given the following table:

CREATE TABLE cnts(
  user_id INT,
  month_d DATE,
  cnt INT
)

I want to query cumulative counts for the last 6 months of each (user_id, month_d) pair. I can do it with the following JOIN:

SELECT
  S1.month_d AS "month_d",
  S1.user_id AS "user_id",
  SUM(S2.cnt) AS "last_6_months_cnt"
FROM cnts S1
LEFT JOIN cnts S2 ON S1.user_id = S2.user_id
                 AND (S2.month_d BETWEEN (S1.month_d - INTERVAL '5 MONTH') AND S1.month_d)
GROUP BY 1, 2
ORDER BY 2, 1;

But I wonder if this can be solved with window-functions?

Sample data:

INSERT INTO cnts(user_id, month_d, cnt) VALUES 
(1, '2013-01-01', 2),
(1, '2013-04-01', 2),
(1, '2013-07-01', 2),
(1, '2013-10-01', 2),

(2, '2013-01-01', 2),
(2, '2013-04-01', 2),
(2, '2013-07-01', 2),
(2, '2013-10-01', 2)
;

Expected results (from the join above):

  month_d   | user_id | last_6_months_cnt 
------------+---------+-------------------
 2013-01-01 |       1 |                 2
 2013-04-01 |       1 |                 4
 2013-07-01 |       1 |                 4
 2013-10-01 |       1 |                 4
 2013-01-01 |       2 |                 2
 2013-04-01 |       2 |                 4
 2013-07-01 |       2 |                 4
 2013-10-01 |       2 |                 4
È stato utile?

Soluzione

PostgreSQL 12 and newer

UPDATE: PostgreSQL 12 and newer now support RANGE windows.

The correct way would be to use a window over RANGE (INTERVAL '6' MONTH) PRECEDING:

demo=> SELECT month_d, user_id, 
              SUM(cnt) OVER (PARTITION BY user_id ORDER BY month_d RANGE INTERVAL '6' MONTH PRECEDING)
       FROM cnts ORDER BY 2,1;

  month_d   | user_id | sum 
------------+---------+-----
 2013-01-01 |       1 |   2
 2013-04-01 |       1 |   4
 2013-07-01 |       1 |   6
 2013-10-01 |       1 |   6
 2013-01-01 |       2 |   2
 2013-04-01 |       2 |   4
 2013-07-01 |       2 |   6
 2013-10-01 |       2 |   6
(8 rows)

PostgreSQL 11 and older

On PostgreSQL 11 or older RANGE windows were not yet supported so the query will fail:

regress=> SELECT month_d, user_id, 
          SUM(cnt) OVER (PARTITION BY user_id ORDER BY month_d RANGE INTERVAL '6' MONTH PRECEDING) 
          FROM cnts ORDER BY 2,1;
ERROR:  RANGE PRECEDING is only supported with UNBOUNDED
LINE 1: ...(cnt) OVER (PARTITION BY user_id ORDER BY month_d RANGE INTE...

Without that you're going to be back with a join over generate_series, and doing that over multiple user ids cumbersome. I suspect your self-join approach is significantly preferable to attempting to do this with a ROWS based window over sum. You'd have to cross join a generate_series of the entire date range with the set of all distinct uids, then left outer join that against the cnts table, process that with sum over a window, and then filter out rows with null counts. Needless to say, this is a more tortured way of doing things than a simple self-join.


For your sample data the following query will produce the same result you have shown above:

-- This query is totally wrong and only works because of overly simple sample data
SELECT 
  month_d, user_id, 
  SUM(cnt) OVER (PARTITION BY user_id ORDER BY month_d ROWS 1 PRECEDING)
FROM cnts
ORDER BY 2,1;

However, it's totally wrong. I'm showing it primarily to illustrate that the sample data isn't good enough for solid testing, since the results match essentially by sheer luck. None of your samples have more than two samples within a six month range. Sample data is great, but you need to think about the corner cases, just like when you're writing unit tests. You should have uids that don't start and stop at the same dates, with different counts, etc.

Autorizzato sotto: CC-BY-SA insieme a attribuzione
Non affiliato a StackOverflow
scroll top