Question

Here's an example dataset.

with activity_cte (day, user_id, act1, act2) as (
    values
        ('2020-01-01'::date, 1, 0, 1),
        ('2020-01-01'::date, 3, 1, 0),
        ('2020-01-02'::date, 1, 3, 2),
        ('2020-01-02'::date, 2, 0, 2),
        ('2020-01-02'::date, 5, 0, 1),
        ('2020-01-03'::date, 1, 1, 2),
        ('2020-01-03'::date, 5, 1, 1),
        ('2020-01-04'::date, 2, 1, 1),
        ('2020-01-04'::date, 5, 4, 0)
)
select * from activity_cte;

Here I'm tracking user activity counts. There are two activities, "act1" and "act2", and each column holds the number of times the user engaged in that activity on that day. For now, a user who doesn't engage in either activity on a given day simply has no row for that day (though this could be changed if needed). For example, user 2 engages in activity 2 twice on Jan 2 and in both activities once each on Jan 4.

What I'd like to do is calculate the number of "active" users each day, which I'll define as users who have engaged in at least one of the activities on that day or the day before (in practice the window will be something like a week, but I don't want to write a ton of rows in this example). So here's what I'd like returned:

2020-01-01 2
2020-01-02 4
2020-01-03 3
2020-01-04 3

This seems like something I'd need to use a window function for. Maybe I'm overthinking it, but I'm having trouble coming up with the actual approach to generate these numbers.


Solution

That may not be the most elegant solution, but it works:

with activity_cte (day, user_id, act1, act2) as (
    values
        ('2020-01-01'::date, 1, 0, 1),
        ('2020-01-01'::date, 3, 1, 0),
        ('2020-01-02'::date, 1, 3, 2),
        ('2020-01-02'::date, 2, 0, 2),
        ('2020-01-02'::date, 5, 0, 1),
        ('2020-01-03'::date, 1, 1, 2),
        ('2020-01-03'::date, 5, 1, 1),
        ('2020-01-04'::date, 2, 1, 1),
        ('2020-01-04'::date, 5, 4, 0)
),
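-- Collect, for each day, the ids of all users seen on that day or the day before.
-- RANGE with an interval offset requires PostgreSQL 11 or later.
day_users AS (
    SELECT DISTINCT
           day,
           array_agg(user_id)
               OVER (ORDER BY day
                     RANGE BETWEEN INTERVAL '1 day' PRECEDING AND CURRENT ROW) AS users
    FROM activity_cte
)
-- Unnest each day's array and count the distinct users in it.
SELECT day_users.day,
       count(DISTINCT user_id)
FROM day_users
    CROSS JOIN LATERAL unnest(users) AS u(user_id)
GROUP BY day_users.day
ORDER BY day_users.day;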

    day     | count 
------------+-------
 2020-01-01 |     2
 2020-01-02 |     4
 2020-01-03 |     3
 2020-01-04 |     3
(4 rows)
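
For comparison, here is a sketch of a different way to get the same numbers without arrays: join each distinct day back to the activity rows that fall inside its lookback window, then count distinct users. This is not the query from the answer above, just an alternative under the same assumptions (PostgreSQL, one row per user per active day, and `date - integer` arithmetic). The aliases `d`, `a`, and `active_users` are names made up for this example.

```sql
with activity_cte (day, user_id, act1, act2) as (
    values
        ('2020-01-01'::date, 1, 0, 1),
        ('2020-01-01'::date, 3, 1, 0),
        ('2020-01-02'::date, 1, 3, 2),
        ('2020-01-02'::date, 2, 0, 2),
        ('2020-01-02'::date, 5, 0, 1),
        ('2020-01-03'::date, 1, 1, 2),
        ('2020-01-03'::date, 5, 1, 1),
        ('2020-01-04'::date, 2, 1, 1),
        ('2020-01-04'::date, 5, 4, 0)
)
SELECT d.day,
       count(DISTINCT a.user_id) AS active_users
FROM (SELECT DISTINCT day FROM activity_cte) AS d
    -- Match every activity row from the reported day or the day before it.
    JOIN activity_cte AS a
        ON a.day BETWEEN d.day - 1 AND d.day
-- Assumption: if zero-activity rows are ever stored, also filter with
-- WHERE a.act1 + a.act2 > 0
GROUP BY d.day
ORDER BY d.day;
```

On the sample data this should give the same four rows as above (2, 4, 3, 3). For a longer lookback, change the offset: `d.day - 6` covers seven calendar days including the current one. Like the window-function version, this only reports days that appear in the data; joining against `generate_series` instead of the distinct days would be one way to include days with no activity rows.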
Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange