Query to find if there are more than X occurrences within any period of a given length in minutes
12-03-2021
Question
I have spent many hours thinking about a solution for my problem but I give up.
Let's imagine a table
user_id | occurred_at
-- Not matched example
1 | 2020-01-01 08:00:00 <- First match of the set
1 | 2020-01-01 08:08:00 <- Second match (8 minutes away from the previous so OK)
1 | 2020-01-01 08:10:30 <- this already exceeds the 10-minute period, so the set is excluded
-- OK match example
1 | 2020-01-01 10:00:00 <- First match
1 | 2020-01-01 10:05:00 <- Second match (5 minutes away from the previous so OK)
1 | 2020-01-01 10:09:59 <- this fits into the 10-minute period, so the set is matched (09:59 away altogether from 10:00:00)
-- Another OK (4 matched)
2 | 2020-01-01 14:23:00
2 | 2020-01-01 14:24:00
2 | 2020-01-01 14:26:00
2 | 2020-01-01 14:27:00
-- Not matched
3 | 2020-01-01 11:00:00
3 | 2020-01-01 11:01:00
3 | 2020-01-01 15:26:00
3 | 2020-01-01 18:00:00
-- User mismatch, so this set is not matched either
3 | 2020-01-01 20:00:00
1 | 2020-01-01 20:01:00
2 | 2020-01-01 20:02:00
How can one query a table like this to find rows with at least N (= 3 in this example) occurrences for a given user that fall within a given interval in minutes (= 10 in this example)? I think the example table above explains it better.
Solution
Using the window function lag(), we can mark all rows where a qualifying set ends:
SELECT *
, occurred_at - lag(occurred_at, 2) OVER (PARTITION BY user_id ORDER BY occurred_at) <= interval '10 min' AS passed
FROM timestamps
ORDER BY user_id, occurred_at;
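If the timestamp occurred_at for the same user (user_id) two rows back is within 10 minutes, we have a set of three. For a user's first two rows there is no row two back, so lag() returns NULL and passed is NULL rather than false.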
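To see why looking two rows back is enough, the same check can be mirrored outside the database. The Python sketch below is only an illustration and not part of the query: the helper name `mark_passed` is made up here, and the timestamps are user 1's rows from the question. It pairs each row with the row `n - 1` positions earlier, as `lag(occurred_at, 2)` does for `n = 3`, and uses `None` where SQL would give NULL.

```python
from datetime import datetime, timedelta


def mark_passed(timestamps, n=3, window=timedelta(minutes=10)):
    """Mimic `occurred_at - lag(occurred_at, n - 1) <= window` for one user.

    Returns one flag per row in ascending time order: None where fewer than
    n - 1 earlier rows exist (lag() would yield NULL), else True or False.
    """
    ordered = sorted(timestamps)
    flags = []
    for i, ts in enumerate(ordered):
        if i < n - 1:
            flags.append(None)  # no row n - 1 positions back
        else:
            flags.append(ts - ordered[i - (n - 1)] <= window)
    return flags


# User 1's rows from the sample table in the question.
user_1 = [datetime.fromisoformat(s) for s in (
    "2020-01-01 08:00:00",
    "2020-01-01 08:08:00",
    "2020-01-01 08:10:30",
    "2020-01-01 10:00:00",
    "2020-01-01 10:05:00",
    "2020-01-01 10:09:59",
    "2020-01-01 20:01:00",
)]

flags = mark_passed(user_1)
print(flags)
```

Only the 10:09:59 row is flagged, because it is the only row whose second predecessor (10:00:00) lies at most 10 minutes earlier. The 08:10:30 row is 10 minutes 30 seconds after 08:00:00, so it fails.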
For a given user_id:
SELECT count(*) FILTER (WHERE passed) AS qualifying_sets
FROM (
SELECT occurred_at - lag(occurred_at, 2) OVER (ORDER BY occurred_at) <= interval '10 min' AS passed
FROM timestamps
WHERE user_id = 1 -- given user
) sub;
To get all user_id that pass the test at least once:
SELECT user_id, count(*) FILTER (WHERE passed) AS qualifying_sets
FROM (
SELECT user_id
, occurred_at - lag(occurred_at, 2) OVER (PARTITION BY user_id ORDER BY occurred_at) <= interval '10 min' AS passed
FROM timestamps
) sub
GROUP BY 1
HAVING bool_or(passed)
ORDER BY 1;
The added count of qualifying_sets is optional.
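As a cross-check of the grouped query, here is a minimal Python sketch that applies the same rule to every sample row from the question. It is an assumption-laden illustration rather than a replacement for the SQL: the function name `qualifying_sets` and the parameters `n` and `window` are hypothetical, standing in for the hard-coded offset 2 (that is, N - 1) and `interval '10 min'`.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Sample rows from the question: (user_id, occurred_at).
ROWS = [
    (1, "2020-01-01 08:00:00"), (1, "2020-01-01 08:08:00"),
    (1, "2020-01-01 08:10:30"), (1, "2020-01-01 10:00:00"),
    (1, "2020-01-01 10:05:00"), (1, "2020-01-01 10:09:59"),
    (2, "2020-01-01 14:23:00"), (2, "2020-01-01 14:24:00"),
    (2, "2020-01-01 14:26:00"), (2, "2020-01-01 14:27:00"),
    (3, "2020-01-01 11:00:00"), (3, "2020-01-01 11:01:00"),
    (3, "2020-01-01 15:26:00"), (3, "2020-01-01 18:00:00"),
    (3, "2020-01-01 20:00:00"), (1, "2020-01-01 20:01:00"),
    (2, "2020-01-01 20:02:00"),
]


def qualifying_sets(rows, n=3, window=timedelta(minutes=10)):
    """Count, per user, the rows that close a run of n events within `window`."""
    by_user = defaultdict(list)  # plays the role of PARTITION BY user_id
    for user_id, occurred_at in rows:
        by_user[user_id].append(datetime.fromisoformat(occurred_at))

    result = {}
    for user_id, stamps in by_user.items():
        stamps.sort()  # ORDER BY occurred_at
        count = sum(
            1
            for i in range(n - 1, len(stamps))
            if stamps[i] - stamps[i - (n - 1)] <= window
        )
        if count:  # HAVING bool_or(passed)
            result[user_id] = count
    return dict(sorted(result.items()))  # ORDER BY user_id


result = qualifying_sets(ROWS)
print(result)
```

With the sample rows this yields one qualifying set for user 1 and two overlapping sets for user 2 (the rows ending at 14:26 and 14:27), while user 3 is dropped. Calling `qualifying_sets(ROWS, n=4)` corresponds to using `lag(occurred_at, 3)` in the query and leaves only user 2.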
Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange