Question

I have spent many hours trying to find a solution to my problem, without success.

Imagine a table like this:

user_id | occurred_at
   -- Not matched example
   1    | 2020-01-01 08:00:00 <- First match of the set
   1    | 2020-01-01 08:08:00 <- Second match (8 minutes after the previous one, so OK)
   1    | 2020-01-01 08:10:30 <- 10:30 after the first row, which exceeds the 10-minute period, so the set is excluded

   -- OK match example
   1    | 2020-01-01 10:00:00 <- First match
   1    | 2020-01-01 10:05:00 <- Second match (5 minutes after the previous one, so OK)
   1    | 2020-01-01 10:09:59 <- this fits into the 10-minute period, so the set is matched (09:59 after 10:00:00 altogether)

   -- Another OK (4 matched)
   2    | 2020-01-01 14:23:00
   2    | 2020-01-01 14:24:00
   2    | 2020-01-01 14:26:00
   2    | 2020-01-01 14:27:00

   -- Not matched
   3    | 2020-01-01 11:00:00
   3    | 2020-01-01 11:01:00
   3    | 2020-01-01 15:26:00
   3    | 2020-01-01 18:00:00

   -- User mismatch, so the set is not matched either
   3    | 2020-01-01 20:00:00
   1    | 2020-01-01 20:01:00
   2    | 2020-01-01 20:02:00

How can one query a table like this to find rows with at least N (= 3 in this example) occurrences for a given user within a given interval of minutes (= 10 in this example)? I think the example table above explains it better.

http://sqlfiddle.com/#!17/54d43/1
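For readers who cannot open the fiddle, here is a minimal setup sketch holding the rows from the example above. The table name timestamps is taken from the queries in the answer; the column types are an assumption, and the fiddle's actual schema may differ.

```sql
-- Assumed schema: the table name matches the answer's queries, the column types are a guess
CREATE TABLE timestamps (
   user_id     int       NOT NULL
 , occurred_at timestamp NOT NULL
);

-- Sample rows copied from the example table in the question
INSERT INTO timestamps (user_id, occurred_at) VALUES
   (1, '2020-01-01 08:00:00')
 , (1, '2020-01-01 08:08:00')
 , (1, '2020-01-01 08:10:30')
 , (1, '2020-01-01 10:00:00')
 , (1, '2020-01-01 10:05:00')
 , (1, '2020-01-01 10:09:59')
 , (2, '2020-01-01 14:23:00')
 , (2, '2020-01-01 14:24:00')
 , (2, '2020-01-01 14:26:00')
 , (2, '2020-01-01 14:27:00')
 , (3, '2020-01-01 11:00:00')
 , (3, '2020-01-01 11:01:00')
 , (3, '2020-01-01 15:26:00')
 , (3, '2020-01-01 18:00:00')
 , (3, '2020-01-01 20:00:00')
 , (1, '2020-01-01 20:01:00')
 , (2, '2020-01-01 20:02:00');
```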


Solution

Using the window function lag() we can mark all rows where a qualifying set ends:

SELECT *
     , occurred_at - lag(occurred_at, 2) OVER (PARTITION BY user_id ORDER BY occurred_at) <= interval '10 min' AS passed
FROM   timestamps
ORDER  BY user_id, occurred_at;

If the timestamp occurred_at for the same user (user_id) two rows back is within 10 minutes, we have a set of three.
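The same idea should carry over to other values of N and other intervals: look back N - 1 rows and compare against the desired interval. As a sketch, the variant below uses N = 4 and a 5-minute window; both values are picked here purely for illustration and are not part of the question.

```sql
-- Illustrative values: N = 4 rows within 5 minutes (so the lag offset is N - 1 = 3)
SELECT *
     , occurred_at - lag(occurred_at, 3) OVER (PARTITION BY user_id ORDER BY occurred_at) <= interval '5 min' AS passed
FROM   timestamps
ORDER  BY user_id, occurred_at;
```

For rows with fewer than N - 1 predecessors, lag() returns NULL, so passed is NULL rather than false. Note also that <= counts a span of exactly the interval as a match; the question does not say how that boundary case should be treated.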

To count the qualifying sets for a given user_id:

SELECT count(*) FILTER (WHERE passed) AS qualifying_sets
FROM (
   SELECT occurred_at - lag(occurred_at, 2) OVER (ORDER BY occurred_at) <= interval '10 min' AS passed
   FROM   timestamps
   WHERE  user_id = 1  -- given user
   ) sub;

To get all user_id that pass the test at least once:

SELECT user_id, count(*) FILTER (WHERE passed) AS qualifying_sets
FROM  (
   SELECT user_id
        , occurred_at - lag(occurred_at, 2) OVER (PARTITION BY user_id ORDER BY occurred_at) <= interval '10 min' AS passed
   FROM   timestamps
   ) sub
GROUP  BY 1
HAVING bool_or(passed)
ORDER  BY 1;

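The added count of qualifying_sets is optional. Overlapping sets are counted separately, since every row that ends a qualifying set is counted once.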
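If the rows themselves are wanted rather than the users, one possible approach (a sketch, not part of the original answer) is to keep every row that is the end of a qualifying set or lies at most two rows before one. It assumes that (user_id, occurred_at) is unique, so that both window clauses sort the rows the same way; the column alias in_set is made up for this example.

```sql
SELECT user_id, occurred_at
FROM  (
   SELECT user_id, occurred_at
          -- true if this row or one of the next two rows ends a qualifying set of three
        , bool_or(passed) OVER (PARTITION BY user_id ORDER BY occurred_at
                                ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING) AS in_set
   FROM  (
      SELECT user_id, occurred_at
           , occurred_at - lag(occurred_at, 2) OVER (PARTITION BY user_id ORDER BY occurred_at) <= interval '10 min' AS passed
      FROM   timestamps
      ) sub
   ) marked
WHERE  in_set  -- NULL (no qualifying set in range) is filtered out like false
ORDER  BY user_id, occurred_at;
```

With the example data from the question, this should return the three rows from 10:00:00 to 10:09:59 for user 1 and the four rows from 14:23:00 to 14:27:00 for user 2.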


Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange