Find members with consecutive entries
30-12-2020
Question
I need to find all members in the database with data on 7 or more consecutive days. The table has two columns, member_id and data_date. Each member_id appears once for every day on which that member has data.
I found some answers using MySQL that involved datediff or dateadd, but I am not sure how to do this in PostgreSQL 8.2.15. Below is an example of what the table looks like, but with many more rows.
+------------+------------+
| member_id | data_date |
+------------+------------+
| 0000001 | 2018-04-10 |
| 0000005 | 2018-04-16 |
| 0000001 | 2018-04-11 |
| 0000002 | 2018-04-12 |
| 0000003 | 2018-04-13 |
| 0000004 | 2018-04-12 |
| 0000005 | 2018-04-15 |
| 0000003 | 2018-04-19 |
| 0000002 | 2018-04-17 |
| 0000001 | 2018-04-18 |
| 0000005 | 2018-04-10 |
| 0000002 | 2018-04-18 |
| 0000001 | 2018-04-08 |
| 0000002 | 2018-04-03 |
| 0000003 | 2018-04-02 |
| 0000004 | 2018-04-14 |
| 0000005 | 2018-04-15 |
| 0000003 | 2018-04-16 |
| 0000002 | 2018-04-19 |
| 0000001 | 2018-04-14 |
+------------+------------+
(member_id, data_date) is defined UNIQUE.
Solution
This simple query should work in your outdated version:
SELECT DISTINCT member_id
FROM tbl t1
JOIN tbl t2 USING (member_id)
JOIN tbl t3 USING (member_id)
JOIN tbl t4 USING (member_id)
JOIN tbl t5 USING (member_id)
JOIN tbl t6 USING (member_id)
JOIN tbl t7 USING (member_id)
WHERE t2.data_date = t1.data_date + 1
AND t3.data_date = t1.data_date + 2
AND t4.data_date = t1.data_date + 3
AND t5.data_date = t1.data_date + 4
AND t6.data_date = t1.data_date + 5
AND t7.data_date = t1.data_date + 6;
In Postgres, you can just add an integer to a date to get a later day: adding 1 gives the next day.
It is probably fast, too, as long as you don't have much longer streaks of entries, which produce many duplicate rows in the join before DISTINCT removes them.
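As a quick illustration of that date arithmetic (a minimal sketch; the literal date is taken from the sample rows and the column aliases are only for readability):

```sql
-- Adding an integer to a date shifts it by that many days.
SELECT date '2018-04-10' + 1 AS next_day,       -- 2018-04-11
       date '2018-04-10' + 6 AS six_days_later; -- 2018-04-16
```

This is why the join conditions can compare t2.data_date with t1.data_date + 1 directly, without datediff or dateadd.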
Or:
SELECT DISTINCT member_id
FROM tbl t1
JOIN tbl t2 USING (member_id)
WHERE t2.data_date BETWEEN t1.data_date + 1
AND t1.data_date + 6
GROUP BY t1.member_id, t1.data_date
HAVING count(*) = 6;
Update to a current version of Postgres at the earliest opportunity.
OTHER TIPS
Your version is very antiquated and insecure. For reference, 8.2 was released in 2006 and was last supported in 2011. It lacks
- range types (such as tstzrange)
- window functions
- lateral joins
Moreover, we can't easily configure your setup to test our queries. Upgrade your version of PostgreSQL; if you find that difficult to do, consider seeking out a consultant.
I think what you want is something like
SELECT member_id, t2.data_date, count(*)
FROM t AS t1
JOIN t AS t2 USING (member_id)
WHERE t1.data_date BETWEEN t2.data_date AND t2.data_date + 6
GROUP BY member_id, t2.data_date
HAVING count(*) >= 7;
But I can't test it and you didn't provide CREATE TABLE or INSERT statements.
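For anyone who wants to try the queries, here is a possible setup script. It is only a sketch under assumptions: the table name tbl is borrowed from the first answer, the column types (text for member_id to keep the leading zeros, date for data_date) are guesses, and the sample row (0000005, 2018-04-15) is listed once because it appears twice in the question, which the UNIQUE constraint would reject. Member 0000009 is not in the question; it is a hypothetical member added so that at least one 7-day streak exists.

```sql
-- Assumed schema; the question states only the column names and the UNIQUE constraint.
CREATE TABLE tbl (
    member_id text NOT NULL,
    data_date date NOT NULL,
    UNIQUE (member_id, data_date)
);

-- Sample rows from the question (duplicate row for 0000005 on 2018-04-15 omitted).
INSERT INTO tbl (member_id, data_date) VALUES
    ('0000001', '2018-04-10'),
    ('0000005', '2018-04-16'),
    ('0000001', '2018-04-11'),
    ('0000002', '2018-04-12'),
    ('0000003', '2018-04-13'),
    ('0000004', '2018-04-12'),
    ('0000005', '2018-04-15'),
    ('0000003', '2018-04-19'),
    ('0000002', '2018-04-17'),
    ('0000001', '2018-04-18'),
    ('0000005', '2018-04-10'),
    ('0000002', '2018-04-18'),
    ('0000001', '2018-04-08'),
    ('0000002', '2018-04-03'),
    ('0000003', '2018-04-02'),
    ('0000004', '2018-04-14'),
    ('0000003', '2018-04-16'),
    ('0000002', '2018-04-19'),
    ('0000001', '2018-04-14');

-- Hypothetical member (not in the question) with 7 consecutive days.
INSERT INTO tbl (member_id, data_date) VALUES
    ('0000009', '2018-04-01'),
    ('0000009', '2018-04-02'),
    ('0000009', '2018-04-03'),
    ('0000009', '2018-04-04'),
    ('0000009', '2018-04-05'),
    ('0000009', '2018-04-06'),
    ('0000009', '2018-04-07');
```

None of the members from the question has as many as 7 rows here, so a query that finds 7-day streaks should return only the hypothetical member 0000009 against this data.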