MySQL: get records that occur 3 or more times within a 1-minute interval
Question
I have developed a cron job that runs every minute and analyzes the bot records from the last 60 seconds in the database. I need to group the conversation IDs that have 3 or more records within 60 seconds for the same URL and client_session_id.
Here is the SQL I'm running:
```sql
select
    count(session_id),
    client_session_id,
    GROUP_CONCAT(id) as talkIds,
    origin_url
from
    bot_talk
where
    created_date > now() - interval 60 second
group by
    client_session_id, origin_url
having
    count(session_id) >= 3
```
This query works as expected, but my cron service is sometimes down, and when that happens I lose those repeated records.
I thought about running a query (as a cron job) at the end of the day that analyzes the last 24 hours and looks for the records that repeat according to the rule above. How can I do that?
My table currently looks like this:
created_date | origin_url | client_session_id |
---|---|---|
2021-01-18 11:02:24.0 | https://chat-app.ttttttt.com/ | znkjoc3gfth2c3m0t1klii |
2021-01-18 11:02:35.0 | https://someurltestenter1502211068.zendes.com/ | znkjoc3gfth2c3m0t1klii |
2021-01-18 11:11:03.0 | https://chat-app.testett.com/ | znkjoc3gfth2c3m0t1klii |
2021-01-18 11:44:28.0 | https://someurltestenter1502211068.zendes.com/ | znkjoc3gfth2c3m0t1klii |
2021-01-18 11:49:36.0 | https://someurltestenter1502211068.zendes.com/ | znkjoc3gfth2c3m0t1klii |
2021-01-18 11:51:05.0 | https://someurltestenter1502211068.zendes.com/ | znkjoc3gfth2c3m0t1klii |
2021-01-18 11:51:15.0 | https://someurltestenter1502211068.zendes.com/ | znkjoc3gfth2c3m0t1klii |
2021-01-18 11:51:19.0 | https://someurltestenter1502211068.zendes.com/ | znkjoc3gfth2c3m0t1klii |
2021-01-18 11:51:43.0 | https://someurltestenter1502211068.zendes.com/ | znkjoc3gfth2c3m0t1klii |
2021-01-18 11:51:50.0 | https://someurltestenter1502211068.zendes.com/ | znkjoc3gfth2c3m0t1klii |
2021-01-18 12:01:24.0 | https://someurltestenter1502211068.zendes.com/ | znkjoc3gfth2c3m0t1klii |
2021-01-18 12:04:48.0 | https://someurltestenter1502211068.zendes.com/ | znkjoc3gfth2c3m0t1klii |
2021-01-18 13:40:50.0 | https://someurltestenter1502211068.zendes.com/ | znkjoc3gfth2c3m0t1klii |
2021-01-18 15:54:38.0 | https://someurltestenter1502211068.zendes.com/ | znkjoc3gfth2c3m0t1klii |
2021-01-18 15:54:45.0 | https://someurltestenter1502211068.zendes.com/ | znkjoc3gfth2c3m0t1klii |
2021-01-18 15:55:08.0 | https://someurltestenter1502211068.zendes.com/ | znkjoc3gfth2c3m0t1klii |
2021-01-18 15:58:07.0 | https://someurltestenter1502211068.zendes.com/ | znkjoc3gfth2c3m0t1klii |
2021-01-18 15:58:11.0 | https://someurltestenter1502211068.zendes.com/ | znkjoc3gfth2c3m0t1klii |
2021-01-18 15:59:56.0 | https://someurltestenter1502211068.zendes.com/ | znkjoc3gfth2c3m0t1klii |
2021-01-18 16:08:32.0 | https://admin.testete.com/ | znkjoc3gfth2c3m0t1klii |
fiddle: https://dbfiddle.uk/?rdbms=mysql_5.7&fiddle=b8ce65bad5d39ea85fc5b57d7dc0f729
Solution
To answer your question, I did the following (this requires MySQL 8, because it uses window functions; see the fiddle):
Created the table:
```sql
CREATE TABLE test
(
    created_date      TIMESTAMP(1) NOT NULL,
    origin_url        VARCHAR(200) NOT NULL,
    client_session_id VARCHAR(50)  NOT NULL
);
```
Populated it (sample data):
```sql
INSERT INTO test VALUES
('2021-01-18 11:02:24.0', 'https://ttttttt.com/', 'znkjoc3gfth2c3m0t1klii'),
('2021-01-18 11:02:35.0', 'https://zendes.com/',  'znkjoc3gfth2c3m0t1klii'),
('2021-01-18 11:11:03.0', 'https://testett.com/', 'znkjoc3gfth2c3m0t1klii'),
('2021-01-18 11:49:28.0', 'https://zendes.com/',  'znkjoc3gfth2c3m0t1klii'),
('2021-01-18 11:50:36.0', 'https://zendes.com/',  'znkjoc3gfth2c3m0t1klii'),
('2021-01-18 11:51:05.0', 'https://zendes.com/',  'znkjoc3gfth2c3m0t1klii');
```
Then I used the LEAD() window function. Its syntax is:
```sql
LEAD(<expression>[, offset[, default_value]]) OVER (
    PARTITION BY (expr)
    ORDER BY (expr)
)
```
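To see what the offset argument does, here is a small illustration against the test table created above (MySQL 8.0+). It uses a named window w, a MySQL 8.0 shorthand for repeating the OVER (...) clause that the rest of this answer does not use; treat it as a sketch rather than part of the original solution.

```sql
-- Illustration only: within each (origin_url, client_session_id) partition,
-- ordered newest first, LEAD(created_date, 2) returns the created_date of the
-- row two positions further on, i.e. the second-older record.
-- Rows with fewer than two rows after them get NULL, because no
-- default_value (third argument) is supplied here.
SELECT
    origin_url,
    created_date,
    LEAD(created_date, 2) OVER w AS two_rows_ahead
FROM test
WINDOW w AS (PARTITION BY origin_url, client_session_id
             ORDER BY created_date DESC)
ORDER BY origin_url, created_date DESC;
```

For https://zendes.com/, the 11:51:05 row should get 11:49:28 and the 11:50:36 row should get 11:02:35, the same values as the l_3 column in the results below; the two oldest zendes.com rows and the single rows for the other two URLs get NULL.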
The initial SQL is:
```sql
SELECT
    created_date,
    LEAD(created_date, 2) OVER (PARTITION BY origin_url, client_session_id
                                ORDER BY created_date DESC, origin_url, client_session_id) AS l_3,
    TIMESTAMPDIFF
    (
        MINUTE,
        created_date,
        LEAD(created_date, 2) OVER (PARTITION BY origin_url, client_session_id
                                    ORDER BY created_date DESC, origin_url, client_session_id)
    ) AS min_diff,
    origin_url,
    client_session_id
FROM test;
```
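Result (this output already includes the ut_d column and the ABS() around min_diff from the next query; without ABS(), min_diff is negative here, because with ORDER BY created_date DESC the row two positions ahead is the older one):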
created_date | l_3 | min_diff | ut_d | origin_url | client_session_id |
---|---|---|---|---|---|
2021-01-18 11:11:03.0 | NULL | NULL | NULL | https://testett.com/ | znkjoc3gfth2c3m0t1klii |
2021-01-18 11:02:24.0 | NULL | NULL | NULL | https://ttttttt.com/ | znkjoc3gfth2c3m0t1klii |
2021-01-18 11:51:05.0 | 2021-01-18 11:49:28.0 | 1 | 97.0 | https://zendes.com/ | znkjoc3gfth2c3m0t1klii |
2021-01-18 11:50:36.0 | 2021-01-18 11:02:35.0 | 48 | 2881.0 | https://zendes.com/ | znkjoc3gfth2c3m0t1klii |
2021-01-18 11:49:28.0 | NULL | NULL | NULL | https://zendes.com/ | znkjoc3gfth2c3m0t1klii |
2021-01-18 11:02:35.0 | NULL | NULL | NULL | https://zendes.com/ | znkjoc3gfth2c3m0t1klii |
It turned out that UNIX_TIMESTAMP, whose difference is in seconds, is a better fit than TIMESTAMPDIFF with MINUTE, which truncates to whole minutes (97 seconds shows up as 1 in the result above; see here). So, in the end, I used (see the fiddle for the result):
```sql
SELECT
    created_date,
    LEAD(created_date, 2) OVER (PARTITION BY origin_url, client_session_id
                                ORDER BY created_date DESC, origin_url, client_session_id) AS l_3,
    ABS(TIMESTAMPDIFF
    (
        MINUTE,
        created_date,
        LEAD(created_date, 2) OVER (PARTITION BY origin_url, client_session_id
                                    ORDER BY created_date DESC, origin_url, client_session_id)
    )) AS min_diff,
    UNIX_TIMESTAMP(created_date) -
    UNIX_TIMESTAMP
    (
        LEAD(created_date, 2) OVER (PARTITION BY origin_url, client_session_id
                                    ORDER BY created_date DESC, origin_url, client_session_id)
    ) AS ut_d,
    origin_url,
    client_session_id
FROM test;
```
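Two small checks on the choice of unit, offered as sketches rather than part of the original answer. The first statement uses literal timestamps from the sample rows to show that TIMESTAMPDIFF with MINUTE truncates 97 seconds to 1, while SECOND and the UNIX_TIMESTAMP difference both give 97. The second is a variant of the query above (MySQL 8.0+) that computes the gap with TIMESTAMPDIFF(SECOND, ...) and uses a named window w instead of repeating the OVER (...) clause; the column name sec_diff is illustrative.

```sql
-- Literal values from the sample rows; works on MySQL 5.7 and 8.0.
SELECT
    TIMESTAMPDIFF(MINUTE, '2021-01-18 11:49:28', '2021-01-18 11:51:05') AS whole_minutes, -- 1 (truncated)
    TIMESTAMPDIFF(SECOND, '2021-01-18 11:49:28', '2021-01-18 11:51:05') AS seconds,       -- 97
    UNIX_TIMESTAMP('2021-01-18 11:51:05')
        - UNIX_TIMESTAMP('2021-01-18 11:49:28')                         AS ut_seconds;    -- 97

-- Variant of the query above: same l_3, gap in whole seconds (older first,
-- newer second, so the result is positive), window defined once as w.
SELECT
    created_date,
    LEAD(created_date, 2) OVER w AS l_3,
    TIMESTAMPDIFF(SECOND, LEAD(created_date, 2) OVER w, created_date) AS sec_diff,
    origin_url,
    client_session_id
FROM test
WINDOW w AS (PARTITION BY origin_url, client_session_id
             ORDER BY created_date DESC);
```

On the sample data, sec_diff should match ut_d (97 and 2881) wherever l_3 is not NULL.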
The final SQL and result are:
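```sql
SELECT * FROM
(
    SELECT
        created_date,
        LEAD(created_date, 2)
            OVER (PARTITION BY origin_url, client_session_id
                  ORDER BY created_date DESC, origin_url, client_session_id) AS l_3,
        -- Caution: subtracting two TIMESTAMPs directly does not give seconds.
        -- MySQL converts both to numbers of the form YYYYMMDDhhmmss, so
        -- 11:51:05 - 11:49:28 yields 177 here, not 97; rely on ut_d instead.
        created_date -
        LEAD(created_date, 2)
            OVER (PARTITION BY origin_url, client_session_id
                  ORDER BY created_date DESC, origin_url, client_session_id) AS l_diff,
        UNIX_TIMESTAMP(created_date) -
        UNIX_TIMESTAMP
        (
            LEAD(created_date, 2) OVER (PARTITION BY origin_url, client_session_id
                                        ORDER BY created_date DESC, origin_url, client_session_id)
        ) AS ut_d,
        origin_url,
        client_session_id
    FROM test
) AS t
-- 180 s keeps the three zendes.com rows that span 97 s in this sample;
-- for the question's one-minute rule, compare against 60 instead.
WHERE t.ut_d < 180;
```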
Result:

created_date | l_3 | l_diff | ut_d | origin_url | client_session_id |
---|---|---|---|---|---|
2021-01-18 11:51:05.0 | 2021-01-18 11:49:28.0 | 177.0 | 97.0 | https://zendes.com/ | znkjoc3gfth2c3m0t1klii |
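Applied to the end-of-day job from the question, the same LEAD(..., 2) idea might look like the following sketch (MySQL 8.0+). It assumes the bot_talk table and the created_date, origin_url and client_session_id columns from the question's query. It also treats "within 60 seconds" as a gap of at most 60 seconds between the first and third record. Both are assumptions to adjust.

```sql
-- Sketch: flag every record that closes a run of 3 records for the same
-- client_session_id and origin_url spanning at most 60 seconds,
-- looking only at the last 24 hours.
SELECT
    t.client_session_id,
    t.origin_url,
    t.l_3          AS burst_start,  -- oldest of the three records
    t.created_date AS burst_end     -- newest of the three records
FROM
(
    SELECT
        client_session_id,
        origin_url,
        created_date,
        LEAD(created_date, 2) OVER w AS l_3
    FROM bot_talk
    WHERE created_date > NOW() - INTERVAL 24 HOUR
    WINDOW w AS (PARTITION BY client_session_id, origin_url
                 ORDER BY created_date DESC)
) AS t
-- NULL l_3 (fewer than 3 records) makes TIMESTAMPDIFF NULL, so those rows drop out
WHERE TIMESTAMPDIFF(SECOND, t.l_3, t.created_date) <= 60
ORDER BY t.client_session_id, t.origin_url, t.created_date;
```

Each returned row marks three consecutive records for one session and URL, so a longer burst appears as several overlapping rows. A burst that starts just before the 24-hour cutoff can be missed, because its first record is filtered out. Widening the range slightly (for example, 24 hours plus one minute) is one way to cover that.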
You mention that you are on MySQL 5.7, which doesn't support window functions. You can either upgrade or emulate them with other techniques (for example, a self-join or user variables); my advice is to upgrade.
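If upgrading isn't an option right away, a self-join is one such workaround. The query below is a minimal sketch for MySQL 5.7 that has not been run against your data. It assumes bot_talk has the id, created_date, origin_url and client_session_id columns used in the question's query. For every row a, it counts the rows with the same client_session_id and origin_url whose created_date falls in the 60 seconds ending at a.created_date (a itself included). It then keeps the rows where that count reaches 3.

```sql
-- Sketch for MySQL 5.7 (no window functions): self-join each record with
-- the records of the same session and URL in the preceding 60 seconds.
SELECT
    a.id,
    a.client_session_id,
    a.origin_url,
    a.created_date,
    COUNT(*)                                  AS hits_in_window,
    GROUP_CONCAT(b.id ORDER BY b.created_date) AS talkIds
FROM bot_talk AS a
JOIN bot_talk AS b
  ON  b.client_session_id = a.client_session_id
  AND b.origin_url        = a.origin_url
  AND b.created_date BETWEEN a.created_date - INTERVAL 60 SECOND
                         AND a.created_date
-- only rows from the last day are checked; their windows may reach slightly further back
WHERE a.created_date > NOW() - INTERVAL 24 HOUR
GROUP BY a.id, a.client_session_id, a.origin_url, a.created_date
HAVING COUNT(*) >= 3;
```

Each qualifying row marks the end of a burst, so a long burst produces several overlapping rows, each with its own talkIds list. On a large table this join will probably need a composite index such as (client_session_id, origin_url, created_date) to stay fast. That index is a suggestion, not something from the question.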