Question

I have developed a cron job that runs every minute and analyzes the bot records from the last 60 seconds in the database. I need to group the conversation IDs that have three or more records within 60 seconds for the same URL and client_session_id.

Here is the SQL I'm running:

select
    count(session_id),
    client_session_id,
    GROUP_CONCAT(id) as talkIds,
    origin_url 
from
    bot_talk
where
    created_date > now() - interval 60 second
group by
    client_session_id, origin_url 
having
    count(session_id) >= 3

This query works as expected, but my cron service is sometimes down, and when that happens I miss those repeated records.

I am thinking about running an SQL query as a cron job at the end of the day that analyzes the last 24 hours and finds the records that repeat according to the rule above. How can I do that?

Currently my database looks like this:

created_date           origin_url                                      client_session_id
2021-01-18 11:02:24.0  https://chat-app.ttttttt.com/                   znkjoc3gfth2c3m0t1klii
2021-01-18 11:02:35.0  https://someurltestenter1502211068.zendes.com/  znkjoc3gfth2c3m0t1klii
2021-01-18 11:11:03.0  https://chat-app.testett.com/                   znkjoc3gfth2c3m0t1klii
2021-01-18 11:44:28.0  https://someurltestenter1502211068.zendes.com/  znkjoc3gfth2c3m0t1klii
2021-01-18 11:49:36.0  https://someurltestenter1502211068.zendes.com/  znkjoc3gfth2c3m0t1klii
2021-01-18 11:51:05.0  https://someurltestenter1502211068.zendes.com/  znkjoc3gfth2c3m0t1klii
2021-01-18 11:51:15.0  https://someurltestenter1502211068.zendes.com/  znkjoc3gfth2c3m0t1klii
2021-01-18 11:51:19.0  https://someurltestenter1502211068.zendes.com/  znkjoc3gfth2c3m0t1klii
2021-01-18 11:51:43.0  https://someurltestenter1502211068.zendes.com/  znkjoc3gfth2c3m0t1klii
2021-01-18 11:51:50.0  https://someurltestenter1502211068.zendes.com/  znkjoc3gfth2c3m0t1klii
2021-01-18 12:01:24.0  https://someurltestenter1502211068.zendes.com/  znkjoc3gfth2c3m0t1klii
2021-01-18 12:04:48.0  https://someurltestenter1502211068.zendes.com/  znkjoc3gfth2c3m0t1klii
2021-01-18 13:40:50.0  https://someurltestenter1502211068.zendes.com/  znkjoc3gfth2c3m0t1klii
2021-01-18 15:54:38.0  https://someurltestenter1502211068.zendes.com/  znkjoc3gfth2c3m0t1klii
2021-01-18 15:54:45.0  https://someurltestenter1502211068.zendes.com/  znkjoc3gfth2c3m0t1klii
2021-01-18 15:55:08.0  https://someurltestenter1502211068.zendes.com/  znkjoc3gfth2c3m0t1klii
2021-01-18 15:58:07.0  https://someurltestenter1502211068.zendes.com/  znkjoc3gfth2c3m0t1klii
2021-01-18 15:58:11.0  https://someurltestenter1502211068.zendes.com/  znkjoc3gfth2c3m0t1klii
2021-01-18 15:59:56.0  https://someurltestenter1502211068.zendes.com/  znkjoc3gfth2c3m0t1klii
2021-01-18 16:08:32.0  https://admin.testete.com/                      znkjoc3gfth2c3m0t1klii

fiddle: https://dbfiddle.uk/?rdbms=mysql_5.7&fiddle=b8ce65bad5d39ea85fc5b57d7dc0f729
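The rule itself (three or more records for the same origin_url and client_session_id within any 60-second span) is a sliding-window check, which is what a once-a-day job has to perform instead of looking only at the last minute. As an illustration outside the database, here is a minimal Python sketch over the rows in the table above; find_bursts is a hypothetical helper, and the ".0" fractions are dropped from the timestamps for simplicity.

```python
from collections import defaultdict
from datetime import datetime, timedelta

URL = "https://someurltestenter1502211068.zendes.com/"
SID = "znkjoc3gfth2c3m0t1klii"
TIMES = ["11:02:35", "11:44:28", "11:49:36", "11:51:05", "11:51:15",
         "11:51:19", "11:51:43", "11:51:50", "12:01:24", "12:04:48",
         "13:40:50", "15:54:38", "15:54:45", "15:55:08", "15:58:07",
         "15:58:11", "15:59:56"]
# (created_date, origin_url, client_session_id), taken from the table above.
ROWS = [(f"2021-01-18 {t}", URL, SID) for t in TIMES] + [
    ("2021-01-18 11:02:24", "https://chat-app.ttttttt.com/", SID),
    ("2021-01-18 11:11:03", "https://chat-app.testett.com/", SID),
    ("2021-01-18 16:08:32", "https://admin.testete.com/", SID),
]


def find_bursts(rows, min_count=3, window=timedelta(seconds=60)):
    """Group rows by (origin_url, client_session_id) and return, per group,
    every timestamp lying in some span of at most `window` that contains
    at least `min_count` records."""
    by_key = defaultdict(list)
    for created, url, session in rows:
        by_key[(url, session)].append(datetime.strptime(created, "%Y-%m-%d %H:%M:%S"))
    flagged = defaultdict(set)
    for key, times in by_key.items():
        times.sort()
        left = 0
        for right, current in enumerate(times):
            # Drop records that are more than `window` older than the current one.
            while current - times[left] > window:
                left += 1
            if right - left + 1 >= min_count:
                flagged[key].update(times[left:right + 1])
    return {key: sorted(stamps) for key, stamps in flagged.items()}


for (url, session), stamps in find_bursts(ROWS).items():
    print(url, session, [s.strftime("%H:%M:%S") for s in stamps])
```

On this sample only the zendes.com URL qualifies, with two bursts: 11:51:05 to 11:51:50 and 15:54:38 to 15:55:08.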


Solution

To answer your question, I did the following (this works on MySQL 8, because it uses window functions; see the fiddle here):

Created the table:

CREATE TABLE test 
(
  created_date TIMESTAMP(1) NOT NULL,
  origin_url   VARCHAR (200) NOT NULL,
  client_session_id VARCHAR (50) NOT NULL
);

Populated it (sample data):

INSERT INTO test VALUES
('2021-01-18 11:02:24.0', 'https://ttttttt.com/', 'znkjoc3gfth2c3m0t1klii'),
('2021-01-18 11:02:35.0', 'https://zendes.com/', 'znkjoc3gfth2c3m0t1klii'),
('2021-01-18 11:11:03.0', 'https://testett.com/', 'znkjoc3gfth2c3m0t1klii'),
('2021-01-18 11:49:28.0', 'https://zendes.com/', 'znkjoc3gfth2c3m0t1klii'),
('2021-01-18 11:50:36.0', 'https://zendes.com/', 'znkjoc3gfth2c3m0t1klii'),
('2021-01-18 11:51:05.0', 'https://zendes.com/', 'znkjoc3gfth2c3m0t1klii');

Then I used the LEAD() window function, which has the following syntax:

LEAD(<expression>[,offset[, default_value]]) OVER (
    PARTITION BY (expr)
    ORDER BY (expr)
)
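To make the offset concrete: with ORDER BY created_date DESC, LEAD(created_date, 2) returns the value two rows further down the partition, i.e. the second-older record, so a row together with its l_3 value bounds a run of three records. A small Python imitation of LEAD over a single, already-sorted partition illustrates this on the zendes.com rows of the sample; lead is a hypothetical helper written for this sketch, not a library function.

```python
def lead(values, offset=1, default=None):
    """Mimic LEAD(expr, offset, default) over one already-ordered partition:
    each position gets the value `offset` rows further down, or `default`
    when there is no such row."""
    return [values[i + offset] if i + offset < len(values) else default
            for i in range(len(values))]


# The zendes.com partition of the sample data, ordered by created_date DESC.
dates_desc = ["11:51:05", "11:50:36", "11:49:28", "11:02:35"]
print(lead(dates_desc, 2))  # ['11:49:28', '11:02:35', None, None]
```

The two non-NULL values match the l_3 column in the result that follows; the last two rows have no record two positions older, so they get NULL.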

The initial SQL is:

SELECT
  created_date,
  LEAD(created_date, 2) OVER (PARTITION BY origin_url, client_session_id 
                            ORDER BY created_date DESC, origin_url, client_session_id) AS l_3,
  TIMESTAMPDIFF
  (
    MINUTE,
    created_date, 
    LEAD(created_date, 2) OVER (PARTITION BY origin_url, client_session_id 
                            ORDER BY created_date DESC, origin_url, client_session_id)
  ) AS min_diff,
  origin_url,
  client_session_id
FROM test;

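Result (note: this output was produced by the refined query further down, which adds the ut_d column and wraps min_diff in ABS(); the query above has no ut_d column, and its min_diff values are negative because TIMESTAMPDIFF(unit, a, b) returns b minus a):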

created_date           l_3                    min_diff  ut_d    origin_url            client_session_id
2021-01-18 11:11:03.0  NULL                   NULL      NULL    https://testett.com/  znkjoc3gfth2c3m0t1klii
2021-01-18 11:02:24.0  NULL                   NULL      NULL    https://ttttttt.com/  znkjoc3gfth2c3m0t1klii
2021-01-18 11:51:05.0  2021-01-18 11:49:28.0  1         97.0    https://zendes.com/   znkjoc3gfth2c3m0t1klii
2021-01-18 11:50:36.0  2021-01-18 11:02:35.0  48        2881.0  https://zendes.com/   znkjoc3gfth2c3m0t1klii
2021-01-18 11:49:28.0  NULL                   NULL      NULL    https://zendes.com/   znkjoc3gfth2c3m0t1klii
2021-01-18 11:02:35.0  NULL                   NULL      NULL    https://zendes.com/   znkjoc3gfth2c3m0t1klii

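I then noticed that subtracting UNIX_TIMESTAMP() values, which gives the difference in seconds, might be better than TIMESTAMPDIFF (see here): TIMESTAMPDIFF(MINUTE, ...) returns whole minutes only, so a 97-second gap shows up as 1. So in the end I used the following (see the fiddle for the result):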

SELECT
  created_date,
  LEAD(created_date, 2) OVER (PARTITION BY origin_url, client_session_id 
                            ORDER BY created_date DESC, origin_url, client_session_id) AS l_3,
  ABS(TIMESTAMPDIFF
  (
    MINUTE,
    created_date, 
    LEAD(created_date, 2) OVER (PARTITION BY origin_url, client_session_id 
                            ORDER BY created_date DESC, origin_url, client_session_id)
  )) AS min_diff,
  UNIX_TIMESTAMP(created_date) -
  UNIX_TIMESTAMP
  (
    LEAD(created_date, 2) OVER (PARTITION BY origin_url, client_session_id 
                            ORDER BY created_date DESC, origin_url, client_session_id)
  ) AS ut_d,
  origin_url,
  client_session_id
FROM test;
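The reason seconds are the safer unit can be checked with plain arithmetic. The Python sketch below (gap is a hypothetical helper, and it assumes the newer timestamp comes first, so floor division behaves like MySQL's truncation) computes both the whole-minute and the seconds difference for the two zendes.com rows that have an l_3 value, plus a made-up 119-second gap for comparison.

```python
from datetime import datetime


def gap(newer, older):
    """Return (whole minutes, seconds) from `older` to `newer`.
    Assumes newer >= older, so floor division matches truncation."""
    seconds = int((datetime.fromisoformat(newer) - datetime.fromisoformat(older)).total_seconds())
    return seconds // 60, seconds


print(gap("2021-01-18 11:51:05", "2021-01-18 11:49:28"))  # (1, 97)
print(gap("2021-01-18 11:50:36", "2021-01-18 11:02:35"))  # (48, 2881)
# Hypothetical values: a 119-second gap also reports as 1 minute.
print(gap("2021-01-18 12:01:59", "2021-01-18 12:00:00"))  # (1, 119)
```

A minute-based filter cannot tell 97 or 119 seconds apart from 60, which matters when the rule is "within 60 seconds"; the seconds value (ut_d) can. TIMESTAMPDIFF(SECOND, ...) would be another way to get seconds in MySQL.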

The final SQL and result are:

SELECT * FROM
(
  SELECT
    created_date,
    LEAD(created_date, 2) 
           OVER (PARTITION BY origin_url, client_session_id 
                   ORDER BY created_date DESC, origin_url, client_session_id) AS l_3,
    created_date -
    LEAD(created_date, 2) 
           OVER (PARTITION BY origin_url, client_session_id 
                   ORDER BY created_date DESC, origin_url, client_session_id) AS l_diff,
    UNIX_TIMESTAMP(created_date) -
    UNIX_TIMESTAMP
    (
      LEAD(created_date, 2) OVER (PARTITION BY origin_url, client_session_id 
                            ORDER BY created_date DESC, origin_url, client_session_id)
    ) AS ut_d,
    origin_url,
    client_session_id 
  FROM test
) AS t
WHERE t.ut_d  < 180;

Result:

created_date           l_3                    l_diff  ut_d  origin_url           client_session_id
2021-01-18 11:51:05.0  2021-01-18 11:49:28.0  177.0   97.0  https://zendes.com/  znkjoc3gfth2c3m0t1klii
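To experiment with the final query without a MySQL 8 server, a rough equivalent can be run from Python's built-in sqlite3 module, assuming SQLite 3.25 or later (the first version with window functions). SQLite has no UNIX_TIMESTAMP(), so CAST(strftime('%s', ...) AS INTEGER) stands in for it, and a named WINDOW clause replaces the repeated OVER (...) definitions; this is an approximation of the MySQL query, not a drop-in replacement. The function name find_close_triples is hypothetical.

```python
import sqlite3

# The answer's sample rows (fractional ".0" seconds dropped).
SAMPLE = [
    ("2021-01-18 11:02:24", "https://ttttttt.com/", "znkjoc3gfth2c3m0t1klii"),
    ("2021-01-18 11:02:35", "https://zendes.com/", "znkjoc3gfth2c3m0t1klii"),
    ("2021-01-18 11:11:03", "https://testett.com/", "znkjoc3gfth2c3m0t1klii"),
    ("2021-01-18 11:49:28", "https://zendes.com/", "znkjoc3gfth2c3m0t1klii"),
    ("2021-01-18 11:50:36", "https://zendes.com/", "znkjoc3gfth2c3m0t1klii"),
    ("2021-01-18 11:51:05", "https://zendes.com/", "znkjoc3gfth2c3m0t1klii"),
]

# ut_d = seconds between a row and the record two positions older in its partition.
QUERY = """
SELECT * FROM (
  SELECT created_date,
         LEAD(created_date, 2) OVER w AS l_3,
         CAST(strftime('%s', created_date) AS INTEGER)
           - CAST(strftime('%s', LEAD(created_date, 2) OVER w) AS INTEGER) AS ut_d,
         origin_url,
         client_session_id
  FROM test
  WINDOW w AS (PARTITION BY origin_url, client_session_id ORDER BY created_date DESC)
) AS t
WHERE t.ut_d < ?
"""


def find_close_triples(rows, max_seconds):
    """Load rows into an in-memory table and return rows whose run of
    three records spans fewer than `max_seconds` seconds."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test (created_date TEXT NOT NULL, "
                 "origin_url TEXT NOT NULL, client_session_id TEXT NOT NULL)")
    conn.executemany("INSERT INTO test VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY, (max_seconds,)).fetchall()
    conn.close()
    return result


print(find_close_triples(SAMPLE, 180))
print(find_close_triples(SAMPLE, 60))
```

With the 180-second threshold this should return the single 11:51:05 row with ut_d = 97, matching the MySQL result above; with 60 it returns nothing for this sample.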

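Note that l_diff (a plain subtraction of the two timestamps) is not a number of seconds: MySQL converts each TIMESTAMP to a number such as 20210118115105.0 before subtracting, which is why it shows 177.0 while ut_d correctly shows 97.0. Also, the 180-second threshold is what returns a row for this sample data; to apply the 60-second rule from the question, filter on t.ut_d <= 60 instead, and for the daily job restrict the inner query to the last 24 hours.

You mention that you are on MySQL 5.7, which doesn't support window functions. You can either upgrade or emulate them (for example with a correlated subquery or a self-join); my advice is to upgrade.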
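For MySQL 5.7, one way to avoid window functions is a correlated subquery that counts, for each record, the records with the same origin_url and client_session_id in the 60 seconds up to and including it. The sketch below runs that idea in SQLite through Python, again with CAST(strftime('%s', ...) AS INTEGER) standing in for MySQL's UNIX_TIMESTAMP() (in MySQL the time test would read UNIX_TIMESTAMP(b.created_date) - UNIX_TIMESTAMP(p.created_date) <= 60). It flags the record that completes each window of three or more. The table name bot_talk and the column names come from the question; the day_start/day_end bounds and the function name daily_bursts are hypothetical. On a real table this would likely need an index on (client_session_id, origin_url, created_date) to perform well.

```python
import sqlite3

URL = "https://someurltestenter1502211068.zendes.com/"
SID = "znkjoc3gfth2c3m0t1klii"
TIMES = ["11:02:35", "11:44:28", "11:49:36", "11:51:05", "11:51:15",
         "11:51:19", "11:51:43", "11:51:50", "12:01:24", "12:04:48",
         "13:40:50", "15:54:38", "15:54:45", "15:55:08", "15:58:07",
         "15:58:11", "15:59:56"]

# No window functions: for each record b, count the records p with the same
# origin_url/client_session_id in the 60 seconds ending at b (b included).
DAILY_QUERY = """
SELECT * FROM (
  SELECT b.created_date, b.origin_url, b.client_session_id,
         (SELECT COUNT(*) FROM bot_talk AS p
           WHERE p.origin_url = b.origin_url
             AND p.client_session_id = b.client_session_id
             AND p.created_date <= b.created_date
             AND CAST(strftime('%s', b.created_date) AS INTEGER)
                 - CAST(strftime('%s', p.created_date) AS INTEGER) <= 60
         ) AS n_in_window
  FROM bot_talk AS b
  WHERE b.created_date >= ? AND b.created_date < ?
) AS t
WHERE t.n_in_window >= 3
ORDER BY t.created_date
"""


def daily_bursts(rows, day_start, day_end):
    """Return (created_date, origin_url, client_session_id, n_in_window)
    for records in [day_start, day_end) that close a 60-second window
    holding three or more records."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE bot_talk (created_date TEXT, origin_url TEXT, "
                 "client_session_id TEXT)")
    conn.executemany("INSERT INTO bot_talk VALUES (?, ?, ?)", rows)
    result = conn.execute(DAILY_QUERY, (day_start, day_end)).fetchall()
    conn.close()
    return result


rows = [(f"2021-01-18 {t}", URL, SID) for t in TIMES]
rows.append(("2021-01-18 11:02:24", "https://chat-app.ttttttt.com/", SID))
for row in daily_bursts(rows, "2021-01-18 00:00:00", "2021-01-19 00:00:00"):
    print(row)
```

Because the subquery does not restrict p to the day being analyzed, a burst that starts shortly before day_start is still counted for records after it.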

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange