Background before we begin...

Table schema:

UserId | ActivityDate | Time_diff

where "ActivityDate" is the timestamp of an activity by the user, and "Time_diff" is the TIMESTAMPDIFF, in seconds, between the current activity and the user's next activity. The last recorded activity of each user has no next activity, so I set its Time_diff to -999.

Ex:

|      UserId | ActivityDate        | Time_diff |
|           1 | 2012-11-10 11:19:04 |        12 |
|           1 | 2012-11-10 11:19:16 |        11 |
|           1 | 2012-11-10 11:19:27 |         3 |
|           1 | 2012-11-10 11:19:30 |    236774 |
|           1 | 2012-11-13 05:05:44 |        39 |
|           1 | 2012-11-13 05:06:23 |     77342 |
|           1 | 2012-11-14 02:35:25 |    585888 |
|           1 | 2012-11-20 21:20:13 |   1506130 |
   ...

|           1 | 2013-06-13 06:32:48 |   1616134 |
|           1 | 2013-07-01 23:28:22 |   5778459 |
|           1 | 2013-09-06 20:36:01 |      -999 |
|           2 | 2008-08-01 04:59:33 |       622 |
|           2 | 2008-08-01 05:09:55 |     38225 |
|           2 | 2008-08-01 15:47:00 |     31108 |
|           2 | 2008-08-02 00:25:28 |     28599 |
|           2 | 2008-08-02 08:22:07 |    163789 |
|           2 | 2008-08-04 05:51:56 |   1522915 |
|           2 | 2008-08-21 20:53:51 |    694678 |
|           2 | 2008-08-29 21:51:49 |   2945291 |
|           2 | 2008-10-03 00:00:00 |    172800 |
|           2 | 2008-10-05 00:00:00 |    776768 |
|           2 | 2008-10-13 23:46:08 |   3742999 |

I have just added the field "session_id"

alter table so_time_diff add column session_id int(11) not null;

My actual question...

I would like to update this field for each of the above records based on the following logic:

for first record: set session_id = 1
from second record:
    if previous_record.UserId == this_record.UserId AND previous_record.time_diff <=3600
         set this_record.session_id = previous_record.session_id
    else if previous_record.UserId == this_record.UserId AND previous_record.time_diff >3600
         set this_record.session_id = previous_record.session_id + 1
    else if previous_record.UserId <> this_record.UserId 
         set session_id = 1 ## for a different user, restart

In simple words,

If two records of the same user are within a time interval of 3600 seconds, assign the same session_id; if not, increment the session_id; if it's a different user, restart the session_id count.
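
To make the rules above concrete outside of SQL, here is a minimal Python sketch of the same logic. It is only an illustration, not part of the original question: the function name assign_sessions and the (user_id, time_diff) tuple layout are assumptions, and the rows are assumed to be sorted by user and activity time already.

```python
from typing import List, Optional, Tuple

# Hypothetical row layout: (user_id, time_diff), where time_diff is the gap in
# seconds to the NEXT row of the same user (-999 for a user's last row).
Row = Tuple[int, int]


def assign_sessions(rows: List[Row], gap: int = 3600) -> List[int]:
    """Return one session id per row, following the rules described above."""
    session_ids: List[int] = []
    prev_user: Optional[int] = None
    prev_diff: Optional[int] = None
    session = 0
    for user_id, time_diff in rows:
        if user_id != prev_user:
            session = 1          # first row overall, or a different user: restart
        elif prev_diff is not None and prev_diff > gap:
            session += 1         # same user, but the previous gap was too long
        # otherwise: same user and within the gap, so keep the current session
        session_ids.append(session)
        prev_user, prev_diff = user_id, time_diff
    return session_ids


rows = [(1, 12), (1, 11), (1, 3), (1, 236774), (1, 39), (1, 77342),
        (2, 622), (2, 38225), (2, 31108)]
print(assign_sessions(rows))  # [1, 1, 1, 1, 2, 2, 1, 1, 2]
```

Note that the -999 marker never takes part in a comparison: it sits on a user's last row, and the row after it belongs to a different user, which restarts the count anyway.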

I've never written logic in an update query before. Is this possible? Any guidance is greatly appreciated!


Solution

Yes, this is possible. It would be easier if the time_diff was on the later record, rather than the previous record, but we can make it work. (We don't really need the stored time_diff.)

The "trick" to getting this to work is really writing a SELECT statement. If you've got a SELECT statement that returns the key of the row to be updated, and the values to be assigned, making that into an UPDATE is trivial.

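The "trick" to getting a SELECT statement is to make use of MySQL user variables. Be aware that this depends on behavior MySQL does not guarantee: the order in which expressions involving user variables are evaluated within a statement.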

This is the skeleton of the statement:

SELECT @prev_userid                         AS prev_userid
     , @prev_activitydate                   AS prev_activitydate
     , @sessionid                           AS sessionid
     , @prev_userid := t.userid             AS userid
     , @prev_activitydate := t.activitydate AS activitydate
  FROM (SELECT @prev_userid := NULL, @prev_activitydate := NULL, @sessionid := 1) i
  JOIN so_time_diff t
 ORDER BY t.userid, t.activitydate

(We hope there's an index ON so_time_diff (userid, activitydate), so the query can be satisfied from the index, without a need for an expensive "Using filesort" operation.)

Let's unpack that a bit. Firstly, the three MySQL user variables get initialized by the inline view aliased as i. We don't really care about what that returns, we only really care that it initializes the user variables. Because we're using it in a JOIN operation, we also care that it returns exactly one row.

When the first row is processed, we have the values that were previously assigned to the user variable, and we assign the values from the current row to them. When the next row is processed, the values from the previous row are in the user variables, and we assign the current row values to them, and so on.
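
For readers less familiar with user variables, the "remember the previous row" idea can be sketched in Python. This is only an analogy for what the query does row by row; the generator name with_previous and the sample rows are made up for illustration.

```python
from typing import Iterable, Iterator, Optional, Tuple


def with_previous(
    rows: Iterable[Tuple[int, str]]
) -> Iterator[Tuple[Optional[int], Optional[str], int, str]]:
    """Yield (prev_userid, prev_activitydate, userid, activitydate) per row."""
    prev_user: Optional[int] = None   # plays the role of @prev_userid
    prev_date: Optional[str] = None   # plays the role of @prev_activitydate
    for user_id, activity_date in rows:
        # First expose the values saved from the previous row...
        yield prev_user, prev_date, user_id, activity_date
        # ...then overwrite them with the current row for the next iteration.
        prev_user, prev_date = user_id, activity_date


sample = [
    (1, "2012-11-10 11:19:04"),
    (1, "2012-11-10 11:19:16"),
    (2, "2008-08-01 04:59:33"),
]
for out in with_previous(sample):
    print(out)
```

The first row sees None for both "previous" values, just as the first row of the query sees the NULLs set up by the inline view i.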

The "ORDER BY" on the query is important; it's vital that we process the rows in the correct order.

But that's just a start.

The next step is comparing the userid and activitydate values of the current and previous rows, and deciding whether we're in the same session, or whether it's a different session and we need to increment the sessionid by 1.

-- @sessionid starts at 0 here, so the very first row (which has no previous row) gets sessionid 1
SELECT @sessionid := @sessionid +
       IF( t.userid = @prev_userid AND
           TIMESTAMPDIFF(SECOND,@prev_activitydate,t.activitydate) <= 3600
       ,0,1) AS sessionid
     , @prev_userid := t.userid             AS userid
     , @prev_activitydate := t.activitydate AS activitydate
  FROM (SELECT @prev_userid := NULL, @prev_activitydate := NULL, @sessionid := 0) i
  JOIN so_time_diff t
 ORDER BY t.userid, t.activitydate

You could make use of the value stored in the existing time_diff column, but you need the value from the previous row when checking the current row, so that would just be another MySQL user variable: a check of @prev_time_diff, rather than calculating the timestamp difference (as in my example above). We can also add other expressions to the SELECT list, to make debugging/verification easier:

     , @prev_userid=t.userid
     , TIMESTAMPDIFF(SECOND,@prev_activitydate,t.activitydate)

N.B. The ORDER of the expressions in the SELECT list is important; the expressions are evaluated in the order they appear... this wouldn't work if we were to assign the userid value from the current row to the user variable BEFORE we checked it... that's why those assignments come last in the SELECT list.
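
The ordering pitfall is easy to reproduce outside of SQL. The Python sketch below is purely illustrative (the function count_user_changes and its assign_first flag are invented for this example): it counts how often the user changes between consecutive rows, once with the "remember this row" assignment after the comparison and once with it, wrongly, before.

```python
from typing import Iterable, Optional


def count_user_changes(user_ids: Iterable[int], assign_first: bool) -> int:
    """Count rows whose user differs from the previous row's user.

    assign_first=True simulates putting the assignment BEFORE the comparison.
    """
    prev_user: Optional[int] = None
    changes = 0
    for user_id in user_ids:
        if assign_first:
            prev_user = user_id   # wrong order: overwrites before we compare
        if user_id != prev_user:
            changes += 1
        prev_user = user_id       # right order: remember the row after comparing
    return changes


users = [1, 1, 2, 2, 3]
print(count_user_changes(users, assign_first=False))  # 3
print(count_user_changes(users, assign_first=True))   # 0
```

With the assignment first, every row is compared against itself, so no change is ever detected. That is the same failure the SELECT list would have if the @prev_userid assignment came before the IF.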

Once we have a query that looks good, that's returning a "sessionid" value that we want to assign to the row with a matching userid and activitydate, we can use that in a multitable update statement.

UPDATE (
         -- query that generates sessionid for userid, activitydate goes here
       ) s
  JOIN so_time_diff t
    ON t.userid = s.userid
   AND t.activitydate = s.activitydate
   SET t.session_id = s.sessionid

(If there are a lot of rows, this could crank for a very long time. With versions of MySQL prior to 5.6, I believe the derived table (aliased as s) won't have any indexes created on it. Hopefully, MySQL will use the derived table s as the driving table for the JOIN operation, and do index lookups to the target table.)
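
The two-step shape described above (first produce key plus new value, then update by key) can be tried end to end with a small script. This is a rough sketch under loud assumptions: it uses Python's built-in sqlite3 module with an in-memory database as a stand-in for MySQL, computes the session numbers in Python instead of with user variables, restarts the count at 1 for each user, and uses a handful of made-up sample rows.

```python
import sqlite3
from datetime import datetime

# Assumption: in-memory SQLite stands in for the MySQL table so_time_diff.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE so_time_diff ("
    " userid INTEGER, activitydate TEXT, session_id INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO so_time_diff (userid, activitydate) VALUES (?, ?)",
    [
        (1, "2012-11-10 11:19:04"),
        (1, "2012-11-10 11:19:16"),
        (1, "2012-11-13 05:05:44"),
        (2, "2008-08-01 04:59:33"),
        (2, "2008-08-01 05:09:55"),
    ],
)

# Step 1: the "SELECT" half. Read rows in order and work out, for each key
# (userid, activitydate), the session number to assign.
ordered = conn.execute(
    "SELECT userid, activitydate FROM so_time_diff ORDER BY userid, activitydate"
).fetchall()
updates = []
prev_user, prev_time, session = None, None, 0
for userid, activitydate in ordered:
    ts = datetime.strptime(activitydate, "%Y-%m-%d %H:%M:%S")
    if userid != prev_user:
        session = 1                                   # new user: restart
    elif (ts - prev_time).total_seconds() > 3600:
        session += 1                                  # gap too long: new session
    updates.append((session, userid, activitydate))
    prev_user, prev_time = userid, ts

# Step 2: the "UPDATE" half. Apply each value to the row with the matching key.
conn.executemany(
    "UPDATE so_time_diff SET session_id = ? WHERE userid = ? AND activitydate = ?",
    updates,
)
conn.commit()

result = conn.execute(
    "SELECT userid, activitydate, session_id FROM so_time_diff"
    " ORDER BY userid, activitydate"
).fetchall()
for row in result:
    print(row)
```

Like the JOIN on userid and activitydate, this only works cleanly if (userid, activitydate) identifies a single row.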


FOLLOWUP

I entirely missed the requirement to restart sessionid at 1 for each user. To do that, I'd modify the expression that's assigned to @sessionid, just split the condition tests of userid and activitydate. If the userid is different than the previous row, then return a 1. Otherwise, based on the comparison of activitydate, return either the current value of @sessionid, or the current value incremented by 1.

Like this:

SELECT @sessionid := 
       IF( t.userid = @prev_userid 
         , IF( TIMESTAMPDIFF(SECOND,@prev_activitydate,t.activitydate) <= 3600
             , @sessionid
             , @sessionid + 1 )
         , 1 ) 
       AS sessionid
     , @prev_userid := t.userid             AS userid
     , @prev_activitydate := t.activitydate AS activitydate
  FROM (SELECT @prev_userid := NULL, @prev_activitydate := NULL, @sessionid := 1) i
  JOIN so_time_diff t
 ORDER BY t.userid, t.activitydate

N.B. None of these statements has been tested; they have only been desk checked. That said, I've successfully used this pattern innumerable times.

Other tips

Here is what I wrote, and this worked!!!

SELECT @sessionid := @sessionid +
   CASE WHEN @prev_userid IS NULL THEN 0 
        WHEN t.UserId <> @prev_userid THEN 1-@sessionid
        WHEN t.UserId = @prev_userid AND
       TIMESTAMPDIFF(SECOND,@prev_activitydate,t.ActivityDate) <= 3600
       THEN 0 ELSE 1
   END          
  AS sessionid 
 , @prev_userid := t.UserId             AS UserId
 , @prev_activitydate := t.ActivityDate AS ActivityDate,
 time_diff
FROM (SELECT @prev_userid := NULL, @prev_activitydate := NULL, @sessionid := 1) i
JOIN example t
ORDER BY t.UserId, t.ActivityDate;

thanks again to @spencer7593 for your very descriptive answer giving me the right direction..!!!

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow