Yes, this is possible. It would be easier if the time_diff was on the later record, rather than the previous record, but we can make it work. (We don't really need the stored time_diff.)
The "trick" to getting this to work is really writing a SELECT statement. If you've got a SELECT statement that returns the key of the row to be updated, and the values to be assigned, making that into an UPDATE is trivial.
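The "trick" to getting a SELECT statement is to make use of MySQL user variables. This depends on behavior that MySQL does not guarantee: the order in which expressions involving user variables are evaluated is documented as undefined, even though it is predictable in practice for queries of this form.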
This is the skeleton of the statement:
SELECT @prev_userid AS prev_userid
, @prev_activitydate AS prev_activitydate
, @sessionid AS sessionid
, @prev_userid := t.userid AS userid
, @prev_activitydate := t.activitydate AS activitydate
FROM (SELECT @prev_userid := NULL, @prev_activitydate := NULL, @sessionid := 1) i
JOIN so_time_diff t
ORDER BY t.userid, t.activitydate
(We hope there's an index ON so_time_diff (userid, activitydate), so the query can be satisfied from the index, without a need for an expensive "Using filesort" operation.)
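To try the queries out, a small test table with that index helps. This is only a sketch: the column types, the unit of time_diff (seconds), the index name so_time_diff_ix1, and the sample rows are all assumptions made up for illustration, since the actual table definition isn't shown.

```sql
-- Hypothetical definition: column types are assumptions, not the real schema.
CREATE TABLE so_time_diff
( userid       INT      NOT NULL
, activitydate DATETIME NOT NULL
, time_diff    INT      NULL   -- assumed: seconds until the same user's next row
, sessionid    INT      NULL   -- to be populated by the UPDATE
);

-- The index the ORDER BY can use (index name is made up).
CREATE INDEX so_time_diff_ix1 ON so_time_diff (userid, activitydate);

-- Made-up sample rows: user 1 has a two-hour gap before the third row.
INSERT INTO so_time_diff (userid, activitydate, time_diff) VALUES
  (1, '2013-01-01 08:00:00',  600)
, (1, '2013-01-01 08:10:00', 7200)
, (1, '2013-01-01 10:10:00', NULL)
, (2, '2013-01-01 09:00:00', 1800)
, (2, '2013-01-01 09:30:00', NULL);
```

With a one-hour threshold, the first two rows for user 1 belong to one session and the third starts a new one; both rows for user 2 belong to a single session.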
Let's unpack that a bit. Firstly, the three MySQL user variables get initialized by the inline view aliased as i. We don't really care about what that returns; we only care that it initializes the user variables. Because we're using it in a JOIN operation, we also care that it returns exactly one row.
When the first row is processed, the user variables still hold their initial values (NULL for the "previous" userid and activitydate), and we then assign the values from the current row to them. When the next row is processed, the values from the previous row are in the user variables, and we assign the current row values to them, and so on.
The "ORDER BY" on the query is important; it's vital that we process the rows in the correct order.
But that's just a start.
The next step is comparing the userid and activitydate values of the current and previous rows, and deciding whether we're in the same session, or whether it's a different session and we need to increment the sessionid by 1.
SELECT @sessionid := @sessionid +
IF( t.userid = @prev_userid AND
TIMESTAMPDIFF(SECOND,@prev_activitydate,t.activitydate) <= 3600
,0,1) AS sessionid
, @prev_userid := t.userid AS userid
, @prev_activitydate := t.activitydate AS activitydate
FROM (SELECT @prev_userid := NULL, @prev_activitydate := NULL, @sessionid := 0) i -- start at 0: the first row always increments, so it gets sessionid 1
JOIN so_time_diff t
ORDER BY t.userid, t.activitydate
You could make use of the value stored in the existing time_diff column, but you need the value from the previous row when checking the current row, so that would just be another MySQL user variable: a check of @prev_time_diff, rather than calculating the timestamp difference (as in my example above).
We can add other expressions to the select list, to make debugging/verification easier:
, @prev_userid=t.userid
, TIMESTAMPDIFF(SECOND,@prev_activitydate,t.activitydate)
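For the time_diff variant mentioned above, a sketch might look like the following. It assumes that time_diff is stored in seconds on the earlier row of each pair (that is, it holds the gap to the same user's next row); @prev_time_diff is a variable name introduced here for illustration.

```sql
-- Sketch only: assumes time_diff is in seconds and sits on the earlier row.
SELECT @sessionid := @sessionid +
         IF( t.userid = @prev_userid AND
             @prev_time_diff <= 3600      -- gap recorded on the previous row
           ,0,1) AS sessionid
     , @prev_userid := t.userid AS userid
     , @prev_time_diff := t.time_diff AS time_diff
     , t.activitydate AS activitydate
  FROM (SELECT @prev_userid := NULL, @prev_time_diff := NULL, @sessionid := 0) i  -- 0 so the first row gets 1
  JOIN so_time_diff t
 ORDER BY t.userid, t.activitydate
```

The check of @prev_time_diff still comes before the assignment to it, for the same reason the other assignments come last.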
N.B. The ORDER of the expressions in the SELECT list is important. In practice, the expressions are evaluated in the order they appear (this is the non-guaranteed behavior mentioned earlier). This wouldn't work if we were to assign the userid value from the current row to the user variable BEFORE we checked it; that's why those assignments come last in the SELECT list.
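To make the ordering point concrete, here is a sketch of the broken variant, with the assignments moved ahead of the comparison. It is shown only as a counterexample and assumes the left-to-right evaluation described above.

```sql
-- Counterexample (do NOT use): the assignments run before the check.
SELECT @prev_userid := t.userid AS userid
     , @prev_activitydate := t.activitydate AS activitydate
     , @sessionid := @sessionid +
         IF( t.userid = @prev_userid AND
             TIMESTAMPDIFF(SECOND,@prev_activitydate,t.activitydate) <= 3600
           ,0,1) AS sessionid   -- compares the current row with itself
  FROM (SELECT @prev_userid := NULL, @prev_activitydate := NULL, @sessionid := 0) i
  JOIN so_time_diff t
 ORDER BY t.userid, t.activitydate
```

By the time the IF is evaluated, the variables already hold the current row's values, so the userid always matches and the time difference is always zero. The sessionid would never be incremented.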
Once we have a query that looks good, that's returning a "sessionid" value that we want to assign to the row with a matching userid and activitydate, we can use that in a multitable update statement.
UPDATE (
-- query that generates sessionid for userid, activitydate goes here
) s
JOIN so_time_diff t
ON t.userid = s.userid
AND t.activitydate = s.activitydate
SET t.sessionid = s.sessionid
(If there are a lot of rows, this could crank for a very long time. With versions of MySQL prior to 5.6, I believe the derived table (aliased as s) won't have any indexes created on it. Hopefully, MySQL will use the derived table s as the driving table for the JOIN operation, and do index lookups to the target table.)
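Putting the pieces together, the assembled statement might look like this sketch. It assumes that (userid, activitydate) identifies a single row in so_time_diff; if duplicates exist, every matching row receives the same sessionid. Like the other statements here, it is untested.

```sql
UPDATE (
         -- derived table: one row per source row, with the computed sessionid
         SELECT @sessionid := @sessionid +
                  IF( t.userid = @prev_userid AND
                      TIMESTAMPDIFF(SECOND,@prev_activitydate,t.activitydate) <= 3600
                    ,0,1) AS sessionid
              , @prev_userid := t.userid AS userid
              , @prev_activitydate := t.activitydate AS activitydate
           FROM (SELECT @prev_userid := NULL, @prev_activitydate := NULL, @sessionid := 0) i  -- 0 so the first row gets 1
           JOIN so_time_diff t
          ORDER BY t.userid, t.activitydate
       ) s
  JOIN so_time_diff t
    ON t.userid = s.userid
   AND t.activitydate = s.activitydate
   SET t.sessionid = s.sessionid
```

The same wrapper works for any variant of the inner SELECT; only the query inside the parentheses changes. It seems prudent to run the inner SELECT on its own first and check its output before running the UPDATE.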
FOLLOWUP
I entirely missed the requirement to restart sessionid at 1 for each user. To do that, I'd modify the expression that's assigned to @sessionid and split the condition tests of userid and activitydate. If the userid is different from the previous row, then return a 1. Otherwise, based on the comparison of activitydate, return either the current value of @sessionid, or the current value incremented by 1.
Like this:
SELECT @sessionid :=
IF( t.userid = @prev_userid
, IF( TIMESTAMPDIFF(SECOND,@prev_activitydate,t.activitydate) <= 3600
, @sessionid
, @sessionid + 1 )
, 1 )
AS sessionid
, @prev_userid := t.userid AS userid
, @prev_activitydate := t.activitydate AS activitydate
FROM (SELECT @prev_userid := NULL, @prev_activitydate := NULL, @sessionid := 1) i
JOIN so_time_diff t
ORDER BY t.userid, t.activitydate
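On MySQL 8.0 and later, window functions offer a way to get the same per-user numbering without relying on user-variable evaluation order. This is a different technique from the one used in this answer, sketched here as an untested alternative; it assumes the same so_time_diff table and the same 3600-second threshold. The new_session alias is made up for illustration.

```sql
-- Alternative sketch for MySQL 8.0+ (window functions, no user variables).
SELECT s.userid
     , s.activitydate
     , SUM(s.new_session) OVER (PARTITION BY s.userid
                                ORDER BY s.activitydate) AS sessionid
  FROM ( SELECT t.userid
              , t.activitydate
                -- 1 marks the start of a session: the user's first row
                -- (LAG returns NULL) or a gap of more than an hour.
              , CASE WHEN TIMESTAMPDIFF(SECOND
                          , LAG(t.activitydate) OVER (PARTITION BY t.userid
                                                      ORDER BY t.activitydate)
                          , t.activitydate) <= 3600
                     THEN 0 ELSE 1 END AS new_session
           FROM so_time_diff t
       ) s
 ORDER BY s.userid, s.activitydate
```

The running SUM of the session-start flags within each userid gives a sessionid that starts at 1 for each user and increments at every gap longer than an hour.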
N.B. None of these statements has been tested; they have only been desk checked. That said, I've successfully used this pattern innumerable times.