Question

I have a table that stores all messages between users and a bot (basically a state machine), and I'm trying to find all message/response pairs in this table in order to calculate each user's average response time. The caveat is that not all outgoing messages get a response.

Each row stores message_id, user_id, created_at (timestamp), state_code and outgoing (boolean).

I have been looking at window functions, intending to use lag and lead to find the relevant pairs of messages and then take the difference between their created_at values; averaged per user, that would give each user's average response time. The problem is that I have no way of ensuring that both messages were issued with the same state_code. Ideas?

UPDATE: You can be sure that a user's message is a response to a given outgoing message if they have the same state code. For example:

╔════════════╦═════════╦════════════╦════════════╦══════════╗
║ message_id ║ user_id ║ created_at ║ state_code ║ outgoing ║
╠════════════╬═════════╬════════════╬════════════╬══════════╣
║          1 ║      11 ║ mm/dd/yy   ║         20 ║ t        ║
║          2 ║      11 ║ mm/dd/yy   ║         20 ║ f        ║
║          3 ║      11 ║ mm/dd/yy   ║         22 ║ t        ║
║          4 ║      11 ║ mm/dd/yy   ║         21 ║ t        ║
║          5 ║      12 ║ mm/dd/yy   ║         45 ║ t        ║
║          6 ║      12 ║ mm/dd/yy   ║         46 ║ f        ║
║          7 ║      12 ║ mm/dd/yy   ║         46 ║ t        ║
║          8 ║      12 ║ mm/dd/yy   ║         20 ║ f        ║
║          9 ║      12 ║ mm/dd/yy   ║         43 ║ t        ║
║         10 ║      13 ║ mm/dd/yy   ║         20 ║ t        ║
╚════════════╩═════════╩════════════╩════════════╩══════════╝

In this case the pairs are messages 1 and 2, and messages 6 and 7. However, only messages 1 and 2 matter, since user 11 is responding from state 20 to one of our outgoing messages received while in state 20.

Solution

If I understand correctly, each time outgoing is false, you want the created_at from the preceding outgoing row with the same user_id and state_code.

I'm not sure how you would use window functions for this in general. Here is one approach, using a correlated subquery:

select t.*,
       -- latest outgoing message for the same user and state sent before
       -- this row; NULL when there is none
       (select t2.created_at
        from t t2
        where t2.user_id = t.user_id and
              t2.state_code = t.state_code and
              t2.outgoing = 't' and
              t2.created_at < t.created_at
        order by t2.created_at desc
        limit 1
       ) as prev_created_at
from t;

You can then do your date arithmetic to get what you want, mostly by keeping the rows where outgoing is false and prev_created_at is not NULL.
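
To make the date arithmetic concrete, here is a minimal sketch that runs the correlated subquery against an in-memory SQLite database from Python and averages the gaps per user. The sample timestamps, the 1/0 encoding of outgoing, and the helper name average_response_seconds are assumptions made for the illustration. SQLite's strftime('%s', ...) stands in for whatever timestamp subtraction your own database offers.

```python
import sqlite3

# Hypothetical sample rows: (message_id, user_id, created_at, state_code, outgoing).
# The timestamps are invented for illustration; outgoing is stored as 1/0.
ROWS = [
    (1, 11, "2024-01-01 10:00:00", 20, 1),
    (2, 11, "2024-01-01 10:00:30", 20, 0),  # reply to message 1 after 30 s
    (3, 11, "2024-01-01 10:01:00", 22, 1),  # never answered
    (4, 11, "2024-01-01 10:02:00", 21, 1),
    (5, 11, "2024-01-01 10:03:30", 21, 0),  # reply to message 4 after 90 s
    (6, 12, "2024-01-01 11:00:00", 46, 0),  # incoming, no earlier outgoing
    (7, 12, "2024-01-01 11:00:10", 46, 1),
]

QUERY = """
select user_id,
       avg(cast(strftime('%s', created_at) as integer) -
           cast(strftime('%s', prev_created_at) as integer)) as avg_response_seconds
from (select t.*,
             (select t2.created_at
              from t t2
              where t2.user_id = t.user_id and
                    t2.state_code = t.state_code and
                    t2.outgoing = 1 and
                    t2.created_at < t.created_at
              order by t2.created_at desc
              limit 1
             ) as prev_created_at
      from t
     ) m
where outgoing = 0 and prev_created_at is not null
group by user_id
order by user_id
"""


def average_response_seconds(rows):
    """Load rows into an in-memory table t and return [(user_id, avg_seconds)]."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table t (message_id integer primary key, user_id integer, "
        "created_at text, state_code integer, outgoing integer)"
    )
    conn.executemany("insert into t values (?, ?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(average_response_seconds(ROWS))
```

With these rows, user 11 averages 60 seconds (one 30-second and one 90-second reply). User 12 does not appear at all, because message 6 has no earlier outgoing message in state 46.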

If you know that the matching outgoing message is always the row immediately before the response (within the same user_id and state_code), you can do something similar with lag():

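select t.*
from (select t.*,
             -- previous row for the same user and state, ordered by time
             lag(created_at) over (partition by user_id, state_code order by created_at) as prev_created_at,
             lag(outgoing) over (partition by user_id, state_code order by created_at) as prev_outgoing
      from t
     ) t
-- keep only responses whose immediately preceding row was an outgoing message
where t.outgoing = 'f' and t.prev_outgoing = 't';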
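
The two approaches are not interchangeable when a user sends more than one message in the same state. The sketch below shows that case, again as a small Python script over an in-memory SQLite database (window functions need SQLite 3.25 or newer). The rows, the 1/0 encoding of outgoing, and the helper name run are assumptions for the illustration.

```python
import sqlite3

# Hypothetical rows: one outgoing message followed by two replies in state 20.
ROWS = [
    (1, 11, "2024-01-01 10:00:00", 20, 1),  # outgoing
    (2, 11, "2024-01-01 10:00:30", 20, 0),  # first reply
    (3, 11, "2024-01-01 10:00:45", 20, 0),  # second reply, same state
    (4, 11, "2024-01-01 10:01:00", 22, 1),  # outgoing, never answered
]

# lag(): a reply is kept only if the row right before it is outgoing.
LAG_QUERY = """
select message_id, prev_message_id
from (select t.*,
             lag(message_id) over (partition by user_id, state_code order by created_at) as prev_message_id,
             lag(outgoing) over (partition by user_id, state_code order by created_at) as prev_outgoing
      from t
     ) x
where outgoing = 0 and prev_outgoing = 1
order by message_id
"""

# Correlated subquery: every reply is matched to the latest earlier outgoing row.
SUBQUERY_QUERY = """
select t.message_id,
       (select t2.message_id
        from t t2
        where t2.user_id = t.user_id and
              t2.state_code = t.state_code and
              t2.outgoing = 1 and
              t2.created_at < t.created_at
        order by t2.created_at desc
        limit 1
       ) as prev_message_id
from t
where t.outgoing = 0
order by t.message_id
"""


def run(query, rows):
    """Load rows into an in-memory table t and return the query result."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table t (message_id integer primary key, user_id integer, "
        "created_at text, state_code integer, outgoing integer)"
    )
    conn.executemany("insert into t values (?, ?, ?, ?, ?)", rows)
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(run(LAG_QUERY, ROWS))
    print(run(SUBQUERY_QUERY, ROWS))
```

Here the lag() query pairs only message 2 with message 1, while the correlated subquery also pairs message 3 with message 1. Which of these counts as a "response" depends on how your bot treats repeated replies, so pick the one that matches that definition.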
Licensed under: CC-BY-SA with attribution