Pergunta

I have some data about when, how long, and what channel people are listening to the radio. I need to make a variable called sessions that groups all entries which occur while the radio is on. Because the data may contain some errors I would like to say that if less than five minutes passes from the end of one channel period to the next then it is still the same session. Hopefully a brief example will clarify.

  obs  Entry_date   Entry_time  duration(in secs) channel
   1    01/01/12      23:25:21    6000               2
   2    01/03/12      01:05:64     300               5
   3    01/05/12      12:12:35     456               5
   4    01/05/12      16:45:21     657               8

I want to create the variable sessions so that

obs  Entry_date   Entry_time  duration(in secs) channel   session
   1    01/01/12      23:25:21    6000               2    1
   2    01/03/12      01:05:64     300               5    1
   3    01/05/12      12:12:35     456               5    2
   4    01/05/12      16:45:21     657               8    3

for defining 1 session i need to use entry_time (and date if it goes from 11pm into the next morning) so that if entry_time+duration + (5minutes) < entry_time(next channel) then the session changes. This has been killing me and simple arrays wont do the trick, or my attempt using arrays has not worked. Thanks in advance

Foi útil?

Solução

Aside from the comments I made in the OP, here's how I would do it using a SAS data step. I've changed the date and time values for row 2 to what I suspect they should be (in order to get the same result as in the OP). This avoids having to perform a self join, which is likely to be performance intensive on a large dataset.
I've used the DIF and LAG functions, so care needs to be taken if you're adding in extra code (particularly IF statements).

data have;
input entry_date :mmddyy10. entry_time :time. duration channel;
format entry_date date9. entry_time time.;
datalines;
01/01/2012 23:25:21 6000 2
01/02/2012 01:05:54 300 5
01/05/2012 12:12:35 456 5
01/05/2012 16:45:21 657 8
;
run;

data want;
set have;
by entry_date entry_time; /* put in to check data is sorted correctly */
retain session 1; /* initialise session with value 1 */
session+(dif(dhms(entry_date,0,0,entry_time))-lag(duration)>300); /* increment session by 1 if time difference > 5 minutes */
run;

Outras dicas

hopefully I got your requirements right! Since you need to base result on adjoining rows, there is a need to join a table to itself. The Session #s are not consecutive, but you should get the point.

 create table #temp
 (obs int not null,
entry_date datetime not null,
duration int not null,
channel int not null)


--obs  Entry_date   Entry_time  duration(in secs) channel
insert #temp
select   1, '01/01/12 23:25:21', 6000, 2
 union all select 2, '01/03/12 01:05:54', 300, 5
 union all select 3, '01/05/12 12:12:35', 456, 5
 union all select 4, '01/05/12 16:45:21', 657, 8

select a.obs,
       a.entry_date,
       a.duration,
endSession = dateadd(mi,5,dateadd(mi,a.duration,a.entry_date)),
a.channel,
b.entry_date,
minOverlapping = datediff(mi,b.entry_date,
                          dateadd(mi,5,dateadd(mi,a.duration,a.entry_date))),
anotherSession = case 
          when dateadd(mi,5,dateadd(mi,a.duration,a.entry_date))<b.entry_date
    then b.obs
    else a.obs end
from #temp a
  left join #temp b on a.obs = b.obs - 1

hope this helps a bit

Licenciado em: CC-BY-SA com atribuição
Não afiliado a StackOverflow
scroll top