Question

I have some data about when, for how long, and on which channel people listen to the radio. I need to create a variable called sessions that groups all entries that occur while the radio is on. Because the data may contain some errors, I would like to treat entries as part of the same session if less than five minutes passes between the end of one channel period and the start of the next. Hopefully a brief example will clarify.

  obs  Entry_date   Entry_time  duration(in secs) channel
   1    01/01/12      23:25:21    6000               2
   2    01/03/12      01:05:64     300               5
   3    01/05/12      12:12:35     456               5
   4    01/05/12      16:45:21     657               8

I want to create the variable sessions so that

  obs  Entry_date   Entry_time  duration(in secs) channel  session
   1    01/01/12      23:25:21    6000               2        1
   2    01/03/12      01:05:64     300               5        1
   3    01/05/12      12:12:35     456               5        2
   4    01/05/12      16:45:21     657               8        3

To define a session I need to use entry_time (and the date, if listening runs from 11pm into the next morning), so that if entry_time + duration + 5 minutes < entry_time of the next channel, then the session changes. This has been killing me, and simple arrays won't do the trick, or at least my attempt using arrays has not worked. Thanks in advance.


Solution

Aside from the comments I made in the OP, here's how I would do it using a SAS data step. I've changed the date and time values for row 2 to what I suspect they should be (in order to get the same result as in the OP). This avoids having to perform a self join, which is likely to be performance intensive on a large dataset.
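I've used the DIF and LAG functions, so care needs to be taken if you're adding extra code (particularly IF statements): both functions keep a queue of values that is updated only when the function is actually called, so calling them conditionally returns values from the last call rather than from the previous row.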

data have;
input entry_date :mmddyy10. entry_time :time. duration channel;
format entry_date date9. entry_time time.;
datalines;
01/01/2012 23:25:21 6000 2
01/02/2012 01:05:54 300 5
01/05/2012 12:12:35 456 5
01/05/2012 16:45:21 657 8
;
run;

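data want;
  set have;
  by entry_date entry_time; /* errors if the data is not sorted by date and time */
  retain session 1; /* initialise session with value 1 */
  /* gap = this start - previous start - previous duration, in seconds;
     increment session by 1 if the gap exceeds 5 minutes (300 seconds).
     On the first row DIF and LAG return missing, so the comparison is false. */
  session+(dif(dhms(entry_date,0,0,entry_time))-lag(duration)>300);
run;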
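
For readers who don't use SAS, the same lag-and-compare logic can be sketched in Python. This is only an illustration of the rule, not a translation of the data step: the names `rows`, `GAP` and `assign_sessions` are made up for the example, and row 2 uses the corrected date and time from the datalines above.

```python
from datetime import datetime, timedelta

GAP = timedelta(minutes=5)  # tolerance from the question

# (entry datetime, duration in seconds, channel); row 2 uses the corrected values
rows = [
    (datetime(2012, 1, 1, 23, 25, 21), 6000, 2),
    (datetime(2012, 1, 2, 1, 5, 54), 300, 5),
    (datetime(2012, 1, 5, 12, 12, 35), 456, 5),
    (datetime(2012, 1, 5, 16, 45, 21), 657, 8),
]

def assign_sessions(rows, gap=GAP):
    """Return one session number per row; rows must be sorted by entry time."""
    sessions = []
    session = 1
    prev_end = None  # plays the role of LAG: nothing to compare on the first row
    for start, duration, _channel in rows:
        # a new session starts when the gap since the previous end exceeds the tolerance
        if prev_end is not None and start - prev_end > gap:
            session += 1
        sessions.append(session)
        prev_end = start + timedelta(seconds=duration)
    return sessions

print(assign_sessions(rows))  # prints [1, 1, 2, 3]
```

Combining date and time into one datetime handles the 11pm-into-next-morning case without special treatment, which is what `dhms` does in the data step.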

Other suggestions

Hopefully I got your requirements right! Since the result for each row depends on the adjoining row, the table needs to be joined to itself. The session numbers are not consecutive, but you should get the point.

 create table #temp
 (obs int not null,
entry_date datetime not null,
duration int not null,
channel int not null)


--obs  Entry_date   Entry_time  duration(in secs) channel
insert #temp
select   1, '01/01/12 23:25:21', 6000, 2
 union all select 2, '01/03/12 01:05:54', 300, 5
 union all select 3, '01/05/12 12:12:35', 456, 5
 union all select 4, '01/05/12 16:45:21', 657, 8

select a.obs,
       a.entry_date,
       a.duration,
       -- duration is in seconds, so add it with ss; then add the 5-minute tolerance
       endSession = dateadd(mi, 5, dateadd(ss, a.duration, a.entry_date)),
       a.channel,
       b.entry_date,
       minOverlapping = datediff(mi, b.entry_date,
                                 dateadd(mi, 5, dateadd(ss, a.duration, a.entry_date))),
       anotherSession = case
           when dateadd(mi, 5, dateadd(ss, a.duration, a.entry_date)) < b.entry_date
           then b.obs
           else a.obs end
from #temp a
  left join #temp b on a.obs = b.obs - 1

Hope this helps a bit.
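
If consecutive session numbers are needed, one option is to turn the per-row comparison into a 0/1 "session break" flag and take a running total of the flags. The Python sketch below illustrates that idea on the same values as the #temp table. It assumes the dates are month/day/year and that duration is in seconds, as the question states; `temp`, `GAP` and `consecutive_sessions` are hypothetical names used only here.

```python
from datetime import datetime, timedelta
from itertools import accumulate

GAP = timedelta(minutes=5)  # tolerance from the question

# (obs, entry datetime, duration in seconds), same values as the #temp table
temp = [
    (1, datetime(2012, 1, 1, 23, 25, 21), 6000),
    (2, datetime(2012, 1, 3, 1, 5, 54), 300),
    (3, datetime(2012, 1, 5, 12, 12, 35), 456),
    (4, datetime(2012, 1, 5, 16, 45, 21), 657),
]

def consecutive_sessions(table, gap=GAP):
    """Pair each row with its successor, flag session breaks, then number them."""
    # breaks[i] is 1 when row i+1 starts after row i's end plus the tolerance
    breaks = [
        int(a_start + timedelta(seconds=a_dur) + gap < b_start)
        for (_, a_start, a_dur), (_, b_start, _) in zip(table, table[1:])
    ]
    # running total of the flags, starting from session 1 for the first row
    numbers = accumulate(breaks, initial=1)
    return {obs: n for (obs, _, _), n in zip(table, numbers)}

print(consecutive_sessions(temp))
```

Pairing each row with the next one mirrors the self join on `a.obs = b.obs - 1`; the running total is what makes the numbering consecutive. Note that `accumulate(..., initial=...)` needs Python 3.8 or later.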

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow