Reshape from long to wide data format and matching start to end date pairs in R

StackOverflow https://stackoverflow.com/questions/22973953

  •  30-06-2023

Question

I have a dataframe of student enrollment records (transactions) that are in long format.

Sample:

ID   Date     Type
123  2/1/14   Entry
123  2/5/14   Exit
123  3/1/14   Entry
123  4/4/14   Exit
234  3/2/14   Entry
234  3/20/14  Exit
234  4/3/14   Entry

And I need to convert to wide format by matching pairs of entry and exit records.

Sample:

ID   Entry.Date   Exit.Date
123  2/1/14       2/5/14
123  3/1/14       4/4/14
234  3/2/14       3/20/14
234  4/3/14

There's nothing inherent in the data that I can use to key together the starting record with the ending record. It's simply ordered by student and then date. Some records are open ended (no matching exit record).

I'm looking at some of the conversion functions such as reshape but don't know if/how I can use those to convert to wide format and limit it to the date range pair. Would you recommend one of those or should I pursue something less elegant? Thanks!


Solution

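Here's one way using data.table. The idea is to group by ID and Type and add a column that numbers the rows within each group, so that the n-th Entry and the n-th Exit for a student share the same index. This assumes the data is sorted by ID and then date, and that each Entry sits right next to its matching Exit, with any open-ended Entry (no Exit) coming last for that student, as in your sample.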

library(data.table) ## >= 1.9.0
setDT(dat)          ## dat is your data; setDT() converts it to a data.table by reference

dat[, ID2 := seq_len(.N), by=list(ID, Type)]
# dat 
#     ID    Date  Type ID2
# 1: 123  2/1/14 Entry   1
# 2: 123  2/5/14  Exit   1
# 3: 123  3/1/14 Entry   2
# 4: 123  4/4/14  Exit   2
# 5: 234  3/2/14 Entry   1
# 6: 234 3/20/14  Exit   1
# 7: 234  4/3/14 Entry   2
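
If an Exit can be missing in the middle of a student's history (an Entry followed directly by another Entry), counting within ID and Type would pair the wrong rows. A minimal sketch of one possible alternative is to start a new pair at every Entry using a cumulative count. The data below (`dat2`) and the column names `by_type` and `spell` are made up for illustration and are not part of the original answer.

```r
library(data.table)

# Hypothetical data: the first Entry for ID 123 has no matching Exit.
dat2 <- data.table(
  ID   = c(123, 123, 123, 234, 234),
  Date = c("2/1/14", "3/1/14", "4/4/14", "3/2/14", "3/20/14"),
  Type = c("Entry", "Entry", "Exit", "Entry", "Exit")
)

# Counting within (ID, Type) gives the only Exit index 1,
# which pairs it with the first Entry instead of the second.
dat2[, by_type := seq_len(.N), by = list(ID, Type)]

# Counting Entry rows cumulatively within ID starts a new spell at each Entry,
# so an Exit attaches to the most recent Entry that precedes it.
dat2[, spell := cumsum(Type == "Entry"), by = ID]

print(dat2)
```

This still relies on the rows being sorted by ID and date. An Exit that appears before any Entry would get spell 0, and two consecutive Exits would land in the same spell, so those cases would need separate handling.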

Now cast it to wide format using dcast. reshape2 provides dcast as well, but data.table now has its own, faster implementation, so I'll use that here. (In later data.table releases you can simply call dcast() on a data.table.)

dcast.data.table(dat, ID + ID2 ~ Type, value.var="Date")
#     ID ID2  Entry    Exit
# 1: 123   1 2/1/14  2/5/14
# 2: 123   2 3/1/14  4/4/14
# 3: 234   1 3/2/14 3/20/14
# 4: 234   2 4/3/14      NA
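
Since the question mentions reshape, here is a rough base R equivalent of the same two steps, as a sketch rather than a definitive implementation. It rebuilds the sample data from the question, uses `ave()` to create the pair index, and then calls `reshape()`. The renaming and the date format string `"%m/%d/%y"` are assumptions based on the sample rows.

```r
# Sample data from the question, as a plain data.frame.
dat <- data.frame(
  ID   = c(123, 123, 123, 123, 234, 234, 234),
  Date = c("2/1/14", "2/5/14", "3/1/14", "4/4/14", "3/2/14", "3/20/14", "4/3/14"),
  Type = c("Entry", "Exit", "Entry", "Exit", "Entry", "Exit", "Entry"),
  stringsAsFactors = FALSE
)

# Pair index: the n-th Entry and the n-th Exit within each ID get the same number.
dat$ID2 <- ave(seq_along(dat$ID), dat$ID, dat$Type, FUN = seq_along)

# One row per (ID, ID2); Type values become the Date.Entry / Date.Exit columns.
wide <- reshape(dat, idvar = c("ID", "ID2"), timevar = "Type",
                v.names = "Date", direction = "wide")

# Rename to the column names asked for in the question.
names(wide)[names(wide) == "Date.Entry"] <- "Entry.Date"
names(wide)[names(wide) == "Date.Exit"]  <- "Exit.Date"

# Convert the text dates to Date class (assumes month/day/two-digit year).
wide$Entry.Date <- as.Date(wide$Entry.Date, format = "%m/%d/%y")
wide$Exit.Date  <- as.Date(wide$Exit.Date,  format = "%m/%d/%y")

print(wide)
```

The open-ended record for ID 234 ends up with `NA` in `Exit.Date`, matching the dcast result above. Converting to Date class is optional, but it makes later comparisons and sorting behave correctly.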

HTH

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow