Question

I have a table with the following key columns: customer ID | order ID | product ID | Quantity | Amount | Order Date.

All of this data is in LONG format, so a single customer ID can have multiple line items.

I can get the first and last order dates using R DateDiff, but converting the file to WIDE format with plyr leaves me with the same problem: multiple orders per customer, just with fewer rows and more columns.

Is there an R function that extends R DateDiff to work out the time interval between purchases by customer ID? That is, the time between orders 1 and 2, orders 2 and 3, and so on, assuming these orders exist.

CID  Order.Date  Order.DateMY  Order.No_  Amount  Quantity  Category.Name  Locality
1    26/02/13    Feb-13        zzzzz              1         r              MOSMAN
1    26/05/13    May-13        qqqqq              1         x              CHULLORA
1    28/05/13    May-13        wwwww              1         r              MOSMAN
1    28/05/13    May-13        wwwww              1         x              MOSMAN
2    19/08/13    Aug-13        wwwwww             1         o              OAKLEIGH SOUTH
3    3/01/13     Jan-13        wwwwww             1         x              CURRENCY CREEK
4    28/08/13    Aug-13        eeeeeee            1         t              BRISBANE
4    10/09/13    Sep-13        rrrrrrrrr          1         y              BRISBANE
4    25/09/13    Sep-13        tttttttt           2         e              BRISBANE

Solution 2

Split the data frame and find the intervals for each Customer ID.

# Toy data: one row per order.
df <- data.frame(customerID = as.factor(c(rep("A", 3), rep("B", 4))),
                 OrderDate = as.Date(c("2013-07-01", "2013-07-02", "2013-07-03",
                                       "2013-06-01", "2013-06-02", "2013-06-03",
                                       "2013-07-01")))

# Split by customer, then take the differences between consecutive order dates.
dfs <- split(df, df$customerID)
lapply(dfs, function(x) diff(x$OrderDate))

Or use plyr:

library(plyr)
# dlply() splits df by customerID and returns a list with one element per customer.
dfs <- dlply(df, .(customerID), function(x) diff(x$OrderDate))
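
The same split-and-diff idea can be applied to data shaped like the question's table. The following is only a sketch in base R: the data frame name `orders` and the helper `order_days` are made up for illustration, and it assumes that rows sharing a customer and a date belong to the same order, so they are collapsed before the gaps are computed.

```r
# Hypothetical sample mirroring the question's CID and Order.Date columns.
orders <- data.frame(
  CID = c(1, 1, 1, 1, 2, 3, 4, 4, 4),
  Order.Date = c("26/02/13", "26/05/13", "28/05/13", "28/05/13",
                 "19/08/13", "3/01/13", "28/08/13", "10/09/13", "25/09/13"),
  stringsAsFactors = FALSE
)

# Parse day/month/two-digit-year strings into Date objects.
orders$Order.Date <- as.Date(orders$Order.Date, format = "%d/%m/%y")

# Keep one row per customer and order date, so several line items of the
# same order do not show up as zero-day gaps (assumption stated above).
order_days <- unique(orders[, c("CID", "Order.Date")])

# Days between consecutive orders, as one numeric vector per customer.
gaps <- lapply(split(order_days$Order.Date, order_days$CID),
               function(d) as.numeric(diff(sort(d)), units = "days"))
```

Customers with a single order get an empty vector, since there is no earlier order to compare against.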

Other suggestions

It is not clear what you want to do, since you don't give the expected result. But I guess you want the intervals between two consecutive orders.

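library(data.table)
# DF is the data frame from the question; rows are assumed to be already
# sorted by date within each CID, so the sorted diffs line up with Order.Date.
DT <- as.data.table(DF)
DT[, list(Order.Date,
          diff = c(0, diff(sort(as.Date(Order.Date, '%d/%m/%y'))))), CID]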

   CID Order.Date diff
1:   1   26/02/13    0
2:   1   26/05/13   89
3:   1   28/05/13    2
4:   1   28/05/13    0
5:   2   19/08/13    0
6:   3    3/01/13    0
7:   4   28/08/13    0
8:   4   10/09/13   13
9:   4   25/09/13   15

I know this question is very old, but I just figured out another way to do it and wanted to record it:

> library(dplyr)
> library(lubridate)
> df %>% group_by(customerID) %>% 
    mutate(SinceLast=(interval(ymd(lag(OrderDate)),ymd(OrderDate)))/86400)

# A tibble: 7 x 3
# Groups:   customerID [2]
  customerID OrderDate  SinceLast
  <fct>      <date>         <dbl>
1 A          2013-07-01       NA 
2 A          2013-07-02        1.
3 A          2013-07-03        1.
4 B          2013-06-01       NA 
5 B          2013-06-02        1.
6 B          2013-06-03        1.
7 B          2013-07-01       28.
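
For anyone who prefers to avoid extra packages, a per-row "days since last order" column can also be sketched in base R with `ave()`. This is an illustrative alternative rather than a replacement for the answers above. It rebuilds the toy data frame so it runs on its own, and reuses the column name `SinceLast` only for comparison.

```r
# Toy data: one row per order (same values as the example data frame).
df <- data.frame(
  customerID = c("A", "A", "A", "B", "B", "B", "B"),
  OrderDate = as.Date(c("2013-07-01", "2013-07-02", "2013-07-03",
                        "2013-06-01", "2013-06-02", "2013-06-03",
                        "2013-07-01"))
)

# Sort by customer and date so consecutive rows are consecutive orders.
df <- df[order(df$customerID, df$OrderDate), ]

# Within each customer, the first order gets NA and every later order
# gets the number of days since the previous one.
df$SinceLast <- ave(as.numeric(df$OrderDate), df$customerID,
                    FUN = function(x) c(NA, diff(x)))
```

Converting the dates with `as.numeric()` first means the result is a plain numeric column of day counts, which is usually easier to summarise than a `difftime` column.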
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow