Pregunta

I have read many of the threads and do not think my question has been asked before. I have a data.frame in R related to advertisements shown to customers as such:.. I have many customers, 8 different products.. so this is just a sample

mydf <- data.frame(Cust = c(1, 1), age = c(24, 24), 
    state = c("NJ", "NJ"), Product = c(1, 1), cost = c(400, 410), 
    Time = c(35, 25), Purchased = c("N", "Y"))
mydf
#   Cust age state Product cost Time Purchased
# 1    1  24    NJ       1  400   35         N
# 2    1  24    NJ       1  410   23         Y

And I want to transform it to look as such ...

Cust | age | state | Product | cost.1 | time.1 | purch.1 | cost.2 | time.2 | purch.2
   1 |  24 |    NJ |       1 |    400 |     35 |       N |    410 |     23 |       Y

How can I do this? There are a few static variables for each customer such as age, state and a few others... and then there are the details associated with each offer that was presented to a given customer, the product # in the offer, the cost, the time, and if they purchased it... I want to get all of this onto 1 line for each customer to perform analysis.

It is worth noting that the number of products maxes out at 7, but for some customers it ranges from 1 to 7.

I have no sample code to really show. I have tried using the aggregate function, but I do not want to aggregate, or do any SUMs. I just want to do some joins. Research suggests the cbind, and tapply functions may be useful.

Thank you for your help. I am very new to R.

¿Fue útil?

Solución

You are essentially asking to do a "long" to "wide" reshape of your data.

It looks to me like you're using "Cust", "age", "state", and "Product" as your ID variables. You don't have a an actual "time" variable though ("time" as in the sequential count of records by the IDs mentioned above). However, such a variable is easy to create:

mydf$timevar <- with(mydf, 
                     ave(rep(1, nrow(mydf)), 
                         Cust, age, state, Product, FUN = seq_along))
mydf
#   Cust age state Product cost Time Purchased timevar
# 1    1  24    NJ       1  400   35         N       1
# 2    1  24    NJ       1  410   23         Y       2

From there, this is pretty straightforward with the reshape function in base R.

reshape(mydf, direction = "wide", 
        idvar=c("Cust", "age", "state", "Product"),
        timevar = "timevar")
#   Cust age state Product cost.1 Time.1 Purchased.1 cost.2 Time.2 Purchased.2
# 1    1  24    NJ       1    400     35           N    410     23           Y
Licenciado bajo: CC-BY-SA con atribución
No afiliado a StackOverflow
scroll top