Question

I would like to write a function that converts a data frame into a matrix. The data frame is a list of events. Each row corresponds to a person visiting or buying the product.

my.df <- data.frame(person = c('A', 'A', 'B', 'B', 'B', 'C'),
                    week = c(1, 2, 1, 3, 3, 2),
                    event = c('visit', 'buy', 'visit', 'visit', 'buy', 'visit'))
> my.df
  person week event
1      A    1 visit
2      A    2   buy
3      B    1 visit
4      B    3 visit
5      B    3   buy
6      C    2 visit

In the desired output matrix, the rows are the people and the columns are the weeks. The (person, week) entry should be "buy" if the person bought that week, "visit" if the person visited but did not buy, and "none" otherwise. Concretely, the desired output is the following matrix:

> my.mat
  1       2       3     
A "visit" "buy"   "none"
B "visit" "none"  "buy" 
C "none"  "visit" "none"

I have the idea that I should convert the events into numbers, do a cast with a max, and then convert the numbers back into the events, but I am not totally sure how to put this all together.


Solution 3

Building on the answers of @eddi and @Rodrigo, I arrived at the following code. It is a little verbose, but it works, and it also works if I want a more complicated ordering of events.

require(reshape2) # For acast(...)

# Input data frame
my.df <- data.frame(person = c('A', 'A', 'B', 'B', 'B', 'C'),
                    week = c(1, 2, 1, 3, 3, 2),
                    event = c('visit', 'buy', 'visit', 'visit', 'buy', 'visit'))

# Convert event into numbers, with buy > visit
the.levels <- c('visit', 'buy')
my.df$event <- as.numeric(factor(my.df$event, levels = the.levels))

# Build matrix: keep the highest-ranked event per (person, week); empty cells get 0
temp <- acast(my.df, person ~ week, fun.aggregate = max,
              value.var = 'event', fill = 0)

# Convert event numbers back into text (0 = 'none', then the.levels in order)
number.to.event <- setNames(c('none', the.levels),
                            as.character(seq(0, length(the.levels))))
# Rebuild the matrix, keeping row names and column names
out <- matrix(unname(number.to.event[as.character(temp)]), nrow = nrow(temp),
              dimnames = dimnames(temp))

> out
  1       2       3     
A "visit" "buy"   "none"
B "visit" "none"  "buy" 
C "none"  "visit" "none"
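
Since the question asks for a function, the same encode, aggregate, decode steps could be wrapped in one. The following is only a sketch in base R (tapply instead of acast). The name events.to.matrix and its arguments ranking and none are made up for illustration, and it assumes that every event appears in ranking.

```r
# Hypothetical helper: the name and arguments are illustrative only.
# `ranking` lists the events from lowest to highest precedence.
events.to.matrix <- function(df, ranking = c('visit', 'buy'), none = 'none') {
  # Assumption: every event is listed in `ranking`
  stopifnot(all(df$event %in% ranking))
  # Encode each event as its position in `ranking`
  code <- match(df$event, ranking)
  # Highest code per (person, week); combinations with no rows come back as NA
  top <- tapply(code, list(df$person, df$week), max)
  # Start from a matrix full of `none`, then fill in the observed cells
  out <- array(none, dim = dim(top), dimnames = dimnames(top))
  seen <- !is.na(top)
  out[seen] <- ranking[top[seen]]
  out
}

my.df <- data.frame(person = c('A', 'A', 'B', 'B', 'B', 'C'),
                    week = c(1, 2, 1, 3, 3, 2),
                    event = c('visit', 'buy', 'visit', 'visit', 'buy', 'visit'))
out <- events.to.matrix(my.df)
out['B', '3']  # "buy", because B both visited and bought in week 3
```

A longer ordering only needs a longer ranking vector; the rest of the function stays the same.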

OTHER TIPS

As Arun points out, use the reshape2 package:

library(reshape2)

# there are various ways to get the precedence you like;
# I chose to just sort the strings ('buy' sorts before 'visit' alphabetically)
acast(my.df, person ~ week, function(x) {sort(as.character(x))[1]},
      value.var = 'event', fill = 'none')
#  1       2       3     
#A "visit" "buy"   "none"
#B "visit" "none"  "buy" 
#C "none"  "visit" "none"
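
Sorting gives the right precedence here only because 'buy' happens to come before 'visit' alphabetically. If the events did not sort that way, an explicit precedence vector might be safer. The sketch below is an assumption-laden illustration: pick.event, precedence, and the extra 'add_to_cart' event are all made up and do not appear in the question.

```r
# Hypothetical precedence, highest priority first ('add_to_cart' is invented)
precedence <- c('buy', 'add_to_cart', 'visit')

# Return the highest-priority event among those in x
pick.event <- function(x) {
  precedence[min(match(as.character(x), precedence))]
}

week.events <- c('visit', 'add_to_cart', 'buy')
by.sort <- sort(week.events)[1]       # alphabetical: picks 'add_to_cart'
by.rank <- pick.event(week.events)    # explicit precedence: picks 'buy'
```

A function like pick.event could then be passed to acast in place of the sort-based one.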

Just a piece of code:

unique(my.df$event)
as.numeric(factor(my.df$event))
levels(factor(my.df$event))[1]

The first line lists the distinct events you have. The second transforms your events into numbers. The third gives the text for a given number (1 here). Note that the numbers index levels(), which is sorted alphabetically, and not unique(), which keeps the order of first appearance.
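
The round trip from text to numbers and back can be checked in a few lines. This is a small sketch using the event column from the question; the variable names f, codes and back are only illustrative.

```r
event <- c('visit', 'buy', 'visit', 'visit', 'buy', 'visit')

f <- factor(event)          # levels are sorted alphabetically: 'buy', 'visit'
codes <- as.integer(f)      # each event as the position of its level
back <- levels(f)[codes]    # map the numbers back to text

identical(back, event)      # TRUE
```

To control which event gets which number, pass levels = explicitly to factor(), as Solution 3 does.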

Licensed under: CC-BY-SA with attribution