Question

I am trying to convert the following format:

mydata <- data.frame(movie = c("Titanic", "Departed"), 
                     actor1 = c("Leo", "Jack"), 
                     actor2 = c("Kate", "Leo"))

     movie actor1 actor2
1  Titanic    Leo   Kate
2 Departed   Jack    Leo

to binary response variables:

     movie Leo Kate Jack
1  Titanic   1    1    0
2 Departed   1    0    1

I tried the solution described in "Convert row data to binary columns", but I could only get it to work for two variables, not three.

I would really appreciate a clean way to do this.


Solution 4

An updated tidyr-based option is to convert to long shape, use complete to fill in the missing combinations of movies and actors, convert a logical is.na test to a numeric value, and then reshape back to wide.

library(tidyr)

mydata %>%
  pivot_longer(starts_with("actor"), names_to = "acted") %>%
  complete(movie, value) %>%
  dplyr::mutate(acted = as.numeric(!is.na(acted))) %>%
  pivot_wider(names_from = value, values_from = acted)
#> # A tibble: 2 x 4
#>   movie     Jack   Leo  Kate
#>   <fct>    <dbl> <dbl> <dbl>
#> 1 Departed     1     1     0
#> 2 Titanic      0     1     1

OTHER TIPS

How much spice is too much? Here is a solution via tidyr:

library(dplyr)
library(tidyr)

mydata %>%
  gather(actor, name, starts_with("actor")) %>%
  mutate(present = 1) %>%
  select(-actor) %>%
  spread(name, present, fill = 0)

     movie Jack Kate Leo
1 Departed    1    0   1
2  Titanic    0    1   1
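In tidyr 1.0.0 and later, gather and spread are superseded by pivot_longer and pivot_wider. The same idea can be sketched with the newer functions as below. This assumes tidyr 1.1.0 or later (so that values_fill accepts a plain 0), and the column names role, name and present are just illustrative choices.

```r
library(dplyr)
library(tidyr)

mydata <- data.frame(movie  = c("Titanic", "Departed"),
                     actor1 = c("Leo", "Jack"),
                     actor2 = c("Kate", "Leo"),
                     stringsAsFactors = FALSE)

wide <- mydata %>%
  # one row per movie/actor pair; "role" holds the old column name
  pivot_longer(starts_with("actor"), names_to = "role", values_to = "name") %>%
  # flag every pair that actually occurs
  mutate(present = 1) %>%
  # the old column name is no longer needed
  select(-role) %>%
  # one column per actor; pairs that never occur are filled with 0
  pivot_wider(names_from = name, values_from = present, values_fill = 0)

print(wide)
```

If an actor could be listed twice for the same movie, the long data would need a distinct() call before pivot_wider so that each movie/actor pair appears only once.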

One way to reshape your data.frame is with the reshape2 package, using melt and dcast. For example:

library(reshape2)
long.mydata <- melt(mydata, id.vars = "movie")
wide.mydata <- dcast(long.mydata, movie ~ value, fun.aggregate = function(x) 1, fill = 0)

Pay attention to the fun.aggregate and fill parameters in dcast: fun.aggregate computes the value for each movie/actor cell that has data, and fill supplies the value for cells that have none.
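To make the role of the two parameters concrete, here is a small sketch that casts the same long data in two ways. The object names counts and flags are only illustrative, and value.var is set explicitly to avoid dcast having to guess the value column.

```r
library(reshape2)

mydata <- data.frame(movie  = c("Titanic", "Departed"),
                     actor1 = c("Leo", "Jack"),
                     actor2 = c("Kate", "Leo"),
                     stringsAsFactors = FALSE)

long.mydata <- melt(mydata, id.vars = "movie")

# length counts how often each actor is listed for a movie,
# so an actor listed twice would show up as 2
counts <- dcast(long.mydata, movie ~ value,
                fun.aggregate = length, value.var = "value")

# a constant function only marks presence; fill provides the value
# for movie/actor pairs that never occur
flags <- dcast(long.mydata, movie ~ value,
               fun.aggregate = function(x) 1, fill = 0,
               value.var = "value")
```

For this data set both calls give the same 0/1 pattern, because no actor appears twice in the same movie. They would differ only for repeated listings.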

Since they say variety is the spice of life, here's an approach in base R using table:

table(cbind(mydata[1], 
            actor = unlist(mydata[-1], use.names=FALSE)))
#           actor
# movie      Jack Leo Kate
#   Departed    1   1    0
#   Titanic     0   1    1

The above output is a matrix of class table. To get a data.frame, use as.data.frame.matrix.

as.data.frame.matrix(table(
  cbind(mydata[1], actor = unlist(mydata[-1], use.names=FALSE))))
#          Jack Leo Kate
# Departed    1   1    0
# Titanic     0   1    1
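The result above keeps the movies as row names and would show counts above 1 if an actor were listed twice for one movie. A minimal base R sketch that caps the values at 1 and restores movie as a regular column, as in the requested output, might look like this. The names long, counts, binary and wide are illustrative, and the movie column is repeated explicitly rather than relying on recycling inside cbind.

```r
mydata <- data.frame(movie  = c("Titanic", "Departed"),
                     actor1 = c("Leo", "Jack"),
                     actor2 = c("Kate", "Leo"),
                     stringsAsFactors = FALSE)

# unlist stacks the actor columns one after another, so the movie
# column is repeated once per actor column to line up with it
long <- data.frame(movie = rep(mydata$movie, times = ncol(mydata) - 1),
                   actor = unlist(mydata[-1], use.names = FALSE),
                   stringsAsFactors = FALSE)

counts <- table(long)

# turn counts into 0/1 flags, in case an actor is listed more than once
binary <- (unclass(counts) > 0) * 1L

# move the row names into a proper movie column
wide <- as.data.frame.matrix(binary)
wide <- data.frame(movie = rownames(wide), wide, row.names = NULL,
                   stringsAsFactors = FALSE)
print(wide)
#      movie Jack Kate Leo
# 1 Departed    1    0   1
# 2  Titanic    0    1   1
```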

The reshape2 package also has the recast function, which melts and casts in a single step.

The code:

library(reshape2)
recast(mydata, id.var = 'movie', movie ~ value, fun.aggregate = length)

The result:

     movie Jack Kate Leo
1 Departed    1    0   1
2  Titanic    0    1   1
Licensed under: CC-BY-SA with attribution