Question

I have a fairly large data frame structured like this:

id    x1    x2    x3    y1    y2    y3    z1    z2    z3     v 
 1     2     4     5    10    20    15   200   150   170   2.5
 2     3     7     6    25    35    40   300   350   400   4.2

I need to create a dataframe like this:

id   xsource   xvalue   yvalue   zvalue       v 
 1        x1        2       10      200     2.5
 1        x2        4       20      150     2.5
 1        x3        5       15      170     2.5
 2        x1        3       25      300     4.2
 2        x2        7       35      350     4.2
 2        x3        6       40      400     4.2

I'm quite sure I have to do it with the reshape package, but I'm not able to get what I want.

Could you help me?

Thanks


Solution

Here's a solution using base R's reshape() function.

The key bit is that the varying= argument can take a list of vectors of column names in the wide format that correspond to single variables in the long format. In this case, columns "x1", "x2", "x3" in the original data frame are sent to one column in the long data frame, columns "y1", "y2", "y3" go into a second column, and so on.

# Read in the original data, x, from Andrie's answer

res <- reshape(x, direction = "long", idvar = "id",
               varying = list(c("x1","x2", "x3"), 
                              c("y1", "y2", "y3"), 
                              c("z1", "z2", "z3")),
               v.names = c("xvalue", "yvalue", "zvalue"), 
               timevar = "xsource", times = c("x1", "x2", "x3"))
#      id   v xsource xvalue yvalue zvalue
# 1.x1  1 2.5      x1      2     10    200
# 2.x1  2 4.2      x1      3     25    300
# 1.x2  1 2.5      x2      4     20    150
# 2.x2  2 4.2      x2      7     35    350
# 1.x3  1 2.5      x3      5     15    170
# 2.x3  2 4.2      x3      6     40    400

Finally, a couple of purely cosmetic steps are needed to get the results looking exactly as shown in your question:

res <- res[order(res$id, res$xsource), c(1,3,4,5,6,2)]
row.names(res) <- NULL
res
#   id xsource xvalue yvalue zvalue   v
# 1  1      x1      2     10    200 2.5
# 2  1      x2      4     20    150 2.5
# 3  1      x3      5     15    170 2.5
# 4  2      x1      3     25    300 4.2
# 5  2      x2      7     35    350 4.2
# 6  2      x3      6     40    400 4.2
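When the wide column names follow a stem-plus-digit pattern like x1, y2, z3, base reshape() can also work out the groupings by itself: passing the time-varying columns as a plain vector together with sep = "" makes it split each name at the first letter-digit boundary ("x1" becomes stem "x", time 1). This is a sketch that assumes every measured column follows that pattern. Note that the guessed times are 1, 2, 3 rather than "x1", "x2", "x3", and the value columns are named x, y, z, so the time variable is called source here.

```r
# Example data from the question
x <- read.table(text = "
id x1 x2 x3 y1 y2 y3  z1  z2  z3   v
 1  2  4  5 10 20 15 200 150 170 2.5
 2  3  7  6 25 35 40 300 350 400 4.2
", header = TRUE)

# Every column except id and v is time-varying; with sep = "" reshape()
# guesses the stems (x, y, z) and the times (1, 2, 3) from the names.
res2 <- reshape(x, direction = "long", idvar = "id",
                varying = setdiff(names(x), c("id", "v")),
                sep = "", timevar = "source")

# Same cosmetic tidy-up as above: sort by id, then drop the "1.1"-style row names
res2 <- res2[order(res2$id, res2$source), ]
row.names(res2) <- NULL
res2
```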

OTHER TIPS

Here's one approach that uses reshape2 and is described in depth in my paper on tidy data.

Step 1: identify the variables that are already in columns. In this case, those are id and v. These are the variables we melt by.

library(reshape2)
xm <- melt(x, id.vars = c("id", "v"))

Step 2: split up variables that are currently combined in one column. In this case, that's source (the character part) and rep (the integer part).

There are lots of ways to do this; I'm going to use string extraction with the stringr package:

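library(stringr)
# melt() returns 'variable' as a factor, so convert it to character first.
# "x1" -> source "x" (1st character), rep "1" (2nd character);
# this assumes one-letter stems and single-digit suffixes.
xm$source <- str_sub(as.character(xm$variable), 1, 1)
xm$rep <- str_sub(as.character(xm$variable), 2, 2)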
xm$variable <- NULL
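The fixed positions in str_sub() only work because every name here is one letter followed by one digit. If that might not hold, a base R alternative is to split on a regular expression with sub(). The helper split_name() below is hypothetical (it is not part of reshape2 or stringr) and assumes each name is letters followed by digits, such as "x1" or "z12".

```r
# Hypothetical helper: split names like "x1" or "z12" into their
# letter stem (source) and numeric suffix (rep), both as character.
split_name <- function(nm) {
  nm <- as.character(nm)                # melt() gives 'variable' as a factor
  pattern <- "^([A-Za-z]+)([0-9]+)$"
  data.frame(
    source = sub(pattern, "\\1", nm),   # first capture group: letters
    rep    = sub(pattern, "\\2", nm),   # second capture group: digits
    stringsAsFactors = FALSE
  )
}

parts <- split_name(c("x1", "y2", "z3", "x10"))
parts
```

With the melted data this would be used as `xm <- cbind(xm, split_name(xm$variable))` in place of the two str_sub() lines, before dropping xm$variable.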

Step 3: rearrange the variables that are currently in rows but that we want in columns:

dcast(xm, ... ~ source)

#   id   v rep x  y   z
# 1  1 2.5   1 2 10 200
# 2  1 2.5   2 4 20 150
# 3  1 2.5   3 5 15 170
# 4  2 4.2   1 3 25 300
# 5  2 4.2   2 7 35 350
# 6  2 4.2   3 6 40 400

Somebody please prove me wrong, but I don't think it's easy to solve this problem using either the reshape package or the base reshape function.

However, it's easy enough using lapply and do.call:

Replicate the data:

x <- read.table(text="
id    x1    x2    x3    y1    y2    y3    z1    z2    z3     v 
1     2     4     5    10    20    15   200   150   170   2.5
2     3     7     6    25    35    40   300   350   400   4.2
", header=TRUE)

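Do the analysis:

# For each row: id, the source index 1:3, the nine x/y/z values folded into
# a 3-column matrix (one column per stem), and v. unlist() turns the one-row
# data frame into a plain vector, so the result is a numeric matrix rather
# than a list matrix.
chunks <- lapply(seq_len(nrow(x)),
    function(i) cbind(x[i, 1], 1:3, matrix(unlist(x[i, 2:10]), ncol = 3), x[i, 11]))
res <- do.call(rbind, chunks)
colnames(res) <- c("id", "source", "x", "y", "z", "v")
res

     id source x  y   z   v
[1,]  1      1 2 10 200 2.5
[2,]  1      2 4 20 150 2.5
[3,]  1      3 5 15 170 2.5
[4,]  2      1 3 25 300 4.2
[5,]  2      2 7 35 350 4.2
[6,]  2      3 6 40 400 4.2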
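The same result can be built without looping over rows. This is a sketch in base R that assumes every stem has the same number of repeated columns (here n_rep = 3) named stem1, stem2, ...; n_rep, stems and stack_stem() are illustrative names, not part of any package. Transposing each stem's block of columns before flattening puts each id's three values next to each other, which matches the row order of the lapply() loop.

```r
# Rebuild the example data (same values as in the question)
x <- data.frame(id = 1:2,
                x1 = c(2, 3),     x2 = c(4, 7),     x3 = c(5, 6),
                y1 = c(10, 25),   y2 = c(20, 35),   y3 = c(15, 40),
                z1 = c(200, 300), z2 = c(150, 350), z3 = c(170, 400),
                v  = c(2.5, 4.2))

n_rep <- 3                   # assumed: repeated columns per stem
stems <- c("x", "y", "z")    # assumed: column-name prefixes to stack

# For one stem, take its columns as a matrix and read it row by row
stack_stem <- function(stem) {
  cols <- paste0(stem, seq_len(n_rep))
  as.vector(t(as.matrix(x[, cols])))
}

res <- data.frame(
  id     = rep(x$id, each = n_rep),
  source = rep(seq_len(n_rep), times = nrow(x)),
  sapply(stems, stack_stem),  # one column per stem, named x, y, z
  v      = rep(x$v, each = n_rep)
)
res
```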

Try using the reshapeGUI package. It uses the plyr and reshape2 packages and provides an easy-to-use interface that lets you preview your reshape before you execute it. It also gives you the code for the reshape you're doing, so you can paste it into your script for reproducibility and learn to use the melt and cast commands in reshape2. It's a nice crutch for complex data manipulations like this one for those who aren't reshape ninjas.

Here are two more recent approaches that might be of interest to someone reading this question:

Option 1: The tidyverse

library(tidyverse)
x %>% 
  gather(var, val, -id, -v) %>% 
  extract(var, into = c("header", "source"), regex = "([a-z])([0-9])") %>% 
  spread(header, val)
#   id   v source x  y   z
# 1  1 2.5      1 2 10 200
# 2  1 2.5      2 4 20 150
# 3  1 2.5      3 5 15 170
# 4  2 4.2      1 3 25 300
# 5  2 4.2      2 7 35 350
# 6  2 4.2      3 6 40 400
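In tidyr 1.0.0 and later, gather() and spread() are superseded by pivot_longer(), which can do this reshape in a single call. The ".value" entry in names_to says that the first regex group (x, y or z) names the output value columns, while the second group becomes source. This is a sketch assuming tidyr >= 1.0.0 is installed; source comes out as a character column, as in Option 1.

```r
library(tidyr)

x <- read.table(text = "
id x1 x2 x3 y1 y2 y3  z1  z2  z3   v
 1  2  4  5 10 20 15 200 150 170 2.5
 2  3  7  6 25 35 40 300 350 400 4.2
", header = TRUE)

# Pivot every column except id and v; split each name into
# (".value" = stem, source = digit) using the regex capture groups.
res <- pivot_longer(x, cols = -c(id, v),
                    names_to = c(".value", "source"),
                    names_pattern = "([a-z])([0-9])")
res
```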

Option 2: data.table

library(data.table)
setDT(x)
melt(x, measure.vars = patterns("^x", "^y", "^z"), 
     value.name = c("x", "y", "z"), 
     variable.name = "source")
#    id   v source x  y   z
# 1:  1 2.5      1 2 10 200
# 2:  2 4.2      1 3 25 300
# 3:  1 2.5      2 4 20 150
# 4:  2 4.2      2 7 35 350
# 5:  1 2.5      3 5 15 170
# 6:  2 4.2      3 6 40 400

Licensed under: CC-BY-SA with attribution