Question

I would like to reshape my data based on the unique strings in the "Bulls" column (the data frame is named all):

EBV       Bulls
0.13    NE001362
0.17    NE001361
0.05    NE001378
-0.12   NE001359
-0.14   NE001379
0.13    NE001380
-0.46   NE001379
-0.46   NE001359
-0.68   NE001394
0.28    NE001391
0.84    NE001394
-0.43   NE001393
-0.18   NE001707

My expected output:

NE001362    NE001361    NE001378    NE001359    NE001379    NE001380    NE001394    NE001391    NE001393    NE001707
  0.13        0.17        0.05       -0.12       -0.14        0.13       -0.68        0.28       -0.43       -0.18
                                     -0.46       -0.46                    0.84          

I tried dat2 <- dcast(all, EBV~variable, value.var = "Bulls") but it does not work.


Solution

You have two options: index the multiple occurrences of each level of Bulls, or use a list to hold the EBV values that belong to each level.

Option 1: Indexing multiple occurrences

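You can use data.table to generate an index that numbers the repeated occurrences of each bull, then cast to wide format with one column per bull:

library(data.table)
setDT(all)                               ## convert to data.table by reference
all[, index := seq_len(.N), by = Bulls]  ## number the records within each bull
dcast(all, index ~ Bulls, value.var = "EBV")  ## one row per index, one column per bull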
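
For reference, here is a self-contained sketch of this option. It assumes the data.table package is installed and rebuilds the question's data by hand. The name `wide` is only illustrative.

```r
library(data.table)

# Rebuild the question's data; the name `all` follows the question.
all <- data.table(
  EBV   = c(0.13, 0.17, 0.05, -0.12, -0.14, 0.13, -0.46, -0.46,
            -0.68, 0.28, 0.84, -0.43, -0.18),
  Bulls = c("NE001362", "NE001361", "NE001378", "NE001359", "NE001379",
            "NE001380", "NE001379", "NE001359", "NE001394", "NE001391",
            "NE001394", "NE001393", "NE001707")
)

# Number the records within each bull: 1 for the first, 2 for the second, ...
all[, index := seq_len(.N), by = Bulls]

# Cast to wide: one row per index, one column per bull, NA where no record exists
wide <- dcast(all, index ~ Bulls, value.var = "EBV")

dim(wide)                    # 2 rows, 11 columns (index plus ten bulls)
wide[index == 2, NE001394]   # 0.84
```

Bulls with a single record get NA in the second row, which matches the blank cells in the expected output.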

Option 2: Using a list to store multiple values

You could store the values of each bull in a list column with data.table (plain data.frames support list columns too, but they are more awkward to create).

require(data.table)
setDT(all)                       ## convert to data.table
all[, list(list(EBV)), by=Bulls] ## multiple values stored as list
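
A short sketch of how the list column might be used afterwards, on a five-row subset of the question's data. It assumes data.table is installed. The names `nested` and `n`, and naming the list column `EBV`, are illustrative choices, not part of the original answer.

```r
library(data.table)

# Subset of the question's data, enough to show bulls with one and two records
all <- data.table(
  EBV   = c(0.13, -0.12, -0.46, -0.68, 0.84),
  Bulls = c("NE001362", "NE001359", "NE001359", "NE001394", "NE001394")
)

# One row per bull; each cell of EBV holds that bull's values as a vector
nested <- all[, .(EBV = list(EBV)), by = Bulls]

# Pull out the values for one bull
nested[Bulls == "NE001394", EBV[[1]]]   # -0.68 and 0.84

# Count the records per bull
nested[, n := lengths(EBV)]
```

This keeps one row per bull instead of one column per bull, so it suits further processing better than display.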

OTHER TIPS

Just to make sure that base R gets some acknowledgement:

## Add an ID, like ilir did, but with base R functions (mydf is the question's data frame)
mydf$ID <- with(mydf, ave(rep(1, nrow(mydf)), Bulls, FUN = seq_along))

Here's reshape:

reshape(mydf, direction = "wide", idvar="ID", timevar="Bulls")
#   ID EBV.NE001362 EBV.NE001361 EBV.NE001378 EBV.NE001359 EBV.NE001379
# 1  1         0.13         0.17         0.05        -0.12        -0.14
# 7  2           NA           NA           NA        -0.46        -0.46
#   EBV.NE001380 EBV.NE001394 EBV.NE001391 EBV.NE001393 EBV.NE001707
# 1         0.13        -0.68         0.28        -0.43        -0.18
# 7           NA         0.84           NA           NA           NA
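
If the "EBV." prefix in the column names is unwanted, it can be stripped afterwards. A minimal base R sketch on a five-row subset of the question's data; the name `wide` is illustrative:

```r
# Subset of the question's data
mydf <- data.frame(
  EBV   = c(0.13, -0.12, -0.46, -0.68, 0.84),
  Bulls = c("NE001362", "NE001359", "NE001359", "NE001394", "NE001394")
)
mydf$ID <- with(mydf, ave(rep(1, nrow(mydf)), Bulls, FUN = seq_along))

wide <- reshape(mydf, direction = "wide", idvar = "ID", timevar = "Bulls")

# Drop the "EBV." prefix that reshape() adds to each value column
names(wide) <- sub("^EBV\\.", "", names(wide))
names(wide)   # "ID" "NE001362" "NE001359" "NE001394"
```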

And xtabs. Note: This is a table-like matrix, so if you want a data.frame, you'll have to use as.data.frame.matrix on the output. Also note that xtabs fills missing combinations with 0 rather than NA.

xtabs(EBV ~ ID + Bulls, mydf)
#    Bulls
# ID  NE001359 NE001361 NE001362 NE001378 NE001379 NE001380 NE001391
#   1    -0.12     0.17     0.13     0.05    -0.14     0.13     0.28
#   2    -0.46     0.00     0.00     0.00    -0.46     0.00     0.00
#    Bulls
# ID  NE001393 NE001394 NE001707
#   1    -0.43    -0.68    -0.18
#   2     0.00     0.84     0.00
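
A sketch of the conversion mentioned above, on a five-row subset of the question's data. Because xtabs sums within each cell, empty cells come out as 0. The tapply variant shown last is one possible base R alternative (an addition here, not part of the original answer) that leaves empty cells as NA. The names `tab`, `wide` and `wide_na` are illustrative.

```r
# Subset of the question's data
mydf <- data.frame(
  EBV   = c(0.13, -0.12, -0.46, -0.68, 0.84),
  Bulls = c("NE001362", "NE001359", "NE001359", "NE001394", "NE001394")
)
mydf$ID <- with(mydf, ave(rep(1, nrow(mydf)), Bulls, FUN = seq_along))

tab  <- xtabs(EBV ~ ID + Bulls, mydf)
wide <- as.data.frame.matrix(tab)   # plain data.frame, row names "1" and "2"
wide["2", "NE001362"]               # 0, although this bull has no second record

# tapply() returns a matrix in which empty cells stay NA
wide_na <- with(mydf, tapply(EBV, list(ID, Bulls), sum))
wide_na["2", "NE001362"]            # NA
```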
Licensed under: CC-BY-SA with attribution