Reformat data in R

https://stackoverflow.com/questions/11988746

26-06-2021

Question

I have the following type of data, although the number of levels of set and Indvidual is much higher in the real dataset:

set <- c(rep(1,6), rep(2,6))
Indvidual <- c(rep (c("IndvA", "IndvA", "IndvB", "IndvB", "IndvC", "IndvC"), 2))
leftposition <- c(10,  10,0 ,0,  0,   0,  40,     40,    30,  30,  20,  20 )
rightposition <- c(20,  20,20,20,  30,  30, 50,     50,    40,  40,  60,  60 )
leftmark <- c(     1, 3, 5,    7,    9,   11,  13,       15,     17,   19,   21 , 23 )
rightmark <- c( 2,    4,     6,    8,    10,   12,14,      16,  18,   20,   22,  24 )

myd <- data.frame (set, Indvidual,leftposition,rightposition, leftmark, rightmark)
myd

      set Indvidual leftposition rightposition leftmark rightmark
1    1     IndvA           10            20        1         2
2    1     IndvA           10            20        3         4
3    1     IndvB            0            20        5         6
4    1     IndvB            0            20        7         8
5    1     IndvC            0            30        9        10
6    1     IndvC            0            30       11        12
7    2     IndvA           40            50       13        14
8    2     IndvA           40            50       15        16
9    2     IndvB           30            40       17        18
10   2     IndvB           30            40       19        20
11   2     IndvC           20            60       21        22
12   2     IndvC           20            60       23        24

In the new dataset, the first column is Indvidual and the remaining columns are the sorted unique values of leftposition and rightposition:

sort (unique (c(leftposition, rightposition)))
[1]  0 10 20 30 40 50 60

Now, for set = 1, I want to fill in the values for each individual (note that each individual appears twice, which is expected). Each row has two values: one goes in the column given by leftposition, the other in the column given by rightposition. The values to place there are in leftmark and rightmark, respectively. Thus, for the first set, the organized data would look like the following:

[Image: desired table for set 1]

Then set 2 (or any number of further sets) is added to the same table. Any cell left blank at the end is filled with NA or another specified value (such as "-").

[Image: desired table with set 2 added]

Your help is appreciated.

Solution

library(reshape2)
library(plyr)
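# Make individuals unique within each (set, Indvidual) group by appending
# a running index: IndvA becomes IndvA_1, IndvA_2
myd <- ddply(myd, .(set, Indvidual), transform,
             Indvidual = paste(Indvidual, seq_along(Indvidual), sep = "_"))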

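# Melt to long format, then pair each position with its mark

myd_molten <- melt(myd, id.vars = c("set", "Indvidual"))
# Row indices of the leftmark/rightmark entries
marks <- grep("mark", myd_molten$variable)
# Collapse the four variable names into two levels, "left" and "right"
levels(myd_molten$variable) <- rep(c("left", "right"), 2)
myd_pos <- myd_molten[-marks, ]
names(myd_pos)[4] <- "position"
myd_mark <- myd_molten[marks, ]
# Position rows and mark rows are in the same order, so cbind lines them up
myd_binded <- cbind(myd_pos, mark = myd_mark$value)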

# Cast into the desired wide form, then strip the "_1"/"_2" suffix from the names.
# I could have done the names with gsub, but I didn't want to mess with regular expressions.

ans <- dcast(myd_binded, Indvidual ~ position, value.var = "mark")
# as.character() guards against Indvidual being a factor, which strsplit() rejects
ans$Indvidual <- do.call(rbind, strsplit(as.character(ans$Indvidual), "_"))[, 1]
ans

  Indvidual  0 10 20 30 40 50 60
1     IndvA NA  1  2 NA 13 14 NA
2     IndvA NA  3  4 NA 15 16 NA
3     IndvB  5 NA  6 17 18 NA NA
4     IndvB  7 NA  8 19 20 NA NA
5     IndvC  9 NA 21 10 NA NA 22
6     IndvC 11 NA 23 12 NA NA 24
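
For comparison, here is a minimal base-R sketch of the same reshaping that avoids reshape2 and plyr. It is not the answer's method: it swaps in matrix indexing by (row name, column name) pairs. The names rep, row_key, wide, result and filled are made up for illustration. The last step shows one possible way to replace NA with "-", as the question asks.

```r
# Same example data as in the question.
myd <- data.frame(
  set           = rep(1:2, each = 6),
  Indvidual     = rep(rep(c("IndvA", "IndvB", "IndvC"), each = 2), times = 2),
  leftposition  = c(10, 10, 0, 0, 0, 0, 40, 40, 30, 30, 20, 20),
  rightposition = c(20, 20, 20, 20, 30, 30, 50, 50, 40, 40, 60, 60),
  leftmark      = seq(1, 23, by = 2),
  rightmark     = seq(2, 24, by = 2),
  stringsAsFactors = FALSE
)

# Running index of each row within its (set, Indvidual) group: 1, 2, 1, 2, ...
# (hypothetical helper column, not part of the original data)
myd$rep <- ave(seq_len(nrow(myd)), myd$set, myd$Indvidual, FUN = seq_along)

# One output row per (Indvidual, rep); one output column per position.
row_key   <- paste(myd$Indvidual, myd$rep, sep = "_")
row_names <- sort(unique(row_key))
positions <- as.character(sort(unique(c(myd$leftposition, myd$rightposition))))

wide <- matrix(NA_real_, nrow = length(row_names), ncol = length(positions),
               dimnames = list(row_names, positions))

# Index the matrix with a two-column character matrix of
# (row name, column name) pairs and drop each mark into its cell.
wide[cbind(row_key, as.character(myd$leftposition))]  <- myd$leftmark
wide[cbind(row_key, as.character(myd$rightposition))] <- myd$rightmark

# Back to a data frame, removing the "_1"/"_2" suffix from the names.
result <- data.frame(Indvidual = sub("_[0-9]+$", "", row_names), wide,
                     check.names = FALSE, stringsAsFactors = FALSE)
rownames(result) <- NULL

# Optional: replace NA with "-"; this turns the mark columns into character.
filled <- result
filled[is.na(filled)] <- "-"
```

Printing result should reproduce the table above. This sketch assumes that no two sets put a mark in the same position for the same individual and repeat index; if they did, the later assignment would silently overwrite the earlier one.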

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow