Question

I have a data frame, dfrm, with over 100 columns and 150 rows. I need to merge the contents of every 4 columns into 1 (preferably separated by a "/", although that is not essential). For a single group this is simple enough: apply(dfrm[ ,1:4], 1, paste, collapse="/"). I'm having difficulty scaling that solution to the whole data frame. In other words:

How can I go from this:

        loc1   loc1.1 loc1.2 loc1.3 loc2  loc2.1 loc2.2  loc2.3
ind.1    257    262    228    266    204    245    282    132
ind.2    244    115    240    187    196    133    189    251
ind.3    298    139    216    225    219    276    192    254
ind.4    129    176    180    182    215    250    227    186
ind.5    238    217    284    240    131    184    247    168

To something like this:

                 loc1            loc2
ind.1 257/262/228/266 204/245/282/132
ind.2 244/115/240/187 196/133/189/251
ind.3 298/139/216/225 219/276/192/254
ind.4 129/176/180/182 215/250/227/186
ind.5 238/217/284/240 131/184/247/168

My real data frame has over 100 rows and columns. I've tried indexing the data frame as presented in the solution of this question, but after creating that index of every 4 columns I find myself lost when trying to run do.call over my data frame. I'm sure there must be an easy solution for this, but please keep in mind that I'm far from proficient in R.

Also, the column names are not a real problem if the body is in shape, since I can extract every fourth name with loc <- colnames(dfrm) and loc <- loc[c(T, F, F, F)], and then assign them to the merged data frame with colnames(dfrm) <- loc. It would still be nice if that step were incorporated.
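A reproducible version of the example may help. The sketch below rebuilds the five-row input as dfrm (values copied from the table above), runs the single-group apply() call from the question, and shows why the column-name trick works: a logical index shorter than the vector is recycled, so c(TRUE, FALSE, FALSE, FALSE) keeps names 1, 5, 9, and so on. It assumes every group has exactly four columns.

```r
# Rebuild the five-row example from the table above (values copied from it)
dfrm <- data.frame(
  loc1   = c(257, 244, 298, 129, 238),
  loc1.1 = c(262, 115, 139, 176, 217),
  loc1.2 = c(228, 240, 216, 180, 284),
  loc1.3 = c(266, 187, 225, 182, 240),
  loc2   = c(204, 196, 219, 215, 131),
  loc2.1 = c(245, 133, 276, 250, 184),
  loc2.2 = c(282, 189, 192, 227, 247),
  loc2.3 = c(132, 251, 254, 186, 168),
  row.names = paste0("ind.", 1:5)
)

# One group of four columns, as in the question: paste row by row
first_group <- apply(dfrm[, 1:4], 1, paste, collapse = "/")
first_group[["ind.1"]]
# [1] "257/262/228/266"

# The logical index is recycled along the names, keeping names 1, 5, 9, ...
loc <- colnames(dfrm)
loc <- loc[c(TRUE, FALSE, FALSE, FALSE)]
loc
# [1] "loc1" "loc2"
```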


Solution

This is certainly not pretty, but it works:

do.call(cbind, lapply(1:ceiling(ncol(df)/4), function(i)
                      apply(df[, seq(4*(i-1)+1, min(4*i, ncol(df))), drop = FALSE],
                            1, paste, collapse = "/")))
#      [,1]              [,2]             
#ind.1 "257/262/228/266" "204/245/282/132"
#ind.2 "244/115/240/187" "196/133/189/251"
#ind.3 "298/139/216/225" "219/276/192/254"
#ind.4 "129/176/180/182" "215/250/227/186"
#ind.5 "238/217/284/240" "131/184/247/168"

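The ceiling and drop = FALSE are there to handle the edge case where the number of columns is not divisible by 4: ceiling adds a final, shorter group, and drop = FALSE keeps a one-column group as a data frame so that apply still works on it. Also note that the result here is a matrix (because of apply); you can convert it back to a data.frame if you like and assign whatever column names you want.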
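Converting that matrix back to a data frame and naming each block after its first column might look like the sketch below. It rebuilds the question's example as df, assumes blocks of four, and picks the names with seq(1, n, by = 4), which also works when the last block is shorter.

```r
# Rebuild the question's example
df <- data.frame(
  loc1   = c(257, 244, 298, 129, 238),
  loc1.1 = c(262, 115, 139, 176, 217),
  loc1.2 = c(228, 240, 216, 180, 284),
  loc1.3 = c(266, 187, 225, 182, 240),
  loc2   = c(204, 196, 219, 215, 131),
  loc2.1 = c(245, 133, 276, 250, 184),
  loc2.2 = c(282, 189, 192, 227, 247),
  loc2.3 = c(132, 251, 254, 186, 168),
  row.names = paste0("ind.", 1:5)
)

n <- ncol(df)
# The answer's code: one apply() per block of up to four columns
res <- do.call(cbind, lapply(1:ceiling(n / 4), function(i)
  apply(df[, seq(4 * (i - 1) + 1, min(4 * i, n)), drop = FALSE],
        1, paste, collapse = "/")))

# Back to a data frame; keep the pasted strings as character, not factor
out <- as.data.frame(res, stringsAsFactors = FALSE)
# Name each block after its first column (columns 1, 5, 9, ...)
colnames(out) <- colnames(df)[seq(1, n, by = 4)]
out
```

The row names (ind.1 to ind.5) carry over from the matrix, because apply keeps the data frame's row names on each pasted vector.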

Other answers

Way late to the party, but I think this is a little cleaner (and robust to column counts that are not a multiple of 4):

as.data.frame(
  lapply(
    split.default(df, (1:ncol(df) - 1) %/% 4),
    function(x) do.call(paste, c(x, list(sep = "/")))
  )
)

Splitting the data frame by columns using (1:ncol(df) - 1) %/% 4 creates groups of four columns (or fewer in the last group if the column count is not a multiple of four), which then makes it trivial to pass each group on to paste. Note that we have to use split.default, because split.data.frame would split by row instead of by column. This produces:
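To see what the grouping vector does, the sketch below applies it to a hypothetical 10-column data frame, df10 (made up for illustration). Integer division by 4 labels columns 1 to 4 as group 0, columns 5 to 8 as group 1, and the two leftover columns as group 2; split.default() then returns a list of three data frames, the last one with only two columns.

```r
# Hypothetical 10-column, 2-row data frame, made up for illustration
df10 <- as.data.frame(matrix(1:20, nrow = 2))

# Integer division maps columns 1-4 to 0, 5-8 to 1, 9-10 to 2
groups <- (1:ncol(df10) - 1) %/% 4
groups
# [1] 0 0 0 0 1 1 1 1 2 2

# split.default() treats the data frame as a list of columns
parts <- split.default(df10, groups)
sapply(parts, ncol)
# 0 1 2
# 4 4 2

# Paste each group's columns element-wise; as.data.frame() renames them X0, X1, X2
res10 <- as.data.frame(
  lapply(parts, function(x) do.call(paste, c(x, list(sep = "/")))),
  stringsAsFactors = FALSE
)
res10$X2
# [1] "17/19" "18/20"
```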

               X0              X1
1 257/262/228/266 204/245/282/132
2 244/115/240/187 196/133/189/251
3 298/139/216/225 219/276/192/254
4 129/176/180/182 215/250/227/186
5 238/217/284/240 131/184/247/168

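This may be faster, since do.call(paste, ...) pastes whole columns in one vectorised call instead of calling paste once per row: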

df = data.frame(c1 = letters, c2 = LETTERS, c3 = letters, c4 = LETTERS)
do.call('paste', c(df[, 1:2], list(sep = '/')))
#  [1] "a/A" "b/B" "c/C" "d/D" "e/E" "f/F" "g/G" "h/H" "i/I" "j/J" "k/K" "l/L"
# [13] "m/M" "n/N" "o/O" "p/P" "q/Q" "r/R" "s/S" "t/T" "u/U" "v/V" "w/W" "x/X"
# [25] "y/Y" "z/Z"
do.call('paste', c(df[, 3:4], list(sep = '/')))
#  [1] "a/A" "b/B" "c/C" "d/D" "e/E" "f/F" "g/G" "h/H" "i/I" "j/J" "k/K" "l/L"
# [13] "m/M" "n/N" "o/O" "p/P" "q/Q" "r/R" "s/S" "t/T" "u/U" "v/V" "w/W" "x/X"
# [25] "y/Y" "z/Z"
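The answer above only pastes pairs of columns. A minimal sketch of extending the same do.call(paste, ...) idea to every block of four columns follows; it assumes blocks start at columns 1, 5, 9, and so on, lets the last block be shorter, and rebuilds the question's example as df. vapply() is used so that the result is guaranteed to be a character matrix with one row per input row.

```r
# Rebuild the question's example
df <- data.frame(
  loc1   = c(257, 244, 298, 129, 238),
  loc1.1 = c(262, 115, 139, 176, 217),
  loc1.2 = c(228, 240, 216, 180, 284),
  loc1.3 = c(266, 187, 225, 182, 240),
  loc2   = c(204, 196, 219, 215, 131),
  loc2.1 = c(245, 133, 276, 250, 184),
  loc2.2 = c(282, 189, 192, 227, 247),
  loc2.3 = c(132, 251, 254, 186, 168),
  row.names = paste0("ind.", 1:5)
)

# First column of each block of four: 1, 5, 9, ...
starts <- seq(1, ncol(df), by = 4)

# One vectorised paste() call per block (not per row); the last block may be shorter
res <- vapply(starts, function(s) {
  cols <- s:min(s + 3, ncol(df))
  do.call(paste, c(df[cols], list(sep = "/")))
}, FUN.VALUE = character(nrow(df)))

# vapply() returns a matrix: one row per input row, one column per block
colnames(res) <- names(df)[starts]
rownames(res) <- rownames(df)
res
```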

This is (hopefully) a more generalisable solution that doesn't rely on column positions; it groups columns by their name prefix instead:

newnames <- gsub("\\.\\d+","",names(df))
#[1] "loc1" "loc1" "loc1" "loc1" "loc2" "loc2" "loc2" "loc2"
do.call(cbind,
        lapply(unique(newnames), function(x) 
          do.call(paste,c(df[newnames %in% x],sep="/") )
        )
)

#     [,1]              [,2]             
#[1,] "257/262/228/266" "204/245/282/132"
#[2,] "244/115/240/187" "196/133/189/251"
#[3,] "298/139/216/225" "219/276/192/254"
#[4,] "129/176/180/182" "215/250/227/186"
#[5,] "238/217/284/240" "131/184/247/168"
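The name-based idea can also feed split.default() directly and keep the group names as column names. One caveat, worth checking with 100+ columns: split() turns a character vector into a factor with sorted levels, so loc10 would come before loc2. The sketch below therefore builds the factor with levels in first-appearance order. It assumes the suffixes are always a dot followed by digits, and it rebuilds the question's example as df.

```r
# Rebuild the question's example
df <- data.frame(
  loc1   = c(257, 244, 298, 129, 238),
  loc1.1 = c(262, 115, 139, 176, 217),
  loc1.2 = c(228, 240, 216, 180, 284),
  loc1.3 = c(266, 187, 225, 182, 240),
  loc2   = c(204, 196, 219, 215, 131),
  loc2.1 = c(245, 133, 276, 250, 184),
  loc2.2 = c(282, 189, 192, 227, 247),
  loc2.3 = c(132, 251, 254, 186, 168),
  row.names = paste0("ind.", 1:5)
)

# Strip a trailing ".1", ".2", ... to get each column's group name
newnames <- gsub("\\.[0-9]+$", "", names(df))

# Levels in first-appearance order keep the original column order;
# a plain character vector would be sorted (loc1, loc10, loc2, ...)
grp <- factor(newnames, levels = unique(newnames))

out <- as.data.frame(
  lapply(split.default(df, grp), function(x) do.call(paste, c(x, sep = "/"))),
  stringsAsFactors = FALSE
)
rownames(out) <- rownames(df)
out
```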
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow