Stacking data.frames in a list into a single data.frame, maintaining names(list) as an extra column

https://stackoverflow.com/questions/22407125

14-06-2023

Question

I have a list of data frames which I would like to combine into a single data.frame. This is what my list looks like:

my_list <- list(
    m=data.frame(a = letters[1:5], b = 1:5, c = rnorm(5)), 
    n=data.frame(a = letters[1:5], b = 6:10, c = rnorm(5)))

> my_list
$m
  a b          c 
1 a 1  0.1151720  
2 b 2 -0.3785748  
3 c 3 -0.1446305  
4 d 4 -0.4300272  
5 e 5  1.1982312  

$n
  a  b          c 
1 a  6  1.2079439 
2 b  7 -1.2414251 
3 c  8  0.4362390 
4 d  9 -0.5844525 
5 e 10  0.1420070

I'd like to stack these on top of each other without losing the name of the data frame each row came from ("m", "n"). Ideally, the name of the original data frame would be included as an extra column in the final data frame. One way would be to add the extra column before calling plyr's rbind.fill:

for (i in seq_along(my_list)) my_list[[i]][, 4] <- names(my_list)[i]
library(plyr)
rbind.fill(my_list)

   a  b          c V4
1  a  1  0.1151720  m
2  b  2 -0.3785748  m
3  c  3 -0.1446305  m
4  d  4 -0.4300272  m
5  e  5  1.1982312  m
6  a  6  1.2079439  n
7  b  7 -1.2414251  n
8  c  8  0.4362390  n
9  d  9 -0.5844525  n
10 e 10  0.1420070  n

What I don't like about this is that I have to keep track of the data frame's dimensions (the hard-coded column index 4) and of the name of the extra column (here the default V4).

Isn't there a function that does this in a more flexible and generic way?

Solution 2

You can solve both problems by using an alternative way of addressing a column:

for (i in seq_along(my_list)) my_list[[i]]$names <- names(my_list)[i]

Or, avoiding a loop (more idiomatic R, IMHO):

lapply(names(my_list), function (n) cbind(my_list[[n]], names = n))
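The lapply() call above only adds the column: its result is still an (unnamed) list of data frames. A minimal sketch of stacking that result with base R, assuming all data frames share the same columns; set.seed(1) is added only to make the example data reproducible:

```r
# Example data from the question (seed added only for reproducibility).
set.seed(1)
my_list <- list(
  m = data.frame(a = letters[1:5], b = 1:5, c = rnorm(5)),
  n = data.frame(a = letters[1:5], b = 6:10, c = rnorm(5))
)

# Add a "names" column holding each element's list name, then stack the
# augmented data frames; rbind() matches the columns by name.
stacked <- do.call(rbind, lapply(names(my_list), function(n) cbind(my_list[[n]], names = n)))
stacked
```

Because the list built by lapply() has no names, rbind() gives the result plain row names 1 to 10.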

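Incidentally, plyr isn't needed here: as long as all data frames have the same column names, the same result can be achieved with base R (unlike rbind.fill, rbind does not pad missing columns with NA):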

do.call(rbind, my_list)
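To address both complaints from the question (no hard-coded column index, free choice of the column name), the base-R approach can be wrapped in a small function. This is a sketch under stated assumptions: stack_named() and its id_col argument are hypothetical names, not part of any package; the input must be a named list whose data frames all have the same columns; and the id column is placed first, matching the layout of the plyr and qdap outputs in the other suggestions.

```r
# Hypothetical helper (not from any package): stack a named list of data
# frames and record each row's source element in a column named id_col.
# Assumes all data frames have the same column names.
stack_named <- function(dfs, id_col = "id") {
  # One id value per row: each list name repeated nrow() times.
  ids <- rep(names(dfs), vapply(dfs, nrow, integer(1)))
  # unname() stops rbind() from building row names such as "m.1", "m.2".
  stacked <- do.call(rbind, unname(dfs))
  # Add the id column (an existing column of that name would be overwritten).
  stacked[[id_col]] <- ids
  # Move the id column to the front.
  stacked[c(id_col, setdiff(names(stacked), id_col))]
}

# Usage with the data from the question (seed added for reproducibility).
set.seed(1)
my_list <- list(
  m = data.frame(a = letters[1:5], b = 1:5, c = rnorm(5)),
  n = data.frame(a = letters[1:5], b = 6:10, c = rnorm(5))
)
res <- stack_named(my_list, id_col = "V4")
res
```

Computing the ids from nrow() of each element means no column index or row count is ever written by hand, so the helper works for any number of list elements and any data frame width.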

Other suggestions

Another possibility:

library(plyr)
ldply(my_list)
#    .id a  b          c
# 1    m a  1 -0.1294107
# 2    m b  2  0.8867361
# 3    m c  3 -0.1513960
# 4    m d  4  0.3297912
# 5    m e  5 -3.2273228
# 6    n a  6 -0.7717918
# 7    n b  7  0.2865486
# 8    n c  8 -1.2205120
# 9    n d  9  0.4345504
# 10   n e 10  0.8001769

This annoyed me too: it's easy enough to do, but I wanted a handy way to do it. You can also accomplish this with the convenience wrapper qdap::list_df2df, whose second argument sets the name of the new column:

library(qdap)
list_df2df(my_list, "V4")

##    V4 a  b           c
## 1   m a  1 -0.37622031
## 2   m b  2  0.43700001
## 3   m c  3  0.65035652
## 4   m d  4 -0.09290962
## 5   m e  5  0.16675182
## 6   n a  6 -2.43296495
## 7   n b  7  1.91099315
## 8   n c  8  0.03232916
## 9   n d  9 -1.18901280
## 10  n e 10  0.42399969

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow