Function that takes a list of data frames and returns a data frame in R

Question 1

# data
l <- list(df1 = mtcars[1:5,1:5] , df2 = mtcars[1:5,1:5], df3 = mtcars[1:5,1:5])

# Note: data frames with the same dimensions can be added directly, e.g.
o1 <- (l[[1]] + l[[2]] + l[[3]])/3

# With many data frames in the list, get the average by summing them and dividing by the list length
f <- function(x) Reduce("+", x)
o2 <- f(l)/length(l)

all.equal(o1,o2)
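
If it helps, the same idea can be wrapped in a small function that checks its inputs first. This is only a sketch: mean_df is a made-up name, and the check assumes every data frame should have the same dimensions and column names as the first one.

```r
# Hypothetical helper (not from the original answer): average a list of
# equally-shaped data frames element-wise.
mean_df <- function(x) {
  stopifnot(is.list(x), length(x) > 0)
  first <- x[[1]]
  # Assumption: "consistent" means same dimensions and same column names
  same_shape <- vapply(x, function(d) {
    identical(dim(d), dim(first)) && identical(names(d), names(first))
  }, logical(1))
  if (!all(same_shape)) {
    stop("all data frames must share dimensions and column names")
  }
  Reduce(`+`, x) / length(x)
}

l <- list(df1 = mtcars[1:5, 1:5], df2 = mtcars[1:5, 1:5], df3 = mtcars[1:5, 1:5])
res <- mean_df(l)
all.equal(res, mtcars[1:5, 1:5])
```

The explicit check gives a clearer error than the one raised from inside Reduce when the shapes differ.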

Question 2

Yet another option is to convert the list l to an array a (using an approach suggested here) and apply mean over the first two dimensions. This assumes all data frames in l have the same structure. Here I again use @user20650's example list.

l <- list(df1=mtcars[1:5, 1:5], df2=mtcars[1:5, 1:5], df3=mtcars[1:5, 1:5])
a <- array(unlist(l), dim=c(nrow(l[[1]]), ncol(l[[1]]), length(l)), 
           dimnames=c(dimnames(l[[1]]), list(names(l))))
apply(a, 1:2, mean)

                   mpg cyl disp  hp drat
Mazda RX4         21.0   6  160 110 3.90
Mazda RX4 Wag     21.0   6  160 110 3.90
Datsun 710        22.8   4  108  93 3.85
Hornet 4 Drive    21.4   6  258 110 3.08
Hornet Sportabout 18.7   8  360 175 3.15
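
As a possible variation on the array approach, rowMeans accepts a dims argument, so the mean over the third dimension can be taken without apply. The sketch below rebuilds the same array a and compares the two results; it carries the same assumption that all data frames share one structure.

```r
l <- list(df1 = mtcars[1:5, 1:5], df2 = mtcars[1:5, 1:5], df3 = mtcars[1:5, 1:5])

# Stack the data frames into a rows x columns x list-length array
a <- array(unlist(l), dim = c(nrow(l[[1]]), ncol(l[[1]]), length(l)),
           dimnames = c(dimnames(l[[1]]), list(names(l))))

# dims = 2 treats the first two dimensions as "rows", so the mean is
# taken over the remaining (third) dimension
m1 <- rowMeans(a, dims = 2)
m2 <- apply(a, 1:2, mean)
all.equal(m1, m2)
```

Both calls return a matrix, so wrap the result in as.data.frame() if a data frame is needed.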

Question 3

Try stacking the data frames with a row id and then calculating the means per id:

df <- Reduce(rbind, lapply(mylist, function(df) {
  df$id <- seq_len(nrow(df))
  df
}))
df <- aggregate(. ~ id, df, mean)[, -1]

Example

mylist <- lapply(seq_len(3), function(x) iris[, 1:4] + runif(1, 0, 1))
sapply(seq_len(3), function(i) mylist[[i]][1,1])
# [1] 5.368424 6.097071 5.681132
# Apply above code
head(df)
#   Sepal.Length Sepal.Width Petal.Length Petal.Width
# 1     5.715542    4.115542     2.015542   0.8155424
# 2     5.515542    3.615542     2.015542   0.8155424
# 3     5.315542    3.815542     1.915542   0.8155424
# 4     5.215542    3.715542     2.115542   0.8155424
# 5     5.615542    4.215542     2.015542   0.8155424
# 6     6.015542    4.515542     2.315542   1.0155424

Note that mean(c(5.368424, 6.097071, 5.681132)) = 5.715542.
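
Since the example relies on runif, the numbers above will differ between runs. A reproducible version of the same check could look like the sketch below; the seed value and the names stacked, avg and first_cells are my own choices, and do.call(rbind, ...) is swapped in for Reduce(rbind, ...).

```r
set.seed(42)  # arbitrary seed, only for reproducibility
mylist <- lapply(seq_len(3), function(x) iris[, 1:4] + runif(1, 0, 1))

# Stack all data frames, tagging each row with its position
stacked <- do.call(rbind, lapply(mylist, function(d) {
  d$id <- seq_len(nrow(d))
  d
}))

# Average every column within each row position, then drop the id column
avg <- aggregate(. ~ id, stacked, mean)[, -1]

# The first cell should equal the mean of the three first cells
first_cells <- sapply(mylist, function(d) d[1, 1])
all.equal(avg[1, 1], mean(first_cells))
```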

Question 4

Here is an option with mapply:

as.data.frame(mapply(function(a, b) (a + b) / 2, df.lst[[1]], df.lst[[2]]))

This will work for any number of columns. mapply will cycle through each column from each data frame pairwise.

Here is the data we used:

df.lst <- replicate(2, data.frame(var1=runif(10), var2=sample(1:10)), simplify=FALSE)
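
The mapply call above is written for exactly two data frames. One way it might be extended to a list of any length is to pass the whole list through do.call, as sketched here; col_mean is a made-up helper name and the list of three data frames is just for illustration.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
df.lst <- replicate(3, data.frame(var1 = runif(10), var2 = sample(1:10)),
                    simplify = FALSE)

# Hypothetical helper: receives the same column from every data frame
# and averages them row by row
col_mean <- function(...) rowMeans(cbind(...))

# do.call spreads the list so that mapply sees each data frame as one argument
res <- as.data.frame(do.call(mapply, c(list(FUN = col_mean), df.lst)))
head(res)
```

As with the two-frame version, this assumes every data frame has the same columns in the same order.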

Question 5

I think the previous answers will fail (my previous one certainly does) if some of the variables differ between the data frames or if they are in a different order. The function below is rather horrible, but it seems to work.

l <- list(df1 = mtcars[1:5,1:5] , df2 = mtcars[1:5,1:5], df3 = mtcars[1:5,1:5])

# Allow for different variables
l2 <- list(df1 = mtcars[1:5,1:5] , df2 = mtcars[1:5,2:6], df3 = mtcars[1:5,4:7])

new.f <- function(lst) {
  l <- lst
  un.nm <- unique(unlist(lapply(l, names)))
  o <- lapply(un.nm, function(x) {
    lapply(l, function(z) {
      if (x %in% names(z)) z[x] else NA
    })
  })
  # combine for each variable
  l <- lapply(o, function(x) do.call(cbind, x))
  mn <- lapply(l, rowMeans, na.rm = TRUE)
  names(mn) <- lapply(l, function(i) unique(names(i)[names(i) %in% un.nm]))
  data.frame(do.call(cbind, mn))
}


all.equal(f(l)/length(l) , new.f(l))

f(l2) # fails
# Error in Ops.data.frame(init, x[[i]]) :
#   '+' only defined for equally-sized data frames

new.f(l2)
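
For the simpler case mentioned at the top, where the data frames hold the same variables but in a different order, it may be enough to reorder the columns before summing, because + on data frames works by position rather than by name. A minimal sketch, with made-up names df_a, df_b, lst and aligned:

```r
df_a <- mtcars[1:5, 1:5]
df_b <- mtcars[1:5, 5:1]  # same columns as df_a, in reverse order
lst <- list(a = df_a, b = df_b)

# Reorder every data frame to the column order of the first one
ref_cols <- names(lst[[1]])
aligned <- lapply(lst, function(d) d[ref_cols])

avg <- Reduce("+", aligned) / length(aligned)
all.equal(avg, df_a)
```

This only helps when every data frame contains all of the columns in ref_cols; with missing variables something like new.f is still needed.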

EDIT

The question "Join matrices by both colnames and rownames in R" offers a much more concise way to do this when the list elements have different columns.

l <- lapply(l2 , function(i) as.data.frame(as.table(as.matrix(i))))
tmp <- do.call(rbind , l)
tmp <- aggregate(Freq ~ Var1 + Var2, tmp, mean)
xtabs(Freq ~ Var1 + Var2, tmp)
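
The xtabs call returns a table rather than a data frame. If a data frame is wanted, as.data.frame.matrix can convert it, as in this sketch (long and wide are names I made up). One thing to keep in mind is that xtabs fills any missing row/variable combination with 0 rather than NA; that does not arise with l2, since every variable here has a value for every row.

```r
l2 <- list(df1 = mtcars[1:5, 1:5], df2 = mtcars[1:5, 2:6], df3 = mtcars[1:5, 4:7])

# Long format: one row per (row name, variable, value) triple
long <- do.call(rbind, lapply(l2, function(d) {
  as.data.frame(as.table(as.matrix(d)))
}))

# Mean per row name and variable, then back to wide form
long <- aggregate(Freq ~ Var1 + Var2, long, mean)
wide <- as.data.frame.matrix(xtabs(Freq ~ Var1 + Var2, long))
str(wide)
```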

Question 6

Tested with @user20650's example. The mean of two equal numbers should be the same number.

as.data.frame(setNames(
  lapply(names(mylist[[1]]), function(nm) {
    rowMeans(cbind(mylist[[1]][[nm]], mylist[[2]][[nm]]))
  }),
  names(mylist[[1]])
))
#--------------
   mpg cyl disp  hp drat
1 21.0   6  160 110 3.90
2 21.0   6  160 110 3.90
3 22.8   4  108  93 3.85
4 21.4   6  258 110 3.08
5 18.7   8  360 175 3.15

You read R code from the inside out: for each column name, numeric indices select the data frames and character indices select the columns, which are then bound together with cbind and passed to rowMeans. The resulting list of row means is then named with setNames and finally converted to a data frame.

Note that this does not use all of the data frames when the list has more than two elements; only the first two are considered.
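
To cover every element of the list, the two hard-coded indices could be replaced by an lapply over the list, roughly as follows. This is a sketch that assumes every data frame has the columns of the first one; res is a name I made up.

```r
mylist <- list(df1 = mtcars[1:5, 1:5], df2 = mtcars[1:5, 1:5], df3 = mtcars[1:5, 1:5])

res <- as.data.frame(setNames(
  lapply(names(mylist[[1]]), function(nm) {
    # Pull column nm out of every data frame and bind the copies side by side
    rowMeans(do.call(cbind, lapply(mylist, `[[`, nm)))
  }),
  names(mylist[[1]])
))
res
```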

> str(mylist)
List of 3
 $ df1:'data.frame':    5 obs. of  5 variables:
  ..$ mpg : num [1:5] 21 21 22.8 21.4 18.7
  ..$ cyl : num [1:5] 6 6 4 6 8
  ..$ disp: num [1:5] 160 160 108 258 360
  ..$ hp  : num [1:5] 110 110 93 110 175
  ..$ drat: num [1:5] 3.9 3.9 3.85 3.08 3.15
 $ df2:'data.frame':    5 obs. of  5 variables:
  ..$ mpg : num [1:5] 21 21 22.8 21.4 18.7
  ..$ cyl : num [1:5] 6 6 4 6 8
  ..$ disp: num [1:5] 160 160 108 258 360
  ..$ hp  : num [1:5] 110 110 93 110 175
  ..$ drat: num [1:5] 3.9 3.9 3.85 3.08 3.15
 $ df3:'data.frame':    5 obs. of  5 variables:
  ..$ mpg : num [1:5] 21 21 22.8 21.4 18.7
  ..$ cyl : num [1:5] 6 6 4 6 8
  ..$ disp: num [1:5] 160 160 108 258 360
  ..$ hp  : num [1:5] 110 110 93 110 175
  ..$ drat: num [1:5] 3.9 3.9 3.85 3.08 3.15