Apply over two data frames

https://stackoverflow.com/questions/3557652

r
apply

01-10-2019
|

Question

I'm using R, and I have two data.frames, A and B. They both have 6 rows, but A has 25000 columns (genes), and B has 30 columns. I'd like to apply a function with two arguments f(x,y) where x is every column of A and y is every column of B. So far it looks like this:

i = 1
for (x in A){
    j = 1
    for (y in B){
        out[i,j] <- f(x,y)
        j = j + 1
    }
    i = i + 1
}

I have two issues with this: from my Python programming I associate keeping track of counters like this as crufty, and from my R programming I am nervous of for loops. However, I can't quite see how to apply apply (or even if I should apply apply) to this problem and was hoping someone might enlighten me. I need to treat f() as atomic (it's actually cor.test()) for now.

Solution

Since you are using data frames, it might be faster to use lapply or sapply to do this (specially given the scope of your data frames). For example,

x <- data.frame(col1=c(1,2,3,4), col2=c(5,6,7,8), col3=c(9,10,11,12))
y <- data.frame(col1=c(1,2,3,4), col2=c(5,6,7,8))
bl <- lapply(x, function(u){
   lapply(y, function(v){
       f(u,v) # Function with column from x and column from y as inputs
   })
})
out = matrix(unlist(bl), ncol=ncol(y), byrow=T)

OTHER TIPS

Some data

nrows <- 6
A <- data.frame(a = runif(nrows), b = runif(nrows), c = runif(nrows))
B <- data.frame(z = rnorm(nrows), y = rnorm(nrows))

The trick: remember columns with expand.grid

counter <- expand.grid(seq_along(A), seq_along(B))
f <- function(x) 
{
  cor.test(A[, x["Var1"]], B[, x["Var2"]])$estimate
}

Now we only need 1 call to apply.

stats <- apply(counter, 1, f)
names(stats) <- paste(names(A)[counter$Var1], names(B)[counter$Var2], sep = ",")
stats

Nesting the applies works, not the easiest syntax, though.

x<-data.frame(col1=c(1,2,3,4), col2=c(5,6,7,8), col3=c(9,10,11,12))
y<-data.frame(col1=c(1,2,3,4), col2=c(5,6,7,8))

z<-apply(x,2,function(col,df2)
             {
               apply(df2,2,function(col2,col1)
                           {
                              col2+col1
                           },col)
             },y)

z
 col1 col2 col3
[1,]    2    6   10
[2,]    4    8   12
[3,]    6   10   14
[4,]    8   12   16
[5,]    6   10   14
[6,]    8   12   16
[7,]   10   14   18
[8,]   12   16   20

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow