Question
I'm using R, and I have two data.frames, A and B. They both have 6 rows, but A has 25000 columns (genes), and B has 30 columns. I'd like to apply a function with two arguments, f(x, y), where x is every column of A and y is every column of B. So far it looks like this:
out <- matrix(NA, nrow = ncol(A), ncol = ncol(B))  # preallocate the result
i <- 1
for (x in A) {
  j <- 1
  for (y in B) {
    out[i, j] <- f(x, y)
    j <- j + 1
  }
  i <- i + 1
}
I have two issues with this: from my Python programming I associate keeping track of counters like this with crufty code, and from my R programming I am wary of for loops. However, I can't quite see how to apply apply (or even whether I should apply apply) to this problem, and was hoping someone might enlighten me. I need to treat f() as atomic (it's actually cor.test()) for now.
Solution
Since data frames are lists of columns, lapply or sapply can iterate over the columns directly, with no manual counters. It may also be faster, especially given the size of your data frames. For example,
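x <- data.frame(col1 = c(1, 2, 3, 4), col2 = c(5, 6, 7, 8), col3 = c(9, 10, 11, 12))
y <- data.frame(col1 = c(1, 2, 3, 4), col2 = c(5, 6, 7, 8))
# Outer lapply walks the columns of x, inner lapply the columns of y
bl <- lapply(x, function(u) {
  lapply(y, function(v) {
    f(u, v)  # function of a column from x and a column from y; must return a single value
  })
})
# One row per column of x, one column per column of y
out <- matrix(unlist(bl), ncol = ncol(y), byrow = TRUE)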
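If f() returns a single number, nesting sapply instead of lapply lets R simplify the result into a matrix directly (columns of x on the rows, columns of y on the columns, with names kept), so the unlist/matrix step is not needed. A minimal sketch, assuming a hypothetical placeholder f that returns the sum of elementwise products; substitute your own scalar-valued function:

```r
x <- data.frame(col1 = c(1, 2, 3, 4), col2 = c(5, 6, 7, 8), col3 = c(9, 10, 11, 12))
y <- data.frame(col1 = c(1, 2, 3, 4), col2 = c(5, 6, 7, 8))

# Hypothetical stand-in for f(): any function returning one number per column pair
f <- function(u, v) sum(u * v)

# Inner sapply gives one value per column of x; outer sapply binds those
# vectors as columns, one per column of y
out <- sapply(y, function(v) sapply(x, function(u) f(u, v)))
out
```

For cor.test(), the function would return a single component such as cor.test(u, v)$p.value; returning the whole htest object would not simplify to a matrix.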
OTHER TIPS
Some data
nrows <- 6
A <- data.frame(a = runif(nrows), b = runif(nrows), c = runif(nrows))
B <- data.frame(z = rnorm(nrows), y = rnorm(nrows))
The trick: use expand.grid to enumerate every pair of column indices (Var1 indexes A, Var2 indexes B).
counter <- expand.grid(seq_along(A), seq_along(B))
# x is one row of counter: a pair of column indices named Var1 and Var2
f <- function(x) {
  cor.test(A[, x["Var1"]], B[, x["Var2"]])$estimate
}
Now only a single call to apply is needed:
stats <- apply(counter, 1, f)
names(stats) <- paste(names(A)[counter$Var1], names(B)[counter$Var2], sep = ",")
stats
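Because expand.grid varies its first column fastest, stats is already in column-major order with the columns of A on the rows, so it can be folded back into an ncol(A) by ncol(B) matrix. A minimal sketch reusing the setup above; set.seed() is added only to make the random data reproducible, and stat_mat is a hypothetical name:

```r
set.seed(1)  # only for reproducible random data
nrows <- 6
A <- data.frame(a = runif(nrows), b = runif(nrows), c = runif(nrows))
B <- data.frame(z = rnorm(nrows), y = rnorm(nrows))

counter <- expand.grid(seq_along(A), seq_along(B))
f <- function(x) {
  cor.test(A[, x["Var1"]], B[, x["Var2"]])$estimate
}
stats <- apply(counter, 1, f)

# Var1 (columns of A) varies fastest, so fill column-wise with A on the rows
stat_mat <- matrix(stats, nrow = ncol(A), ncol = ncol(B),
                   dimnames = list(names(A), names(B)))
stat_mat
```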
Nesting the apply calls also works, although the syntax is not the easiest to read.
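x <- data.frame(col1 = c(1, 2, 3, 4), col2 = c(5, 6, 7, 8), col3 = c(9, 10, 11, 12))
y <- data.frame(col1 = c(1, 2, 3, 4), col2 = c(5, 6, 7, 8))
# Outer apply passes each column of x (col) plus y (df2);
# inner apply passes each column of y (col2) plus the current x column (col1)
z <- apply(x, 2, function(col, df2) {
  apply(df2, 2, function(col2, col1) {
    col2 + col1
  }, col)
}, y)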
z
     col1 col2 col3
[1,]    2    6   10
[2,]    4    8   12
[3,]    6   10   14
[4,]    8   12   16
[5,]    6   10   14
[6,]    8   12   16
[7,]   10   14   18
[8,]   12   16   20
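When the inner function returns a single number rather than a whole column (as cor.test()$p.value would), the same nesting collapses to a matrix with one row per column of y and one column per column of x; t() flips it into the same orientation as the other answers. A minimal sketch, with a hypothetical sum-of-products function standing in for f:

```r
x <- data.frame(col1 = c(1, 2, 3, 4), col2 = c(5, 6, 7, 8), col3 = c(9, 10, 11, 12))
y <- data.frame(col1 = c(1, 2, 3, 4), col2 = c(5, 6, 7, 8))

# Hypothetical scalar-valued stand-in for f(col1, col2)
z2 <- apply(x, 2, function(col1) {
  apply(y, 2, function(col2) sum(col1 * col2))
})
z2     # rows: columns of y, columns: columns of x
t(z2)  # rows: columns of x, columns: columns of y
```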