lapply with growing data.table function in R

Question 1

Your names end with 'df', which makes it seem like your data are a data.frame. But @Troy's answer uses a matrix. A matrix is appropriate when the data are homogeneous, and generally matrix operations are much faster than data.frame operations. So you can see already that if you'd provided a small example of your data set (e.g., dput(mRNAdf[1:10,]) that people might be in a better position to help you; this is what they're asking for.

In large numerical calculations it makes sense to 'hoist' any repeated calculations outside the loop, so they are performed only once. Repeated calculations in your case include sub-setting to columns 2:13, and coercion to numeric. With this idea, and guessing that you actually have a data.frame where each column is already a numeric vector, I'd start with

mRNAmatrix <- as.matrix(mRNAdf[,2:13])
miRmatrix <- as.matrix(miRdf[,2:13])

From the help page ?cor we see that the arguments can be a matrix, and if so the correlation is calculated between columns. You're interested in the result when the arguments are transposed relative to your current representation. So

result <- cor(t(mRNAmatrix), t(miRmatrix), use="complete.obs")

This is fast enough for your purposes

> m1 = matrix(rnorm(71521 * 12), 71521)
> m2 = matrix(rnorm(1477 * 12), 1477)
> system.time(ans <- cor(t(m1), t(m2)))
   user  system elapsed 
  9.124   0.200   9.340 
> dim(ans)
[1] 71521  1477

result is the same as your corrList -- it's not a list, but a matrix; probably the row and column names have been carried forward. You'd write this to a file as you do above, write.csv(result, "testCorrWhole.csv")

Question 2

UPDATED BELOW TO SHOW PARALLEL PROCESSING - ABOUT A 60% SAVING

Using apply() might not be quick enough for you. Here's how to do it, though. Will have a think about performance since this example (1M output correlations in 1000x1000 grid) takes over a minute on laptop.

miRdf=matrix(rnorm(13000,10,1),ncol=13)
mRNAdf=matrix(rnorm(13000,10,1),ncol=13)
miRdf[,1]<-1:nrow(miRdf)     # using column 1 as indices since they're not in the calc.
mRNAdf[,1]<-1:nrow(mRNAdf)

corRow<-function(y){
    apply(miRdf,1,function(x)cor(as.numeric(x[2:13]), as.numeric(mRNAdf[y,2:13]), use ="complete.obs"))
  }

system.time(apply(mRNAdf,1,function(x)corRow(x[1])))
# user  system elapsed 
# 72.94    0.00   73.39

And with parallel::parApply on a 4 core Win64 laptop

require(parallel) ## Library to allow parallel processing

miRdf=matrix(rnorm(13000,10,1),ncol=13)
mRNAdf=matrix(rnorm(13000,10,1),ncol=13)
miRdf[,1]<-1:nrow(miRdf)     # using column 1 as indices since they're not in the calc.
mRNAdf[,1]<-1:nrow(mRNAdf)

corRow<-function(y){
    apply(miRdf,1,function(x)cor(as.numeric(x[2:13]), as.numeric(mRNAdf[y,2:13]), use ="complete.obs"))
  }


      # Make a cluster from all available cores
      cl=makeCluster(detectCores()) 
      # Use clusterExport() to distribute the function and data.frames needed in the apply() call
      clusterExport(cl,c("corRow","miRdf","mRNAdf"))
      # time the call
      system.time(parApply(cl,mRNAdf,1,function(x)corRow(x[[1]])))

      # Stop the cluster
      stopCluster(cl)

      # time the call without clustering
      system.time(apply(mRNAdf,1,function(x)corRow(x[[1]])))

      ## WITH CLUSTER (4)
      user  system elapsed 
      0.04    0.03   29.94 

      ## WITHOUT CLUSTER
      user  system elapsed 
      73.96    0.00   74.46