Parallel model scoring in R

Question

I should start by saying that the predict.glmnet function doesn't seem to be compute-intensive enough to be worth parallelizing. But this is an interesting example, and my answer may be helpful to you even if this particular case isn't worth parallelizing.

The main problem is that the parRapply function is a parallel wrapper around apply: it splits the matrix into row chunks, one per worker, and then uses apply to call your function on each individual row of each submatrix, which isn't what you want. You want your function to be called directly on the submatrices. snow doesn't contain a convenience function that does that, but it's easy to write one:

rowchunkapply <- function(cl, x, fun, ...) {
    do.call('rbind', clusterApply(cl, splitRows(x, length(cl)), fun, ...))
}
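
To see what "called directly on the submatrices" means, here is a minimal sketch on a toy matrix. It assumes the parallel package that ships with R (its makeCluster and clusterApply mirror the snow functions) and uses splitIndices in place of splitRows; the matrix and the row-summing function are made up for illustration.

```r
library(parallel)  # assumption: base R's parallel package stands in for snow

cl <- makeCluster(2)

# toy matrix: 6 rows, 2 columns
x <- matrix(1:12, nrow = 6)

# split the row indices into one chunk per worker, then cut the matrix
chunks <- lapply(splitIndices(nrow(x), length(cl)),
                 function(i) x[i, , drop = FALSE])

# each worker receives a whole submatrix, not a single row
dims <- clusterApply(cl, chunks, dim)

# a chunk-wise function that returns one result row per input row
res <- do.call(rbind, clusterApply(cl, chunks, function(m) m %*% c(1, 1)))

stopCluster(cl)
```

Each element of dims is the dimension of a 3 x 2 chunk, and res has one row per row of x, in the original order.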

Another problem in your example is that you need to load glmnet on the workers so that the correct predict method is called. You also don't need to explicitly export the pred_en function, since clusterApply sends it to the workers for you. The fit_en object that it refers to does still need to be exported.
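
The worker setup can be tried in isolation with a small sketch. It again assumes the base parallel package rather than snow, and the names are hypothetical stand-ins: tools plays the role of glmnet, coefs plays the role of fit_en, and score plays the role of pred_en.

```r
library(parallel)  # assumption: base R's parallel package stands in for snow

cl <- makeCluster(2)

# attach a package on every worker ('tools' is a stand-in for glmnet)
loaded <- clusterEvalQ(cl, {
    library(tools)
    "package:tools" %in% search()
})

# 'coefs' is a stand-in for the fitted model; it must exist on the workers
coefs <- c(2, 3)
clusterExport(cl, "coefs", envir = environment())

# the worker function refers to 'coefs' as a free variable;
# the function itself is shipped by clusterApply, so it is not exported
score <- function(m) m %*% coefs

res <- clusterApply(cl, list(diag(2), 2 * diag(2)), score)

stopCluster(cl)
```

The envir argument is only there so the sketch also works when it is run inside a function; at top level the default (the global environment) is enough.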

Here's my version of your example:

library(snow)
library(glmnet)
library(mlbench)

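# prepare the data: convert the chas factor to numeric codes and build matrices
data(BostonHousing)
BostonHousing$chas <- as.numeric(BostonHousing$chas)
ind <- as.matrix(BostonHousing[, 1:13])
dep <- as.matrix(BostonHousing[, 14])

# choose lambda by cross-validation, then fit the elastic net
fit_lambda <- cv.glmnet(ind, dep)
fit_en <- glmnet(ind, dep, family="gaussian", alpha=0.5,
                 lambda=fit_lambda$lambda.min)

# stack copies of the predictors to get a large matrix to score
ind_exp <- do.call("rbind", rep(list(ind), 2002))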

# make and initialize the cluster
cl <- makeSOCKcluster(4)
clusterEvalQ(cl, library(glmnet))
clusterExport(cl, "fit_en")

# execute a function on row chunks of x and rbind the results
rowchunkapply <- function(cl, x, fun, ...) {
    do.call('rbind', clusterApply(cl, splitRows(x, length(cl)), fun, ...))
}

# worker function
pred_en <- function(x) {
    predict(fit_en, x)
}
mt <- rowchunkapply(cl, ind_exp, pred_en)

# shut down the workers when finished
stopCluster(cl)

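You may also be interested in the parallel=TRUE option of cv.glmnet, which uses the foreach package to fit the cross-validation folds in parallel. It requires that you register a parallel backend first, and it speeds up the model fitting rather than the prediction step.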
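
A minimal sketch of that option, assuming the doParallel package is installed and used as the foreach backend (any registered backend should work); the simulated x and y are made up for illustration and are not the Boston housing data.

```r
library(glmnet)
library(doParallel)  # assumption: doParallel as the foreach backend

# register a two-worker backend for foreach
cl <- makeCluster(2)
registerDoParallel(cl)

# simulated data, for illustration only
set.seed(1)
x <- matrix(rnorm(100 * 5), nrow = 100, ncol = 5)
y <- rnorm(100)

# the cross-validation folds are fitted in parallel via foreach
fit <- cv.glmnet(x, y, parallel = TRUE)

stopCluster(cl)
```

The selected penalty is then available as fit$lambda.min, just as with a sequential call.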