Question

I'm testing the doRedis package by running a worker on one machine and the master/server on another. The code on my master looks like this:

 #Register ...
 r <- foreach(a=1:numreps, .export=c(...)) %dopar% {
        train <- func1(..)

        best <- func2(...)

        weights <- func3(...)

        return ...
      }

Every function reads a global variable but does not modify it. I've listed the global variable in the .export argument of the foreach loop, but whenever I run the code, an error occurs stating that the variable was not found. Interestingly, the code works when all my workers are on one machine, but crashes when I have an "outside" worker. Any ideas why this error is occurring, and how to correct it?

Thanks!

UPDATE: I have a gist of some code here: https://gist.github.com/liangricha/fbf29094474b67333c3b

UPDATE2: I asked another doRedis-related question: "Would it be possible to allow each worker machine to utilize all of its cores?"

@Steve Weston responded: "Starting one redis worker per core will often fully utilize a machine."


Solution

This kind of code was a problem for the doParallel, doSNOW, and doMPI packages in the past, but they were improved in the last year or so to handle it better. The problem is that variables are exported to a special "export" environment, not to the global environment. That is preferable in various ways, but it means that the backend has to do more work so that the exported variables are in the scope of the exported functions. It looks like doRedis hasn't been updated to use these improvements.
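The scoping rule behind this can be sketched in plain R, with no backend involved. This is only an illustration of the idea, not the code any backend actually runs; `worker_global` and `export_env` are made-up names standing in for a worker's global environment and its export environment.

```r
# Stand-ins for a worker's two environments (hypothetical names).
worker_global <- new.env(parent = baseenv())      # plays the worker's global env
export_env    <- new.env(parent = worker_global)  # where .export'ed objects land

# f1 as it arrives on the worker: its enclosure is the (empty) global env.
f1 <- function() glob
environment(f1) <- worker_global

assign("glob", 6, envir = export_env)
assign("f1", f1, envir = export_env)

# The loop body is evaluated in the export environment. f1 is found there,
# but f1 looks up 'glob' starting from its own enclosure, which never
# reaches export_env, so the lookup fails.
before <- tryCatch(eval(quote(f1()), export_env),
                   error = function(e) conditionMessage(e))

# Re-parenting the exported function onto the export environment makes
# the exported variable visible to it.
f1_fixed <- f1
environment(f1_fixed) <- export_env
assign("f1", f1_fixed, envir = export_env)
after <- eval(quote(f1()), export_env)

print(before)  # an error message mentioning 'glob'
print(after)   # 6
```

The extra work a backend has to do is roughly the second half of this sketch: make sure exported functions can see the export environment.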

Here is a simple example that illustrates the problem:

library(doRedis)
registerDoRedis('jobs')
startLocalWorkers(3, 'jobs')
glob <- 6
f1 <- function() {
  glob
}
f2 <- function() {
  foreach(1:3, .export=c('f1', 'glob')) %dopar% {
    f1()
  }
}

f2()  # fails with the error: "object 'glob' not found"

If the doParallel backend is used, it succeeds:

library(doParallel)
cl <- makePSOCKcluster(3)
registerDoParallel(cl)

f2()  # works with doParallel

One workaround is to define the function "f1" inside function "f2":

f2 <- function() {
  f1 <- function() {
    glob
  }
  foreach(1:3, .export=c('glob')) %dopar% {
    f1()
  }
}

f2()  # works with doParallel and doRedis

Another solution is to use some mechanism to export the variables to the global environment of each of the workers. With doParallel or doSNOW, you could do that with the clusterExport function, but I'm not sure how to do that with doRedis.
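For the doParallel case, that could look roughly like the sketch below. It assumes a local PSOCK cluster with two workers (the count is arbitrary) and passes `envir = environment()` so that `clusterExport` picks the variables up from wherever this code is run.

```r
library(doParallel)

cl <- makePSOCKcluster(2)
registerDoParallel(cl)

glob <- 6
f1 <- function() {
  glob
}

# Copy 'glob' and 'f1' into the global environment of every worker.
clusterExport(cl, c("glob", "f1"), envir = environment())

f2 <- function() {
  # No .export needed: the workers find f1 and glob in their global env.
  foreach(i = 1:3) %dopar% {
    f1()
  }
}

r <- f2()
stopCluster(cl)
print(unlist(r))
```

The downside is that the exported objects stay in the workers' global environments until the cluster is stopped or they are removed explicitly.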

I'll report this issue to the author of the doRedis package and suggest that he update doRedis to handle exported functions like doParallel.

Licensed under: CC-BY-SA with attribution