Question

I have just finished a long-running analysis (24+ hours) on multiple sets of data. Since I'm lazy and didn't want to deal with multiple R sessions and pulling the results together afterwards, I ran them in parallel using foreach.

The analysis returns an environment full of the results (and intermediate objects), so I attempted to assign the results to global environments, only to find that this didn't work. Here's some code to illustrate:

library(doMC)
library(foreach)
registerDoMC(3)

bigAnalysis <- function(matr) {
  results <- new.env()
  results$num1 <- 1
  results$m <- matrix(1:9, 3, 3)
  results$l <- list(1, list(3,4))

  return(results)
}

a <- new.env()
b <- new.env()
c <- new.env()

foreach(i = 1:3) %dopar% {
  if (i == 1) {
    a <<- bigAnalysis(data1)
    plot(a$m[,1], a$m[,2]) # assignment has worked here
  } else if (i == 2) {
    b <<- bigAnalysis(data2)
  } else {
    c <<- bigAnalysis(data3)
  }
}

# Nothing stored :(
ls(envir=a)
# character(0)

I've used global assignment within foreach before (within a function) to populate matrices I'd set up beforehand with data (where I couldn't do it nicely with .combine), so I thought this would work.

EDIT: It appears that this only works within the body of a function:

f <- function() {
  foreach(i = 1:3) %dopar% {
    if (i == 1) {
      a <<- bigAnalysis(data1)
    } else if (i == 2) {
      b <<- bigAnalysis(data2)
    } else {
      c <<- bigAnalysis(data3)
    }
  }
  d <- new.env()
  d$a <- a
  d$b <- b
  d$c <- c
  return(d)
}

Why does this work in a function, but not in the top-level environment?


Solution

Your attempts to assign to global variables in the foreach loop fail because the assignments happen in the worker processes that mclapply forks. Each worker modifies only its own copy of the global environment. Those variables aren't sent back to the master process, so they are lost when the worker exits.
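
To see the effect in isolation, here is a minimal sketch that calls mclapply directly, without foreach. It assumes a Unix-like system where forking is available (mc.cores > 1 is not supported on Windows), and the variable name counter is made up for the illustration.

```r
library(parallel)

counter <- 0

# Each forked worker increments its own copy of `counter`
# and returns the value it sees.
seen_in_workers <- mclapply(1:2, function(i) {
  counter <<- counter + i
  counter
}, mc.cores = 2)

unlist(seen_in_workers)  # the workers did see a modified value
counter                  # the master's copy is untouched
```

The only thing that makes it back to the master is each worker's return value, which is why returning the results works better than assigning them.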

You could try something like this:

r <- foreach(i = 1:3) %dopar% {
  if (i == 1) {
    bigAnalysis(data1)
  } else if (i == 2) {
    bigAnalysis(data2)
  } else {
    bigAnalysis(data3)
  }
}

a <- r[[1]]
b <- r[[2]]
c <- r[[3]]
ls(a)

This uses the default combine function, which returns the three environment objects in a list.
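
If the if/else chain gets unwieldy, one possible variation is to iterate over a named list of inputs and name the results afterwards. This is only a sketch: the datasets list and its placeholder matrices are hypothetical stand-ins for data1, data2 and data3, and bigAnalysis is reduced to a toy version that still returns an environment.

```r
library(doMC)
library(foreach)
registerDoMC(3)

# Toy stand-in for the real analysis; returns an environment of results.
bigAnalysis <- function(matr) {
  results <- new.env()
  results$m <- matr %*% matr
  results
}

# Hypothetical placeholder inputs, named after the results they produce.
datasets <- list(a = diag(2), b = diag(3), c = diag(4))

# Each worker returns its environment; foreach collects them in a list.
r <- foreach(d = datasets) %dopar% bigAnalysis(d)
names(r) <- names(datasets)

ls(r$a)
dim(r$c$m)
```

The results are then available as r$a, r$b and r$c, and list2env(r, envir = globalenv()) could copy them into separate global variables if that is really needed.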

Executing the foreach loop in a function isn't going to make it work. However, the assignments would work if you didn't call registerDoMC so that you were actually running sequentially. In that case you really are making assignments to the master process's global environment.
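
As a rough illustration of the sequential case, the sketch below registers foreach's built-in sequential backend explicitly with registerDoSEQ() instead of leaving the backend unregistered, which avoids the warning. The variable name seq_total is made up for the example.

```r
library(foreach)
registerDoSEQ()  # explicit sequential backend; no worker processes

# The loop body runs in the master session, so <<- really does
# assign in the master's global environment.
invisible(foreach(i = 1:3) %dopar% {
  seq_total <<- i * 10
})

seq_total  # set by the last iteration
```

This only shows where the assignment ends up. It gives no speedup, since the iterations run one after another.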

Licensed under: CC-BY-SA with attribution