Question

I want to use a foreach loop on a Windows machine to make use of multiple cores in cpu heavy computation. However, I cannot get the processes to do any work.

Here is a minimal example of what I think should work, but doesn't:

library(snow)
library(doSNOW)
library(foreach)

cl <- makeSOCKcluster(4)
registerDoSNOW(cl)

pois <- rpois(1e6, 1500) # draw 1e6 values from a Poisson distribution with mean 1500

x <- foreach(i=1:1e6) %dopar% {
  runif(pois[i]) # draw from uniform distribution pois[i] times
}

stopCluster(cl)

SNOW does create the 4 "slave" processes, but they don't do any work: [Task Manager screenshot]

I hope this isn't a duplicate, but I cannot find anything with the search terms I can come up with.

Solution

It's probably working (at least it does on my Mac). However, a single call to runif takes so little time that almost all of the elapsed time goes to overhead, and the child processes spend negligible CPU time on the actual tasks.

x <- foreach(i=1:20) %dopar% {
  system.time(runif(pois[i])) 
}
x[[1]]
#user  system elapsed 
#   0       0       0 

Parallelization makes sense if you have heavy computations that cannot be optimized any other way. That's not the case in your example. You don't need 1e6 calls to runif; one is sufficient (e.g., runif(sum(pois)), then split the result).
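
A rough sketch of that single-call approach might look like the following. The shorter vector and the names u and groups are only illustrative assumptions, not part of the original question:

```r
set.seed(42)                      # only so the sketch is reproducible
pois <- rpois(1e3, 1500)          # smaller than 1e6 to keep the demo light

# One vectorized call instead of one runif() call per element
u <- runif(sum(pois))

# Label each draw with the index of the element it belongs to.
# Explicit levels keep a group even if some pois[i] happened to be 0.
groups <- factor(rep(seq_along(pois), times = pois), levels = seq_along(pois))

# Split back into a list with one vector per element of pois
x <- unname(split(u, groups))
```

Here lengths(x) should equal pois, so x has the same shape as the list the foreach loop returns.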

PS: Always test with a smaller example.

Other tips

Although this particular example isn't worth executing in parallel, it's worth noting that since it uses doSNOW, the entire pois vector is auto-exported to all of the workers even though each worker only needs a fraction of it. However, you can avoid auto-exporting any data to the workers by iterating over pois itself:

x <- foreach(p=pois) %dopar% {
  runif(p)
}

Now the elements of pois are sent to the workers in the tasks, so each worker only receives the data that's actually needed to perform its tasks. This technique isn't important when using doMC, since the doMC workers get pois for free.

You can also often improve performance enormously by processing pois in larger chunks using an iterator function such as "isplitVector" from the itertools package.
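
As a minimal sketch of the chunking idea, assuming the itertools package is installed (the worker count, the chunk count, and the shorter pois vector are arbitrary choices for illustration):

```r
library(snow)
library(doSNOW)
library(foreach)
library(itertools)

cl <- makeSOCKcluster(2)   # 2 workers is an arbitrary choice for the sketch
registerDoSNOW(cl)

pois <- rpois(1e3, 1500)   # shorter than 1e6 to keep the demo light

# Each task receives one chunk of pois (one chunk per worker here) and
# returns a list of vectors; .combine = c joins those lists into one list.
x <- foreach(p = isplitVector(pois, chunks = 2), .combine = c) %dopar% {
  lapply(p, runif)
}

stopCluster(cl)
```

With only two tasks instead of one per element, the per-task communication overhead is paid twice rather than a thousand times, while x should still hold one vector per element of pois.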

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow