Question

It seems clusterMap in snow doesn't support dynamic scheduling. I'd like to do parallel computing with pairs of parameters stored in a data frame, but the elapsed times of the jobs vary widely. If the jobs are scheduled statically, it will be time-consuming.

e.g.

library(snow)
cl2 <- makeCluster(3, type = "SOCK")
df_t <- data.frame(type = c(rep('a', 3), rep('b', 3)),
                   value = c(rep('1', 3), rep('2', 3)))
clusterExport(cl2, "df_t")
clusterMap(cl2, function(x, y) paste(x, y),
           df_t$type, df_t$value)

Solution

It is true that snow's clusterMap doesn't support dynamic scheduling, but there is a comment in the source code suggesting that it may be implemented in the future.

In the meantime, I would create a list from the data in order to call clusterApplyLB with a slightly different worker function:

# split the data frame into a list of single-row data frames,
# one per task, so clusterApplyLB can hand them out dynamically
ldf <- lapply(seq_len(nrow(df_t)), function(i) df_t[i, ])
clusterApplyLB(cl2, ldf, function(df) paste(df$type, df$value))

This was common before clusterMap was added to the snow package.
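To see why dynamic scheduling matters when job lengths differ, here is a small sketch (the sleep times are invented purely for illustration): clusterApply assigns tasks to workers round-robin up front, while clusterApplyLB hands the next task to whichever worker finishes first.

```r
library(snow)
cl <- makeCluster(3, type = "SOCK")

# Uneven job durations, in seconds; with static round-robin
# scheduling one worker can end up with all the slow jobs.
times <- c(4, 1, 1, 1, 1, 4)

system.time(clusterApply(cl, times, Sys.sleep))    # static scheduling
system.time(clusterApplyLB(cl, times, Sys.sleep))  # dynamic scheduling

stopCluster(cl)
```

With highly variable task times and more tasks than workers, the load-balanced version typically finishes sooner because no worker sits idle while another works through a backlog.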

Note that your use of clusterMap doesn't actually require you to export df_t since your worker function doesn't refer to it. But if you're willing to export df_t to the workers, you could also use:

clusterApplyLB(cl2, seq_len(nrow(df_t)),
               function(i) paste(df_t$type[i], df_t$value[i]))

In this case, df_t must be exported to the cluster workers since the worker function references it. However, this is generally less efficient, because the entire data frame is sent to every worker even though each one only needs a fraction of it.

OTHER TIPS

I found that clusterMap in the parallel package supports load balancing, but it was less efficient than the clusterApplyLB-plus-lapply method implemented with snow. I tried to read the source code to figure out why, but clusterMap's source is not available when I click the 'source' and 'R code' links.

Parallel Doc
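For reference, the load-balanced mode of parallel's clusterMap mentioned above is selected with its .scheduling argument, so the original example can be run with dynamic scheduling directly (a sketch; see the parallel documentation for details):

```r
library(parallel)
cl <- makeCluster(3)
df_t <- data.frame(type = c(rep('a', 3), rep('b', 3)),
                   value = c(rep('1', 3), rep('2', 3)))

# .scheduling = "dynamic" sends each task to the next free worker
res <- clusterMap(cl, function(x, y) paste(x, y),
                  df_t$type, df_t$value, .scheduling = "dynamic")

stopCluster(cl)
```

Note that df_t need not be exported here either: only the vectors df_t$type and df_t$value are shipped to the workers as arguments.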

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow