Question

I'm trying to speed up the prediction of a test-dataset (n=35000) by splitting it up and letting R run on smaller chunks. The model has been generated with party::cforest.

However, I can't get R to calculate even the smallest parts when trying to use foreach with %dopar%.

My prediction function takes about 7 seconds for both predict(fit,newdata=a[1:100,]) and foreach(i=1:10) %do% {predict(fit,newdata=a[1:10,])}.

But when I try to use %dopar% instead, R seems to freeze. Shouldn't

foreach(i=1:10, .packages=c('party')) %dopar% {predict(fit,newdata=a[1:10,])}

be way faster? Or is the parallelization itself slowing R down somehow?

Test-running with another function (repeatedly calculating sqrt(3), as suggested here) has shown a significant improvement, so %dopar% itself is working.

Predictions with a randomForest behave similarly, with the difference that here even %do% for 10x1:10 predictions takes a lot more time than just predicting 1:100. For randomForest I don't really care though, because predicting all 35k observations is not a problem anyway. Btw, is it only me, or is cforest taking more time and RAM for everything? I'm only having trouble with cforest; randomForest works like a charm.

(running on Windows 7, x64, 8GB RAM, 4 cores/8 threads - using 6 nodes in doSNOW parallelization cluster)


Solution

The primary problem with your example is that foreach automatically exports the entire data frame a to each of the workers, even though each task only uses 10 rows of it. Instead, try something like:

library(foreach)
library(itertools)
# isplitRows() hands each task only its own chunk of rows from a
foreach(1:10, suba=isplitRows(a, chunkSize=10), .packages='party') %dopar% {
    predict(fit, newdata=suba)
}

The 1:10 is for test purposes, to limit the loop to only 10 iterations, as you're doing in your example.
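
To get a rough feel for why the automatic export matters, one can compare how many bytes have to be serialized for the whole data frame versus a single 10-row chunk. This is only a sketch in base R: the data frame below is synthetic (35000 rows as in the question, but the 20 numeric columns are an assumption), and the real transfer cost also depends on the cluster type.

```r
# Synthetic stand-in for the test data: 35000 rows, 20 numeric columns (assumed).
set.seed(1)
a <- data.frame(matrix(rnorm(35000 * 20), nrow = 35000))

# Bytes that must be serialized to ship an object to a worker.
full_bytes  <- length(serialize(a, NULL))
chunk_bytes <- length(serialize(a[1:10, ], NULL))

cat("whole data frame:", full_bytes, "bytes\n")
cat("10-row chunk:    ", chunk_bytes, "bytes\n")
```

If every worker receives the whole data frame just to predict 10 rows, almost all of that transfer is wasted.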

This still requires that fit be exported to all of the workers, and it might be quite large. But since there are many more tasks than workers, parallelizing the prediction might be worthwhile if predict takes long enough compared to the time needed to send the test data.
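
Dropping the 1:10 limit, the same idea can be wrapped in a small helper that predicts the whole data set chunk by chunk and concatenates the results. This is a minimal sketch, not a tested recipe for cforest: predict_in_chunks is a hypothetical helper name, an lm model on mtcars stands in for the cforest fit so that the example runs without party, and a 2-worker SOCK cluster is assumed. For a cforest model one would presumably pass packages = 'party'.

```r
library(foreach)
library(itertools)  # isplitRows()
library(doSNOW)     # registerDoSNOW(); also attaches snow for makeCluster()

# Hypothetical helper (not from the original answer): predict in row chunks.
predict_in_chunks <- function(fit, newdata, chunk_size, packages = NULL) {
  foreach(suba = isplitRows(newdata, chunkSize = chunk_size),
          .combine = c, .packages = packages) %dopar% {
    # Each task receives one chunk (suba); fit is exported by foreach.
    as.numeric(predict(fit, newdata = suba))
  }
}

cl <- makeCluster(2, type = "SOCK")  # assumed: 2 local workers
registerDoSNOW(cl)

fit <- lm(mpg ~ wt + hp, data = mtcars)  # stand-in for the cforest model
par_pred <- predict_in_chunks(fit, mtcars, chunk_size = 10)
seq_pred <- as.numeric(predict(fit, newdata = mtcars))

stopCluster(cl)
```

Because foreach returns results in task order by default, par_pred should line up row for row with the sequential prediction. Whether this is actually faster depends on how large fit is and how long predict takes per chunk.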

Licensed under: CC-BY-SA with attribution