Question

In R 3.0.2 on Linux 3.12.0, I am using the system() function to execute a number of tasks. The desired effect is for each task to run just as it would if I had executed it from the command line via Rscript, outside of R's system().

However, when executing them inside R via system(), each task is tied to the same single CPU as the master R process.

In other words:

When launched via Rscript directly from a bash shell, outside of R, each task runs on its own core where possible (this is desired)

When launched inside R via system(), each task runs on the same single core; nothing is spread across multiple cores. If I have 100 tasks, they are all stuck on one core.

I cannot figure out how to spawn a process inside of R so that each process will use its own core.

I am using a simple test to consume CPU cycles so I can measure the effect using top/htop:

dd if=/dev/urandom bs=32k count=1000 | bzip2 -9 >> /dev/null

When this simple test is launched outside of R multiple times, each iteration gets its own core. But when I launch it inside of R:

system("dd if=/dev/urandom bs=32k count=2000 | bzip2 -9 >> /dev/null", ignore.stdout=TRUE,ignore.stderr=TRUE,wait=FALSE)

They are all stuck on a single core.

Here is a visualization after running 4 concurrent iterations of system().

[screenshot of top/htop: all four dd/bzip2 tasks sharing a single core]
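
As an aside, the pinning can also be confirmed without watching top; a small sketch run from R, assuming Linux procps, where the psr column reports which processor each task is currently on:

# A sketch: list each dd process and the processor (psr) it is running on.
system("ps -o pid,psr,comm -C dd")

If every row shows the same psr value, the tasks really are all on one core.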

Please help me: I need to be able to tell R to launch new tasks, with each of them running on its own core.

UPDATE DEC 4 2013:

I tried a test in Python using this:

import os
import thread
thread.start_new_thread(os.system, ("/bin/dd if=/dev/urandom of=/dev/null bs=32k count=2000",))

I repeated the new thread several times, and as expected everything worked (multiple cores used, one per thread).

So I installed the rPython package in R and tried the same from within R:

python.exec("import thread")
python.exec("thread.start_new_thread(os.system,('/bin/dd if=/dev/urandom of=/dev/null bs=32k count=2000',))")

Unfortunately, once again everything was limited to a single core, even after repeated calls. Why is everything launched from within R limited to a single core?


Solution

Following on @agstudy's comment, you should get parallel to work first. On my system, this uses multiple cores:

f<-function(x)system("dd if=/dev/urandom bs=32k count=2000 | bzip2 -9 >> /dev/null", ignore.stdout=TRUE,ignore.stderr=TRUE,wait=FALSE)
library(parallel)
mclapply(1:4,f,mc.cores=4)
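
If the goal is fire-and-forget rather than collecting results, parallel also provides mcparallel(), which forks one child per call; a minimal sketch, assuming a POSIX system (forking is not available on Windows):

library(parallel)

# Fork one child per task; each child runs the shell pipeline independently
# and should land on its own core.
jobs <- lapply(1:4, function(i) {
  mcparallel(system("dd if=/dev/urandom bs=32k count=2000 | bzip2 -9 >> /dev/null",
                    ignore.stdout = TRUE, ignore.stderr = TRUE))
})

# Wait for the children and reap their exit codes (0 on success).
mccollect(jobs)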

I would have written this as a comment myself, but it is too long. I know you have said that you have tried the parallel package, but I wanted to confirm that you are using it correctly. If it doesn't work, can you confirm that mclapply behaves correctly with a non-system() call, like this one?

a<-mclapply(rep(1e8,4),rnorm,mc.cores=4)

Reading your comments, I suspect that the pthreads package on your Linux system is out of date and broken. On my system, I am using libpthread-2.15.so (not 2.13). If you're on Ubuntu, you can grab the latest with apt-get install libpthread-stubs0.

Also, note that you should be using parallel, not multicore. If you look at the docs for parallel, you'll note that they have incorporated the work on multicore.


Reading your next set of comments, I must insist that it is parallel and not multicore that has been included in R since 2.14. You can read about this on the CRAN Task View.

Getting parallel to work is crucial. I previously told you that you could compile it directly from source, but that is not correct: since parallel ships as a base package, the only way to recompile it is to compile R itself from source.

Can you also verify that your CPU affinity is set correctly? On Linux, child processes inherit the parent's affinity mask, so if the R process itself is pinned to one core, every process it spawns will be stuck there too. Also, can you check whether R detects the number of cores? Just run:

library(parallel)
mcaffinity()
# Should be c(1,2,3,4) for you.
detectCores()
# Should be 4 for you.
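
If mcaffinity() reports fewer cores than the machine has, one thing to try is widening the mask before spawning tasks; a hedged sketch for the 4-core machine described above:

library(parallel)

# Widen the affinity mask to all four cores; processes spawned afterwards
# inherit the new mask.
mcaffinity(1:4)
mcaffinity()  # should now report 1 2 3 4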

OTHER TIPS

I tested running:

system("dd if=/dev/urandom bs=32k count=2000 | bzip2 -9 >> /dev/null", ignore.stdout=TRUE,ignore.stderr=TRUE,wait=FALSE)
system("dd if=/dev/urandom bs=32k count=2000 | bzip2 -9 >> /dev/null", ignore.stdout=TRUE,ignore.stderr=TRUE,wait=FALSE)
system("dd if=/dev/urandom bs=32k count=2000 | bzip2 -9 >> /dev/null", ignore.stdout=TRUE,ignore.stderr=TRUE,wait=FALSE)
system("dd if=/dev/urandom bs=32k count=2000 | bzip2 -9 >> /dev/null", ignore.stdout=TRUE,ignore.stderr=TRUE,wait=FALSE)

on Linux 2.6.32 with R 3.0.2 and on Linux 3.8.0 with R 2.15.2. In both cases it takes up 4 CPU cores (as you would expect).

-- Edit --

I installed Linux 3.12 on a VirtualBox machine, and there R 3.0.2 also does what I expect: it takes up 4 CPUs. The processes even wander slowly between the CPUs, so each process does not stick to one CPU but moves every second or so.

This leads me to believe your system has some local modification that forces R to use only one CPU.

From your description I would guess the local modifications are in R and not system-wide (since your Python has no problems spawning more processes).

The modifications could be on your user alone, so create a new user and try with that. If it works for the new user, we need to figure out what your userid has installed.

If it does not work for the new user, it could be globally installed R libraries that causes the problem. Install an older R version and try that out. If the older version works, your R 3.0.2 installation is probably broken. Remove it and re-install it.
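
To help narrow this down from within R, it may also be worth checking the standard per-user startup files and a few environment variables that commonly restrict cores; a hedged sketch (the variables listed are common suspects, not confirmed causes):

# Look for per-user startup files that could alter R's behaviour.
file.exists(c("~/.Rprofile", "~/.Renviron"))

# Check environment variables that commonly restrict cores or set affinity.
Sys.getenv(c("OMP_NUM_THREADS", "MC_CORES", "GOMP_CPU_AFFINITY"))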

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow