Question

I have written a function in R, running on an Ubuntu 12.04 LTS 64-bit server (4-core i7 with hyper-threading, 6 GB RAM). I installed R from the standard packages:

sudo apt-get install r-base r-recommended r-base-dev
sudo apt-get install r-cran-multicore r-cran-iterators r-cran-foreach r-cran-domc 

NB: I also installed foreach and doMC from within R (which didn't help either), the same way I installed the deldir package:

install.packages(c("deldir"), dependencies = TRUE)

My function runs fine, but it does not run in parallel; it just maxes out 1 of the 8 cores:

library(deldir)
library(foreach)
library(doMC)
registerDoMC(cores=8)

#getDoParWorkers()
#getDoParName()
#getDoParVersion()

# loop through files
inputfiles <- dir(path="/home/geoadmin/data/objects/", pattern='.txt')
for( inputfilenr in 1:length(inputfiles))
{
  # set file variables
  curinputfile = paste("/home/geoadmin/data/objects/",inputfiles[[inputfilenr]], sep = "", collapse = NULL)
  print (curinputfile)
  curoutputfile = paste("/home/geoadmin/data/objects/",substr(inputfiles[[inputfilenr]], start=1, stop=10), '.out', sep = "", collapse = NULL)
  # select the point x/y coordinates into a data frame...
  points <- read.csv(curinputfile, header = TRUE, sep = ",", dec=".", fill = TRUE)
  # set calculation variables, precision on 3 digits only because of the RDW coordinate system
  # (the window rw extends the data's bounding box by its own width/height on each side)
  voro = deldir(points$x, points$y, digits=3, list(ndx=2,ndy=2),
                rw=c(min(points$x)-abs(min(points$x)-max(points$x)),
                     max(points$x)+abs(min(points$x)-max(points$x)),
                     min(points$y)-abs(min(points$y)-max(points$y)),
                     max(points$y)+abs(min(points$y)-max(points$y))))
  tiles = tile.list(voro)
  poly = array()
  # start loop
  poly <- foreach (i=1:length(tiles), .combine=cbind) %dopar%
  {
    # load tile info
    tile = tiles[[i]]
    # start with WKT notation
    curpoly = "POLYGON(("
    # add list of coordinates by looping through the points in tile
    for (j in 1:length(tiles[[i]]$x)) { curpoly = sprintf("%s %.6f %.6f,",curpoly,tile$x[[j]],tile$y[[j]]) }
    # then again the first point to close the polygon and end the WKT notation, adding that to the poly array
    sprintf("%s %.6f %.6f))",curpoly,tile$x[[1]],tile$y[[1]])
  }
  write.csv(t(poly), file = curoutputfile, row.names = FALSE)
}

So the results are good, but no parallelism...

doMC did register correctly:

> getDoParWorkers()
[1] 8
> getDoParName()
[1] "doMC"
> getDoParVersion()
[1] "1.2.5"

If I look at the usage (with top):

top - 01:03:19 up 9 min,  3 users,  load average: 1.02, 0.86, 0.45
Tasks: 131 total,   2 running, 127 sleeping,   0 stopped,   2 zombie
Cpu(s): 12.5%us,  0.0%sy,  0.0%ni, 87.5%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   6104932k total,  1240512k used,  4864420k free,    16656k buffers
Swap:  6283260k total,        0k used,  6283260k free,   141996k cached

PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
1553 zzzzzzzz  20   0  913m 850m 3716 R  100 14.3   8:22.03 R

So just maxing out one core. Does anyone have any idea what could cause foreach/doMC to not use multiple cores?

> sessionInfo()
R version 2.14.1 (2011-12-22)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] doMC_1.2.5      multicore_0.1-7 iterators_1.0.6 foreach_1.4.0
[5] deldir_0.0-19

loaded via a namespace (and not attached):
[1] codetools_0.2-8

Solution

Adding the likely answer to my own question: foreach/doMC does work on this machine (the standard example runs in parallel), so the problem lies in this specific code. Most likely the voro = deldir(...) call takes up nearly all of the time, not the foreach loop after it. Because that call runs serially before the loop starts, only one core is busy. Parallelising it would mean adjusting the deldir package itself. Looking at the deldir source, it seems this is the snippet I would need to adjust:

# Call the master subroutine to do the work:
repeat {
    tmp <- .Fortran(
            'master',
            x=as.double(x),
            y=as.double(y),
            sort=as.logical(sort),
            rw=as.double(rw),
            npd=as.integer(npd),
            ntot=as.integer(ntot),
            nadj=integer(tadj),
            madj=as.integer(madj),
            ind=integer(npd),
            tx=double(npd),
            ty=double(npd),
            ilist=integer(npd),
            eps=as.double(eps),
            delsgs=double(tdel),
            ndel=as.integer(ndel),
            delsum=double(ntdel),
            dirsgs=double(tdir),
            ndir=as.integer(ndir),
            dirsum=double(ntdir),
            nerror=integer(1),
            PACKAGE='deldir'
        )

I'm not sure yet how to restructure this so that it works with foreach, though...
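
One possible way around this, without touching the Fortran code, is to parallelise over the input files instead, since each file is processed independently. The following is only a sketch under assumptions: tile_to_wkt, process_file and process_all are hypothetical helper names, the dummy-point argument from the question is left out, and the output file has a single wkt column rather than the exact layout of the original script. It also times the deldir step and the tile-to-polygon step separately, which would confirm (or refute) the guess that deldir dominates the runtime.

```r
# Sketch only: helper names are hypothetical, not part of deldir or foreach.

# Build one WKT polygon string from a tile's x/y vectors,
# repeating the first point at the end to close the ring.
tile_to_wkt <- function(tile) {
  xs <- c(tile$x, tile$x[1])
  ys <- c(tile$y, tile$y[1])
  coords <- paste(sprintf("%.6f %.6f", xs, ys), collapse = ", ")
  sprintf("POLYGON((%s))", coords)
}

# Process a single input file and return the elapsed seconds of each stage.
# Assumes the file has x and y columns, as in the question.
process_file <- function(infile, outdir = dirname(infile)) {
  pts <- read.csv(infile, header = TRUE)
  dx <- diff(range(pts$x))
  dy <- diff(range(pts$y))
  # Same window as in the question: bounding box extended by its own size.
  rw <- c(min(pts$x) - dx, max(pts$x) + dx, min(pts$y) - dy, max(pts$y) + dy)

  t_deldir <- system.time(
    voro <- deldir::deldir(pts$x, pts$y, rw = rw, digits = 3)
  )
  t_tiles <- system.time(
    wkt <- vapply(deldir::tile.list(voro), tile_to_wkt, character(1))
  )

  outfile <- file.path(outdir, paste0(substr(basename(infile), 1, 10), ".out"))
  write.csv(data.frame(wkt = wkt), file = outfile, row.names = FALSE)
  c(deldir = t_deldir[["elapsed"]], tiles = t_tiles[["elapsed"]])
}

# Run one forked worker per file (Unix-alikes only); the parallel
# package ships with R from version 2.14.0 onwards.
process_all <- function(files, cores = 8) {
  parallel::mclapply(files, process_file, mc.cores = cores)
}
```

With the directory from the question, this could be called as process_all(list.files("/home/geoadmin/data/objects", pattern = "\\.txt$", full.names = TRUE)). With doMC registered, the foreach form foreach(f = files) %dopar% process_file(f) should behave similarly. Note that the R process in the top output above already uses about 850 MB, so 8 workers at once may not fit in 6 GB of RAM; a lower cores value may be needed.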

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow