Question

I'm having a problem getting some code to work with the parallel package in R. I'm using R 2.15.

Here's a simplified example... I have a file 'animal.R' which contains the following:

# animal.R
setClass("Animal", representation(species = "character", legs = "numeric"))

##Define some Animal methods
setGeneric("count",function(x) standardGeneric("count"))
setMethod("count", "Animal", function(x) { x@legs})

setGeneric("countAfterChopping",function(x) standardGeneric("countAfterChopping"))
setMethod("countAfterChopping", "Animal", function(x) { x@legs <- x@legs-1; x@legs})

Then, in my R terminal, I run:

library(parallel)
source('animal.R')

Start a local cluster of two nodes:

cl <- makeCluster(rep('localhost', 2))

Tell the cluster nodes about the Animal class:

clusterEvalQ(cl, parse('animal.R'))

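Then create two animals and run some code on the cluster:

daisy <- new("Animal", legs = 2, species = "H.sapiens")
fred <- new("Animal", legs = 4, species = "C.lupus")

# This works
parSapply(cl, list(daisy, fred), count)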

# This doesn't...
parSapply(cl, list(daisy, fred), countAfterChopping)

Stop the cluster:

stopCluster(cl)

The first call to parSapply works as expected, but the second produces this error:

Error in checkForRemoteErrors(val) : 
  2 nodes produced errors; first error: "Animal" is not a defined class

Any ideas what's going on? Why doesn't the second call to parSapply work?

Solution

So here's what's going on:

For S4 objects of class "Animal", the count function simply extracts the legs slot. If this were all that you were doing, you wouldn't need to evaluate or source the file animal.R on your cluster nodes. All necessary information would be passed by parSapply.

However, the countAfterChopping function assigns a new value to the legs slot, and this is where the fun begins. The slot assignment function `@<-` contains a call to `slot<-` with the argument check = TRUE. This triggers an evaluation of the function checkSlotAssignment, which checks "that the value provided is allowed for this slot, by consulting the definition of the class" (from ?checkSlotAssignment).
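To see this validity check in isolation, here is a minimal sketch that runs in a single local session, with no cluster involved. It assumes the same Animal class as in the question; the variable names checked and unchecked are made up for illustration. An ordinary slot assignment with a value of the wrong type is rejected, while `slot<-` with check = FALSE (which ?slot advises against in normal code) skips the check.

```r
library(methods)

setClass("Animal", representation(species = "character", legs = "numeric"))
daisy <- new("Animal", species = "H.sapiens", legs = 2)

# Ordinary slot assignment is validated against the class definition,
# so a character value for the numeric slot 'legs' raises an error.
checked <- tryCatch({ daisy@legs <- "two"; "accepted" },
                    error = function(e) "rejected")

# With check = FALSE the validation step is skipped and the (invalid)
# value is stored anyway. Shown for illustration only.
unchecked <- daisy
slot(unchecked, "legs", check = FALSE) <- "two"
```

Since the check consults the class definition, this is consistent with the error seen on nodes where "Animal" has never been defined.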

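Therefore, the class definition must be known when assigning to a slot in this way, and the S4 class "Animal" is not known on the cluster nodes. This is why evaluating the parsed file animal.R, or sourcing it, on each node works. Note that parse() on its own, as in clusterEvalQ(cl, parse('animal.R')), only returns the unevaluated expressions, so it does not define the class; the result has to be passed to eval(), or the file has to be sourced. In fact, evaluating just the first line of the file, i.e., the one that defines the class "Animal", on each node is enough.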
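The difference between parsing and evaluating matters here, so a small local sketch may help. The string src and the environment env are hypothetical stand-ins for a source file and a worker's workspace; nothing in it is specific to the parallel package.

```r
# A one-line "file", held in a string for the sake of the example
src <- "answer <- 6 * 7"

# parse() turns the text into an expression object; nothing runs yet
exprs <- parse(text = src)
env <- new.env()
defined_after_parse <- exists("answer", envir = env, inherits = FALSE)

# eval() is what actually executes the parsed code
eval(exprs, envir = env)
defined_after_eval <- exists("answer", envir = env, inherits = FALSE)
```

Only after the eval() call does the variable exist in env. In the same way, a class is defined on a node only once the setClass() call has actually been evaluated there.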

Here's a reduced, reproducible example:

animal.R<-"
  setClass('Animal', representation(species = 'character', legs = 'numeric'))

  ##Define some Animal methods
  setGeneric('count',function(x) standardGeneric('count'))
  setMethod('count', signature(x='Animal'), function(x) { x@legs})

  setGeneric('countAfterChopping',function(x) standardGeneric('countAfterChopping'))
  setMethod('countAfterChopping', signature(x='Animal'),
    function(x) { x@legs <- x@legs-1; x@legs})
"
library(parallel)

source(textConnection(animal.R))

cl <- makeCluster(rep('localhost', 2))

daisy<-new("Animal",legs=2,species="H.sapiens")
fred<-new("Animal",legs=4,species="C.lupus")

parSapply(cl, list(daisy, fred), count)
# [1] 2 4

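# Copy the string animal.R to each node, then evaluate only its first
# expression (the setClass() call) there
clusterExport(cl, "animal.R")
clusterEvalQ(cl, eval(parse(textConnection(animal.R), n = 1)))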

parSapply(cl, list(daisy, fred), countAfterChopping)
# [1] 1 3
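
A slightly shorter variant of the same idea, sketched on the assumption that the workers are ordinary local processes with the methods package attached: instead of exporting the source string and parsing it, the setClass() call is written directly inside clusterEvalQ(). The variable name res and the use of makeCluster(2) are choices made for this sketch, not part of the original example.

```r
library(parallel)

setClass("Animal", representation(species = "character", legs = "numeric"))
setGeneric("countAfterChopping",
           function(x) standardGeneric("countAfterChopping"))
setMethod("countAfterChopping", "Animal",
          function(x) { x@legs <- x@legs - 1; x@legs })

daisy <- new("Animal", legs = 2, species = "H.sapiens")
fred  <- new("Animal", legs = 4, species = "C.lupus")

cl <- makeCluster(2)  # two local worker processes

# Define only the class on each worker; return NULL so that nothing
# bulky is sent back to the master
invisible(clusterEvalQ(cl, {
  setClass("Animal", representation(species = "character", legs = "numeric"))
  NULL
}))

res <- parSapply(cl, list(daisy, fred), countAfterChopping)
stopCluster(cl)
```

The generic and its method are still defined only on the master and travel to the workers with the parSapply() call; only the class definition has to exist on each node beforehand.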
License: CC-BY-SA with attribution