Question

I'm trying to run this job on our cluster, and I keep getting the error "object of type 'closure' is not subsettable". The job runs the function "do_1()" on a bunch of nodes. The object I'm subsetting is called "data", so I take the error to mean that the RData files are not being read in on each node (calling every one of these datasets "data" probably isn't best practice, so that's my bad).

I stripped the script down to something as bare-bones as possible; it's shown below, and it still produces the same error when I submit the job. I figure there's something I don't know about reading in separate datasets on each node: some argument I didn't specify in the call to load(), maybe, or the "data" dataset isn't in the right namespace. I'm not sure. Any ideas would be appreciated.

library(parallel)
library(Rmpi)

np <- mpi.universe.size()
cl <- makeCluster(np, type = "MPI")

allFiles <- list.files("/bigtmp/trb5me/rdata_files/")
allFiles <- sapply(allFiles, function(string) paste("/bigtmp/trb5me/rdata_files/", string, sep = ""))

run_one_day <- function(daynum){

  # do we want to subset days to not the first hour?
  train <- data[[daynum]] * 10000
  train
}
clusterExport(cl = cl, "run_one_day")

do_1 <- function(path_to_file){

  if(!require(xts)){
    install.packages("xts")
    library(xts)
  }

  # load data
  load(file=path_to_file)

  # extract the symbol name so we can save the results later
  symbolName <- strsplit(path_to_file, "/")[[1]][5]
  symbolName <- strsplit(symbolName, ".", fixed = TRUE)[[1]][1]

  # get the results
  # there is also a function called data...so in this case its length will be 1
  mySequence <- 1:(length(data)-1)
  myResults <- lapply(mySequence, run_one_day)   #this is where the problem is! 

  # save the results
  path_dest <- paste("/bigtmp/trb5me/mod1_results/", symbolName, ".RData", sep = "")
  save(myResults, file = path_dest)

  # remove everything from memory
  rm(list=ls())

}

parLapply(cl, allFiles, do_1)

# turn off all the cluster stuff
stopCluster(cl)
mpi.exit()

Solution 2

The data variable is only available on the master, not on the slaves. Since there also happens to be a function called data, that is what they try to use, and subsetting it with [[ gives the error message you get.

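Try exporting the data variable to the other nodes before the computation. Note that clusterExport looks the name up in the master's global environment by default, so this only works if data has already been loaded there.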

clusterExport(cl, "data")

OTHER TIPS

It's a scoping problem: "data" is loaded into the local environment of "do_1", which isn't in the scope of the "run_one_day" function. R uses lexical scoping, so what matters is where "run_one_day" is defined, not where it is called.
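
The lookup rule can be reproduced without a cluster. The sketch below is only an illustration: "get_item" and "caller" are hypothetical stand-ins for "run_one_day" and "do_1", and it assumes there is no object called "data" in the global environment, so the name falls through to the function utils::data.

```r
# Hypothetical stand-ins for run_one_day() and do_1(); not the original script.
get_item <- function(i) {
  # 'data' is looked up from where get_item() was defined (top level),
  # not from the function that calls it. With no 'data' object there,
  # R ends up at the function utils::data, which is a closure.
  data[[i]]
}

caller <- function() {
  data <- list(10, 20, 30)  # local to caller(), invisible to get_item()
  get_item(1)
}

# Capture the error message instead of stopping the script.
msg <- tryCatch(caller(), error = function(e) conditionMessage(e))
print(msg)
```

If "get_item" could see the caller's variables, the call would return 10; instead it fails while trying to subset a function.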

One solution is to use the "envir" argument of load() to load "data" into the global environment:

load(file=path_to_file, envir=.GlobalEnv)
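
A small, cluster-free comparison of the two load() calls might look like the following. The temporary file and the names "first_item", "load_local" and "load_global" are assumptions made up for this sketch; only load() and its "envir" argument come from the answer above.

```r
# Hypothetical file and object names, used only for illustration.
tmp <- tempfile(fileext = ".RData")
data <- list(1, 2, 3)
save(data, file = tmp)
rm(data)

# Defined at top level, like run_one_day() in the question.
first_item <- function() data[[1]]

load_local <- function(path) {
  load(file = path)  # 'data' lands in this function's frame only
  tryCatch(first_item(), error = function(e) "error")
}

load_global <- function(path) {
  load(file = path, envir = .GlobalEnv)  # 'data' is now visible to first_item()
  first_item()
}

res_local <- load_local(tmp)
res_global <- load_global(tmp)

# Clean up the global object that load() created.
rm(data, envir = .GlobalEnv)
```

One thing to keep in mind with this approach: an object loaded into a worker's global environment stays there after the task finishes, so each new file overwrites the previous "data" on that worker.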

Another solution is to define "run_one_day" inside the "do_1" function.
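
As a rough sketch of the nested-definition approach, with the file loading and saving left out: "do_1_nested" and the toy input are hypothetical, and "data" is passed in as an argument here only to keep the example self-contained (in the original script it would come from load() inside the function).

```r
# A sketch only; the load()/save() steps of the original do_1() are omitted.
do_1_nested <- function(data) {
  # Defined inside do_1_nested(), so its enclosing environment is this
  # call's frame and 'data' resolves to the local object.
  run_one_day <- function(daynum) {
    data[[daynum]] * 10000
  }
  mySequence <- 1:(length(data) - 1)
  lapply(mySequence, run_one_day)
}

# Toy input standing in for one loaded dataset.
toy <- list(c(1, 2), c(3, 4), c(5, 6))
res <- do_1_nested(toy)
str(res)
```

With this layout the inner function travels with the outer one when it is sent to the workers, so a separate clusterExport call for "run_one_day" should no longer be needed.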

Possibly wrong, but it looks like the error is actually in train <- data[[daynum]]. That line tries to subset the function (or "closure") data, which fails. Try naming your dataset something other than 'data' and see what happens.

Licensed under: CC-BY-SA with attribution