Question

I'm calling a Windows executable from multiple parallel R processes (via a system call within parSapply). This .exe (let's call it my.exe) is passed a filename as an argument and processes that file (the details are probably irrelevant). Unfortunately, my.exe creates a log file (in the same directory as my.exe) that it writes to while it runs. Since the log file's name is fixed, any further R process that calls my.exe while another call is still running causes my.exe to throw the error:

Cannot create result file "log.res". 
Do you have write access in the current directory?

I've managed to work around this by creating multiple copies of my.exe (as many as the number of cores in my cluster, i.e. 7). I can then ensure that each copy is in use by only a single R process at any one time, by passing to the cores a vector of 7 paths to .bat files, each of which repeatedly calls a given copy of my.exe.

Is there a more elegant way to deal with this issue, perhaps by having the processes create virtual instances of my.exe automagically? I don't require the log files.

Since this is an error thrown by the program and not by R, I suspect there might be no way to permit concurrent write access to the log file from the R side of things.

Ideally, I want to be doing something like this:

ff <- c('a', 'long', 'vector', 'of', 'file', 'paths') # abbreviated
parSapply(cl, ff, function(f) system(sprintf("my.exe %s", f)))

but instead I've resorted to doing (more or less) this (after copying my.exe to c:/1/, c:/2/, c:/3/, through c:/7/):

cat(paste('CALL C:/1/my.exe',  ff[1:10], '/RUN=YES'), file='run1.bat', sep='\n')
cat(paste('CALL C:/2/my.exe', ff[11:20], '/RUN=YES'), file='run2.bat', sep='\n')
cat(paste('CALL C:/3/my.exe', ff[21:30], '/RUN=YES'), file='run3.bat', sep='\n')
cat(paste('CALL C:/4/my.exe', ff[31:40], '/RUN=YES'), file='run4.bat', sep='\n')
cat(paste('CALL C:/5/my.exe', ff[41:50], '/RUN=YES'), file='run5.bat', sep='\n')
cat(paste('CALL C:/6/my.exe', ff[51:60], '/RUN=YES'), file='run6.bat', sep='\n')
cat(paste('CALL C:/7/my.exe', ff[61:70], '/RUN=YES'), file='run7.bat', sep='\n')
parSapply(cl, c('run1.bat', 'run2.bat', 'run3.bat', 'run4.bat',
                'run5.bat', 'run6.bat', 'run7.bat'), system)

(Above, instead of letting parSapply assign the 70 elements of ff to the various processes, I manually split them when creating the batch files, and then run the batch files in parallel.)

Solution

It sounds like your basic strategy is the only known solution to the problem, but I think it can be done more elegantly. For instance, you could avoid creating .BAT files by having each worker execute a different command line based on a worker ID. The worker ID could be assigned using:

# Assign a worker ID to each of the cluster workers
setid <- function(id) assign(".Worker.id", id, pos=globalenv())
clusterApply(cl, seq_along(cl), setid)
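
To confirm that each worker really holds its own ID, you can ask the workers to report it back. This is only a minimal sketch: the two-worker local cluster created with makeCluster(2) is an assumption for illustration, and in practice you would use your existing cl.

```r
library(parallel)

# Hypothetical two-worker local cluster, for illustration only
cl <- makeCluster(2)

# Same helper as above: store the ID in each worker's global environment
setid <- function(id) assign(".Worker.id", id, pos = globalenv())
invisible(clusterApply(cl, seq_along(cl), setid))

# Ask every worker which ID it holds; one value per worker comes back
ids <- unlist(clusterEvalQ(cl, .Worker.id))
print(ids)

stopCluster(cl)
```

Because clusterApply hands the i-th element to the i-th worker, passing seq_along(cl) should give every worker a distinct ID.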

Also, you may want to automate the creation of the directories that contain "my.exe". I also prefer to use a symlink rather than a copy of the executable:

# Create directories containing a symlink to the real executable
exepath <- "C:/bin/my.exe"  # Path to the real executable
pdir <- getwd()  # Parent of the new executable directories
myexe <- file.path(pdir, sprintf("work_%d", seq_along(cl)), "my.exe")
for (x in myexe) {
  dir.create(dirname(x), showWarnings=FALSE)
  if (file.exists(x)) 
    unlink(x)
  file.symlink(exepath, x)
}

If symlinks don't fool "my.exe" into creating the log file in the desired directory, you could try using "file.copy" instead of "file.symlink".
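
One way to fold that fallback into the setup is a small helper that tries the symlink first and copies the file if linking fails. This is a sketch, not a tested recipe: link_or_copy is a hypothetical name, and the temporary file below merely stands in for the real executable. It only covers the case where creating the link itself fails (on Windows this can happen without sufficient privileges); whether "my.exe" then writes its log next to the link is something you would still have to check by running it.

```r
# Hypothetical helper: try a symlink first, fall back to a plain copy
link_or_copy <- function(from, to) {
  if (file.exists(to)) unlink(to)
  # file.symlink() returns FALSE (with a warning) when it cannot create the link
  ok <- suppressWarnings(file.symlink(from, to))
  if (!isTRUE(ok)) ok <- file.copy(from, to, overwrite = TRUE)
  isTRUE(ok)
}

# Illustration with a throwaway file standing in for the real executable
src <- tempfile(fileext = ".exe")
writeLines("placeholder", src)
dst <- file.path(tempdir(), "work_demo", "my.exe")
dir.create(dirname(dst), showWarnings = FALSE)
link_or_copy(src, dst)
```

In the loop above, link_or_copy(exepath, x) could then replace the unlink and file.symlink calls.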

Now you can run your parallel job using:

# Each worker executes a different symlink to the real executable
worker.fun <- function(f, myexe) {
  system(sprintf("%s %s /RUN=YES", myexe[.Worker.id], f))
}
ff <- c('a', 'long', 'vector', 'of', 'file', 'paths')
parSapply(cl, ff, worker.fun, myexe)
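
Note that sprintf pastes the paths into the command line verbatim, so a space in either path (for example, if getwd() returns a directory with a space in its name) would split it into several arguments. A hedged sketch of a quoting helper follows; build_cmd is a hypothetical name, the example paths are made up, and it assumes "my.exe" accepts arguments wrapped in double quotes.

```r
# Hypothetical helper: build the command line with both paths double-quoted
build_cmd <- function(exe, f) {
  sprintf("%s %s /RUN=YES",
          shQuote(exe, type = "cmd"),
          shQuote(f, type = "cmd"))
}

# Made-up paths containing spaces, for illustration only
cmd <- build_cmd("C:/my work/work_1/my.exe", "C:/data/input file.dat")
print(cmd)
```

Inside worker.fun, the call would then become system(build_cmd(myexe[.Worker.id], f)), with build_cmd defined inside worker.fun or exported to the workers first.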

You could also delete the directories that were created, but they don't use much space since symlinks are used, so it might be better to keep them, especially during debugging/testing.
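
If you do decide to clean up, a recursive unlink on the per-worker directories is enough. The sketch below rebuilds a stand-in myexe vector under a temporary directory so that it can be run safely on its own; the paths and the empty placeholder files are assumptions for illustration.

```r
# Stand-in for the 'myexe' vector built earlier (hypothetical paths)
pdir <- tempfile("demo_")
dir.create(pdir)
myexe <- file.path(pdir, sprintf("work_%d", 1:3), "my.exe")
for (x in myexe) {
  dir.create(dirname(x), showWarnings = FALSE)
  file.create(x)  # empty placeholder instead of a real link or copy
}

# Delete each per-worker directory together with its contents
unlink(dirname(myexe), recursive = TRUE)
```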

Licensed under: CC-BY-SA with attribution