Question

I use parSapply() from the parallel package in R to perform calculations on a huge amount of data. Even in parallel it takes hours to execute, so I decided to regularly write results to a file from the worker processes using write.table(), because the process crashes from time to time when it runs out of memory or for some other random reason, and I want to continue the calculations from the place where it stopped. I noticed that some lines of the CSV files I get are cut off in the middle, probably because several processes write to the file at the same time. Is there a way to place a lock on the file while write.table() executes, so that other workers can't access it, or is the only way out to write to a separate file from each worker and then merge the results?

No correct solution

OTHER TIPS

It is now possible to create file locks using the filelock package (GitHub).

To use this with parSapply(), you would need to edit the function each worker runs so that, if the file is locked, the process does not simply quit but instead tries again, for example after a short Sys.sleep(). However, I am not certain how this will affect your performance.
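A minimal sketch of the locking approach, assuming the filelock package is installed. As far as I know, lock() takes a timeout in milliseconds and, with timeout = Inf, waits until the lock becomes available, which plays the role of the retry loop. The helper name append_locked() and the ".lock" file suffix are hypothetical, chosen only for illustration.

```r
library(filelock)

# Hypothetical helper: append a data frame to a shared CSV file while holding
# an exclusive lock on a separate lock file next to it.
append_locked <- function(df, path, lock_path = paste0(path, ".lock")) {
  lck <- lock(lock_path, exclusive = TRUE, timeout = Inf)  # wait for the lock
  on.exit(unlock(lck), add = TRUE)                         # release even on error

  # Decide about the header before the file is opened for writing.
  is_new <- !file.exists(path)
  write.table(df, file = path, append = !is_new, sep = ",",
              row.names = FALSE, col.names = is_new)
  invisible(path)
}
```

Each worker would then call something like append_locked(result, "results.csv") instead of calling write.table() directly. Workers queue up behind the lock, so frequent small writes may cost some throughput.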

Instead, I recommend creating a separate file for each worker to hold its results, which eliminates the need for a lock and does not reduce performance. Afterwards you should be able to merge these files into your final results file. If size is an issue, you can use disk.frame to work with files that are larger than your system RAM.
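The per-worker approach could look roughly like the sketch below. It assumes each worker can be identified by its process ID via Sys.getpid(). The names process_one(), merge_parts(), the part_*.csv file pattern, and the toy calculation are all hypothetical stand-ins for the real task.

```r
library(parallel)

# Hypothetical task: each call appends its result to a file named after the
# worker's process ID, so no two workers ever write to the same file.
process_one <- function(i, out_dir) {
  res <- data.frame(id = i, value = i^2)  # stand-in for the real calculation
  path <- file.path(out_dir, sprintf("part_%d.csv", Sys.getpid()))
  is_new <- !file.exists(path)            # write the header only once per file
  write.table(res, file = path, append = !is_new, sep = ",",
              row.names = FALSE, col.names = is_new)
  i
}

# Merge all per-worker files into a single data frame.
merge_parts <- function(out_dir) {
  files <- list.files(out_dir, pattern = "^part_.*\\.csv$", full.names = TRUE)
  do.call(rbind, lapply(files, read.csv))
}

out_dir <- file.path(tempdir(), "parts")
dir.create(out_dir, showWarnings = FALSE)

cl <- makeCluster(2)
invisible(parSapply(cl, 1:10, process_one, out_dir = out_dir))
stopCluster(cl)

results <- merge_parts(out_dir)
```

After a crash, the ids already present in merge_parts(out_dir) could be removed from the input before restarting, so the calculation continues from where it stopped.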

The old Unix technique looks like this:

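```
# Make sure no other process is writing to the file by trying to create a
# lock directory. mkdir fails if the directory already exists, so keep
# retrying until this process creates it successfully.
repeat {
  if (system2(command = "mkdir", args = "lockdir", stderr = NULL) == 0) break
  Sys.sleep(0.1)  # brief pause between attempts instead of spinning
}

# Write while holding the lock. The header is written only when the file is new.
is_new <- !file.exists(filename)
write.table(MyTable, file = filename, append = TRUE, col.names = is_new)

# Get rid of the lock directory so other processes can proceed.
system2(command = "rmdir", args = "lockdir")
```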

Licensed under: CC-BY-SA with attribution