Question

My Problem:

I have an R script, myscript.R, that reads a configuration file, e.g. config.xml. What is the best way to submit such a script to a job scheduler (e.g., using qsub)?

I would like to be able to use the script and file in the same way that I would use, e.g., a C or Fortran executable, which is embedded in a bash script.

How I currently use FORTRAN:

Here is an example of the approach that I use with a compiled Fortran executable, fex. I wrap it in a bash script that I will call fscript.sh:

#!/bin/bash
mpirun [arguments] "fex" -f "$1"

The above fscript.sh can be sent to a cluster with instructions to read the config file like this:

qsub [arguments] fscript.sh 1 config.xml

How I currently use R in an analogous way:

To run R in an analogous way, I am using a bash script, rscript.sh:

#!/bin/bash
# Export the config file name so the R session can read it with Sys.getenv()
export CONFIG="$1"
R --vanilla < myscript.R

This can be submitted from the command line, e.g.:

qsub [arguments] rscript.sh config.xml

Where myscript.R contains something like:

library(XML)
# Name of the config file, as exported by rscript.sh
config.file <- Sys.getenv("CONFIG")
config <- xmlToList(xmlParse(config.file))
myfunction(config)

My Questions

  1. Would Rscript or compiler provide a more robust approach than my current use of bash?
  2. Under which conditions would one be more appropriate than the other (What are the pros and cons)?
  3. How would I pass a configuration file in either case?

What I have done so far

In addition to coming up with the bash script rscript.sh described above, I have read through tutorials and some documentation for Rscript and compiler, but it is not clear to me in which contexts one would be preferred over the other. It is also not clear to me what the best way is to pass a configuration file in either context.

This question is related to others, e.g., What are the ways to create an executable from R program, Does an R compiler exist?. However, I do not think that it is essential to use compiled code.

Solution

What does compiler have to do with anything? It compiles R code into byte-code for the R interpreter so it may not do what you suspect.
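
To make that concrete, here is a minimal sketch of what the compiler package does. The function sum_of_squares is a made-up example, not something from the question. The point is only that cmpfun() hands back another R function, so the result still needs the R interpreter and is not a standalone executable like fex.

```r
library(compiler)  # ships with base R

# Hypothetical example function (not from the question).
sum_of_squares <- function(n) {
  total <- 0
  for (i in seq_len(n)) total <- total + i^2
  total
}

# cmpfun() returns a byte-compiled version of the same function.
sum_of_squares_bc <- cmpfun(sum_of_squares)

# Both versions compute the same value; the compiled one is still an
# ordinary R function that has to be called from within R.
same_result <- identical(sum_of_squares(100), sum_of_squares_bc(100))
still_r_function <- is.function(sum_of_squares_bc)
```

Any speed-up depends on the code in question, so this is not a substitute for choosing how to launch the script.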

For scripting, use Rscript (available everywhere), or littler (which predates Rscript).

We actually wrote littler explicitly for this scripting purpose, and my "Intro to HPC with R" talks (see the presentations page) have examples of submitting such scripts to the slurm scheduler / resource manager (as I never had access to qsub).

There are many other questions here relating to Rscript and command-line parsing. That should get you started.

OTHER TIPS

Following from Dirk's answer and another question, Parsing command line arguments in R scripts, I have come up with the following solution, which lets me create an executable R script that accepts the name of a configuration file.

The rscript.sh and myscript.R from the question can be merged into the following newrscript.R:

#!/usr/bin/Rscript
library(XML)
# The first trailing argument is the name of the config file
config.file <- commandArgs(trailingOnly = TRUE)[1]
config <- xmlParse(config.file)
myfunction(config)

Once the script has been made executable (chmod +x newrscript.R), it can be called from the command line, passing the name of the config file in a way that is very similar to the original use of rscript.sh:

./newrscript.R config.xml
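
On a cluster it can help to fail early with a clear message when the argument is missing or the file is not there. Below is a minimal sketch of such a check. The helper name get_config_path and the usage message are my own assumptions, not part of Rscript or the XML package; only commandArgs(), file.exists() and stop() are base R.

```r
# Hypothetical helper: returns the config file path, or stops with a
# readable error if the argument is missing or the file does not exist.
get_config_path <- function(args = commandArgs(trailingOnly = TRUE)) {
  if (length(args) != 1) {
    stop("usage: newrscript.R <config.xml>", call. = FALSE)
  }
  if (!file.exists(args[1])) {
    stop("config file not found: ", args[1], call. = FALSE)
  }
  args[1]
}

# Demonstration with a temporary file standing in for config.xml.
tmp <- tempfile(fileext = ".xml")
writeLines("<config><n>10</n></config>", tmp)
checked_path <- get_config_path(tmp)  # returns the path unchanged
```

In the script itself, the idea would be to call get_config_path() with no arguments and pass the result to xmlParse().
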
Licensed under: CC-BY-SA with attribution