Question

Whenever I run large scale monte carlo simulations in S-Plus, I always end up growing a beard while I wait for it to complete.

What are the best tricks for running monte carlo simulations in R? Any good examples of running processes in a distributed fashion?

Was it helpful?

Solution

  • Using multiple cores/machines should be simple if you're just using parallel independent replications, but be aware of common deficiencies of random number generators (e.g. if using the current time as seed, spawning many processes with one RNG for each might produce correlated random numbers, which leads to invalid results - see e.g. this paper)

  • You might want to use variance reduction to reduce the number of required replications, i.e. to shrink the size of the required sample. More advanced variance reduction techniques can be found in many textbooks, e.g. in this one.

OTHER TIPS

Preallocate your vectors!

> nsims <- 10000
> n <- 100
> 
> system.time({
     res <- NULL
     for (i in 1:nsims) {
         res <- c(res,mean(rnorm(n)))
     }
 })
   user  system elapsed 
  0.761   0.015   0.783 
> 
> system.time({
     res <- rep(NA, nsims)
     for (i in 1:nsims) {
         res[i] <- mean(rnorm(n))
     }
 })
   user  system elapsed 
  0.485   0.001   0.488 
> 

Latin Hypercube Sampling is easily applied and has a major influence on the results. Basically you take a latin hypercube sample from a uniform distribution (e.g., using randomLHS() in the package lhs) and transform this to your desired distribution using e.g., qnorm(uniformsample).

I know this thread is really old, but if anyone stumbles upon it and is looking for an even faster method, I think the following works:

library(data.table)
library(microbenchmark)

nsims <- 10000
n <- 100

# Answer from @Eduardo_Leoni:
preallocate<-function(nsims, n) {
  res <- rep(NA, nsims)
  for (i in 1:nsims) {
    res[i] <- mean(rnorm(n))
  }
  return(res)
}

# Answer using data.table:
datatable<-function(nsims,n) {
  dt <- data.table(i=1:nsims)[,list(res=mean(rnorm(1:n))),by=i]
  return(dt)
}

# Timing benchmark:
microbenchmark(preallocate(nsims,n), datatable(nsims,n), times=100)
#Unit: milliseconds
#                  expr      min       lq   median       uq      max neval
# preallocate(nsims, n) 428.4022 432.3249 434.2910 436.4806 489.2061   100
#   datatable(nsims, n) 238.9006 242.3517 244.1229 246.5998 303.6133   100
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top