What's the best trick to speed up a monte carlo simulation? [closed]
-
05-07-2019 - |
Question
Whenever I run large scale monte carlo simulations in S-Plus, I always end up growing a beard while I wait for it to complete.
What are the best tricks for running monte carlo simulations in R? Any good examples of running processes in a distributed fashion?
Solution
Using multiple cores/machines should be simple if you're just using parallel independent replications, but be aware of common deficiencies of random number generators (e.g. if using the current time as seed, spawning many processes with one RNG for each might produce correlated random numbers, which leads to invalid results - see e.g. this paper)
You might want to use variance reduction to reduce the number of required replications, i.e. to shrink the size of the required sample. More advanced variance reduction techniques can be found in many textbooks, e.g. in this one.
OTHER TIPS
Preallocate your vectors!
> nsims <- 10000
> n <- 100
>
> system.time({
res <- NULL
for (i in 1:nsims) {
res <- c(res,mean(rnorm(n)))
}
})
user system elapsed
0.761 0.015 0.783
>
> system.time({
res <- rep(NA, nsims)
for (i in 1:nsims) {
res[i] <- mean(rnorm(n))
}
})
user system elapsed
0.485 0.001 0.488
>
Latin Hypercube Sampling is easily applied and has a major influence on the results. Basically you take a latin hypercube sample from a uniform distribution (e.g., using randomLHS() in the package lhs) and transform this to your desired distribution using e.g., qnorm(uniformsample).
I know this thread is really old, but if anyone stumbles upon it and is looking for an even faster method, I think the following works:
library(data.table)
library(microbenchmark)
nsims <- 10000
n <- 100
# Answer from @Eduardo_Leoni:
preallocate<-function(nsims, n) {
res <- rep(NA, nsims)
for (i in 1:nsims) {
res[i] <- mean(rnorm(n))
}
return(res)
}
# Answer using data.table:
datatable<-function(nsims,n) {
dt <- data.table(i=1:nsims)[,list(res=mean(rnorm(1:n))),by=i]
return(dt)
}
# Timing benchmark:
microbenchmark(preallocate(nsims,n), datatable(nsims,n), times=100)
#Unit: milliseconds
# expr min lq median uq max neval
# preallocate(nsims, n) 428.4022 432.3249 434.2910 436.4806 489.2061 100
# datatable(nsims, n) 238.9006 242.3517 244.1229 246.5998 303.6133 100