Here is a general solution to my problem. I started with roland's approach, and updated it so that the distribution, parameters, and dimensions can all be changed easily.
distSim <- function(nSims, simName, distFn, dimList, paramList, constList) {
#
# Simulate from a distribution across all the dimensions.
#
# Args:
# nSims: integer, e.g. 10000
# simName: name of the output column, e.g. 'harvestPerTree'
# distFn: distribution function, e.g. rnorm
# dimList: list of dimensions,
# e.g. list(farms=c('farm A', 'farm B'), crops=c('apple', 'pear', 'durian'))
# paramList: list of parameters, each of length = product(length(d) for d in dimList),
# to be passed to the distribution function,
# e.g. list(mean=c(10,20,30,5,10,15), sd=c(2,4,6,1,2,3))
# constList: optional vector of length = product(length(d) for d in dimList)
# these are included in the output dataframe
# e.g. list(nTrees=c(10,20,30,1,2,3))
#
# Returns:
# a dataframe with one row per iteration x product(d in dimList)
#
# expand out the dimensions into a dataframe grid - one row per combination
df <- do.call(expand.grid, dimList);
nRows <- nrow(df);
# add the parameters, and constants, if present
df <- cbind(df, paramList);
if (!missing(constList)) {
df <- cbind(df, constList);
}
# repeat this dataframe for each iteration of the simulation
df <- do.call("rbind",replicate(nSims, df, simplify=FALSE));
# add a new column giving the iteration number ('simId')
simId <- sort(rep(seq(1:nSims),nRows));
df <- cbind(simId, df);
# simulate from the distribution
df[simName] <- do.call(distFn, c(list(n=nrow(df)), df[names(paramList)]))
return(df);
}
Sample usage (using a normal distribution for simplicity only):
dimList <- list(farms=c('farm A', 'farm B'), crops=c('apple', 'pear', 'durian'));
constList <- list(numTrees=c(10,20,30,1,2,3));
paramList <- list(mean=c(10,20,30,5,10,15), sd=c(2,4,6,1,2,3));
distSim(nSims=3, simName='harvestPerTree', distFn=rnorm, dimList=dimList,
paramList=paramList, constList=constList);
Sample output:
simId farms crops mean sd numTrees harvestPerTree
1 1 farm A apple 10 2 10 9.602849
2 1 farm B apple 20 4 20 20.153225
3 1 farm A pear 30 6 30 25.839825
4 1 farm B pear 5 1 1 1.733120
5 1 farm A durian 10 2 2 13.506135
6 1 farm B durian 15 3 3 11.690910
7 2 farm A apple 10 2 10 7.678611
8 2 farm B apple 20 4 20 22.119560
9 2 farm A pear 30 6 30 31.488002
10 2 farm B pear 5 1 1 5.366725
11 2 farm A durian 10 2 2 6.333747
12 2 farm B durian 15 3 3 13.918085
13 3 farm A apple 10 2 10 10.387194
14 3 farm B apple 20 4 20 21.086388
15 3 farm A pear 30 6 30 34.076926
16 3 farm B pear 5 1 1 6.159415
17 3 farm A durian 10 2 2 8.322902
18 3 farm B durian 15 3 3 9.458085
Note also that you can also define the input values in a nicely indexed way; e.g. if you define
numTrees2 <- array(0, dim=c(length(farms), length(crops)), dimnames=tree_dimList);
numTrees2['Farm A','apple'] = 200;
# etc
Then the way that c(numTrees)
orders its outputs matches expand.grid
, so you can pass constList = list(numTrees=c(numTrees2)
to distSim
.