Question

So what I have is data of cod weights at different ages. This data is taken at several locations over time.

What I would like to create is "weight at age", basically the mean weight at a given age, for each location in each year. However, the ages are not sampled the same way (all old fish caught are measured, while younger fish are subsampled), so I can't just take a plain average; I would like to bootstrap the samples instead.

The bootstrap should draw 5 random weights at a given age, with replacement (so the same value can be drawn again), take their mean, repeat this 1000 times, and then average those means. This should be done for each age in every AreaCode for every year. Grouping factors: Year, location (AreaCode), Age.

So here's an example of what my data could look like.

df <- data.frame( Year= rep(c(2000:2008),2), AreaCode = c("39G4", "38G5","40G5"), Age = c(0:8), IndWgt = c(rnorm(18, mean=5, sd=3)))
> df
   Year AreaCode Age       IndWgt
1  2000     39G4   0  7.317489899
2  2001     38G5   1  7.846606144
3  2002     40G5   2  0.009212455
4  2003     39G4   3  6.498688035
5  2004     38G5   4  3.121134937
6  2005     40G5   5 11.283096043
7  2006     39G4   6  0.258404136
8  2007     38G5   7  6.689780137
9  2008     40G5   8 10.180511929
10 2000     39G4   0  5.972879108
11 2001     38G5   1  1.872273650
12 2002     40G5   2  5.552962065
13 2003     39G4   3  4.897882549
14 2004     38G5   4  5.649438631
15 2005     40G5   5  4.525012587
16 2006     39G4   6  2.985615831
17 2007     38G5   7  8.042884181
18 2008     40G5   8  5.847629941

AreaCode contains the different locations; in reality I have 85 different levels. The time series stretches over 1991-2013 and the ages over 0-15. IndWgt contains the weight. My whole data frame has 185726 rows.

Also, not every age exists for every location and every year. I don't know if this would be a problem; I mention it so that the script doesn't rely on references to specific row numbers. There are some NA values in the weight column, but I could just remove them beforehand.

I was thinking that maybe I should use replicate, together with apply or a plyr function. I've tried to understand the boot function, but I don't really know whether I should write my calculation under its statistic argument, and if so, how. So yeah, basically I have no idea.

I would be thankful for any help I can get!

Solution

How about this with plyr? I think from the question you wanted to bootstrap only the "young" fish weights and use actual means for the older ones. If not, just use the bootstrap expression from the else branch for every age.

library(plyr)
# cod <- read.csv("cod.csv", header = TRUE)  # I loaded your data from a CSV file

bootstrap <- function(Age, IndWgt) {
  # Age is constant within a Year/AreaCode/Age group, so test its first value
  if (Age[1] > 2) {
    mean(IndWgt)  # old fish: plain mean
  } else {
    # young fish: mean of 1000 bootstrap means, each from 5 draws with replacement
    mean(replicate(1000, mean(sample(IndWgt, 5, replace = TRUE))))
  }
}

ddply(cod,.(Year,AreaCode,Age),summarize,boot_mean=bootstrap(Age,IndWgt))

  Year AreaCode Age boot_mean
1 2000     39G4   0  6.650294
2 2001     38G5   1  4.863024
3 2002     40G5   2  2.724541
4 2003     39G4   3  5.698285
5 2004     38G5   4  4.385287
6 2005     40G5   5  7.904054
7 2006     39G4   6  1.622010
8 2007     38G5   7  7.366332
9 2008     40G5   8  8.014071

PS: If you want to sample all ages in the same way, no need for the function, just:

ddply(cod,.(Year,AreaCode,Age),
      summarize,
      boot_mean=mean(replicate(1000,mean(sample(IndWgt,5,replace = TRUE)))))
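
One caveat worth checking on the real data, where some Year/AreaCode/Age groups may contain a single fish: when sample() is given one number x >= 1, it draws from 1:x rather than from the value itself. Below is a minimal sketch of a guard. The names resample and safe_boot_mean are hypothetical helpers, not part of plyr, and dropping NA weights inside the helper is an assumption about how missing values should be handled.

```r
# Hypothetical helper: draw from the elements of x by index, so a
# length-one x is resampled as itself instead of as 1:x
resample <- function(x, size) {
  x[sample.int(length(x), size, replace = TRUE)]
}

# Hypothetical helper: mean of n_rep bootstrap means, each based on
# n_draw weights drawn with replacement; NA weights are dropped first
safe_boot_mean <- function(w, n_draw = 5, n_rep = 1000) {
  w <- w[!is.na(w)]
  if (length(w) == 0) return(NA_real_)  # nothing left to resample
  mean(replicate(n_rep, mean(resample(w, n_draw))))
}

# A group with a single fish: every draw is that fish's weight
safe_boot_mean(7.3)
```

In the ddply() call above, this would replace the inline expression, i.e. boot_mean = safe_boot_mean(IndWgt).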

OTHER TIPS

Since you don't provide much code, I haven't (out of laziness) tested this properly. The following should get you the first step: one mean of 5 resampled weights per Year/AreaCode/Age group. If you wrap this in replicate, you get the set of means that you can then average.

# df is the data frame from the question
part.result <- aggregate(IndWgt ~ Year + AreaCode + Age, data = df, FUN = function(x) {
  get.em <- sample(x, size = 5, replace = TRUE)  # 5 weights, drawn with replacement
  mean(get.em)                                   # one bootstrap mean for this group
})

To handle any missing combination of year/age/location, you could probably add an if statement checking for NULL/NA and producing a warning and/or skipping the iteration.
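
The replicate step suggested above could look roughly like the sketch below. It is untested against the real data and rests on a few assumptions: one_pass is a hypothetical helper name, the data frame is the toy example from the question, and only 100 passes are run to keep it quick (the question asks for 1000). The formula interface of aggregate() drops rows with NA by default and only returns the Year/AreaCode/Age combinations that occur in the data, so missing combinations need no special handling here.

```r
set.seed(42)  # arbitrary seed, only so the sketch is reproducible

# Example data in the shape used in the question
df <- data.frame(Year = rep(2000:2008, 2),
                 AreaCode = c("39G4", "38G5", "40G5"),
                 Age = 0:8,
                 IndWgt = rnorm(18, mean = 5, sd = 3))

# One bootstrap pass: the mean of 5 weights drawn with replacement per group.
# Indexing via sample.int() keeps a single-fish group from being read as 1:x.
one_pass <- function(d) {
  aggregate(IndWgt ~ Year + AreaCode + Age, data = d, FUN = function(x) {
    mean(x[sample.int(length(x), 5, replace = TRUE)])
  })
}

# Repeat the pass; the group order is the same in every pass, so each
# row of the resulting matrix holds the bootstrap means of one group
passes <- replicate(100, one_pass(df)$IndWgt)

# Attach the average of the bootstrap means to the group keys
result <- one_pass(df)[, c("Year", "AreaCode", "Age")]
result$boot_mean <- rowMeans(passes)
result
```

With more than one group, replicate() returns a matrix with one row per group, which is what rowMeans() expects. Running aggregate() 1000 times over the full data set may be slow, so the plyr approach in the accepted answer may be the more practical route.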

Licensed under: CC-BY-SA with attribution