I have a dataset which contains data on the abundance of an organism and the sediment mud content % in which it was found.

I have subsequently partitioned the mud content data into 10 bins (i.e. 0 - 10%, 10.1 - 20%, etc.) and placed the abundance data into each bin accordingly.

The primary aim is to plot the maximum abundance in each mud bin over the mud gradient (i.e. 0 - 100 %) but for these maximums to be weighted by the number of samples in each bin.

So, my question is how to weight the maximum abundance in a given mud bin by the number of samples in each bin?

Here is a simple subset of my data:

Mud % bins: |     0 - 9      |     9.1 - 18      |     18.1 - 27    |
Abundance:   10,10,2,2,2,1,1      15,15,15,2      20,20,20,1,1,1,1,1

Solution

You can use ddply from the plyr package for that. In the following code, mwtdabundance is your weighted abundance = (max of a bin * number of observations in that bin) / total number of observations. For your sample data:

mydata<-structure(list(id = 1:19, bin = structure(c(1L, 1L, 1L, 1L, 1L, 
1L, 1L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("0-9", 
"18.1-27", "9.1-18"), class = "factor"), abundance = c(10L, 10L, 
2L, 2L, 2L, 1L, 1L, 15L, 15L, 15L, 2L, 20L, 20L, 20L, 1L, 1L, 
1L, 1L, 1L)), .Names = c("id", "bin", "abundance"), class = "data.frame", row.names = c(NA, 
-19L))
> mydata
   id     bin abundance
1   1     0-9        10
2   2     0-9        10
3   3     0-9         2
4   4     0-9         2
5   5     0-9         2
6   6     0-9         1
7   7     0-9         1
8   8  9.1-18        15
9   9  9.1-18        15
10 10  9.1-18        15
11 11  9.1-18         2
12 12 18.1-27        20
13 13 18.1-27        20
14 14 18.1-27        20
15 15 18.1-27         1
16 16 18.1-27         1
17 17 18.1-27         1
18 18 18.1-27         1
19 19 18.1-27         1


> library(plyr)
> ddply(mydata, .(bin), summarize, max.abundance = max(abundance), freq = length(bin), mwtdabundance = max.abundance * freq / nrow(mydata))
      bin max.abundance freq mwtdabundance
1     0-9            10    7      3.684211
2 18.1-27            20    8      8.421053
3  9.1-18            15    4      3.157895
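
To see where the numbers in that output come from, the same arithmetic can be sketched in base R without plyr. This is only an illustration on the sample data: the names bin.max, bin.n and weighted are made up here, and the weighting is the one defined above (bin maximum times bin sample count, divided by the total sample count).

```r
# Sample data from the question, as two plain vectors
abundance <- c(10, 10, 2, 2, 2, 1, 1, 15, 15, 15, 2, 20, 20, 20, 1, 1, 1, 1, 1)
bin <- rep(c("0-9", "9.1-18", "18.1-27"), c(7, 4, 8))

bin.max <- tapply(abundance, bin, max)     # maximum abundance per bin
bin.n   <- tapply(abundance, bin, length)  # number of samples per bin

# Weighted maximum: e.g. for bin "0-9" this is 10 * 7 / 19
weighted <- bin.max * bin.n / length(abundance)
round(weighted, 6)
```

The result is a named vector with one weighted maximum per bin, listed in alphabetical order of the bin labels.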

Other tips

aggregate solution:

If your data looks like:

dat <- data.frame(
  bin=rep(c("0-9","9.1-18","18.1-27"),c(7,4,8)),
  abundance=c(10,10,2,2,2,1,1,15,15,15,2,20,20,20,1,1,1,1,1)
)

       bin abundance
1      0-9        10
...
8   9.1-18        15
...
12 18.1-27        20

Then:

aggregate(abundance ~ bin,data=dat,FUN=function(x) max(x) * length(x)/nrow(dat))

      bin abundance
1     0-9  3.684211
2 18.1-27  8.421053
3  9.1-18  3.157895
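
Note that the bins come back in alphabetical order ("18.1-27" before "9.1-18"), which is not the order of the mud gradient. Since the stated aim is to plot the values along that gradient, one possible sketch is to turn bin into a factor with explicit levels before aggregating and then draw a bar plot. The column name wtd.max.abundance and the axis labels are assumptions made for this example, not part of the original answer.

```r
dat <- data.frame(
  bin = rep(c("0-9", "9.1-18", "18.1-27"), c(7, 4, 8)),
  abundance = c(10, 10, 2, 2, 2, 1, 1, 15, 15, 15, 2, 20, 20, 20, 1, 1, 1, 1, 1)
)

# Fix the level order so the bins follow the mud gradient, not the alphabet
dat$bin <- factor(dat$bin, levels = c("0-9", "9.1-18", "18.1-27"))

# Same weighting as above: bin maximum * bin sample count / total sample count
res <- aggregate(abundance ~ bin, data = dat,
                 FUN = function(x) max(x) * length(x) / nrow(dat))
names(res)[2] <- "wtd.max.abundance"

# One bar per bin, ordered from low to high mud content
barplot(res$wtd.max.abundance, names.arg = as.character(res$bin),
        xlab = "Mud content bin (%)", ylab = "Weighted maximum abundance")
```

With the full data set, the levels vector would list all ten bins from lowest to highest mud content.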
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow