Here's how I would do it, using @RichardScriven's `dat`:
```r
# Drop incomplete rows, then sum Number for each Genus/Location pair
with(na.omit(dat), aggregate(Number, list(Genus = Genus, Location = Location), sum))
#            Genus Location  x
# 1 Acidobacterium       CC  2
# 2   Edaphobacter       CC  0
# 3    Terriglobus       CC  1
# 4 Acidobacterium        N 12
# 5    Terriglobus        N  5
```
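The same totals can also come from the formula method of `aggregate()`. Its default `na.action = na.omit` drops incomplete rows, so the explicit `na.omit()` call is not needed. The sketch below uses a hypothetical stand-in for `dat`, built from the five rows visible in the printed output plus one made-up row with a missing `Location`. @RichardScriven's actual data may differ.

```r
# Hypothetical stand-in for @RichardScriven's dat: five rows reconstructed
# from the printed output, plus one made-up row with a missing Location.
dat <- data.frame(
  Genus    = c("Terriglobus", "Terriglobus", "Acidobacterium",
               "Acidobacterium", "Edaphobacter", "Terriglobus"),
  Location = c("CC", "N", "CC", "N", "CC", NA),
  Number   = c(1, 5, 2, 12, 0, 7),
  stringsAsFactors = FALSE
)

# Formula interface: rows with NA in any of Number, Genus or Location are
# dropped by the default na.action = na.omit before summing.
totals <- aggregate(Number ~ Genus + Location, data = dat, FUN = sum)
print(totals)

# Same sums as the with(na.omit(dat), aggregate(...)) call; only the
# name of the summed column differs (Number instead of x).
orig <- with(na.omit(dat),
             aggregate(Number, list(Genus = Genus, Location = Location), sum))
print(all.equal(totals$Number, orig$x))
```

The formula version also names the result column after the summed variable, which saves a rename later.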
Edit
Given the clarification in your comments on other solutions, I now suggest the following. For each `Genus` and `Location`, it calculates `Number` as a proportion of the total `Number` at that location. Again, this starts with @RichardScriven's `dat`:
```r
do.call(rbind, lapply(unique(dat$Location), function(x) {
  d <- subset(dat, Location == x)
  # Sum Number per Genus, then divide by the total Number for this Location
  cbind(Location = x, aggregate(d$Number, list(Genus = d$Genus),
                                function(n) sum(n) / sum(d$Number)))
}))
#   Location          Genus         x
# 1       CC Acidobacterium 0.6666667
# 2       CC   Edaphobacter 0.0000000
# 3       CC    Terriglobus 0.3333333
# 4        N Acidobacterium 0.7058824
# 5        N    Terriglobus 0.2941176
```
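The same proportions can also be computed without the `do.call()`/`lapply()` loop. Sum `Number` per `Genus` and `Location` with `aggregate()`, then divide each sum by its location total, which `ave()` repeats on every row. This is an alternative sketch, not the approach above. It uses a hypothetical `dat` reconstructed from the printed output, with complete rows only.

```r
# Hypothetical stand-in for dat, reconstructed from the printed output
dat <- data.frame(
  Genus    = c("Terriglobus", "Terriglobus", "Acidobacterium",
               "Acidobacterium", "Edaphobacter"),
  Location = c("CC", "N", "CC", "N", "CC"),
  Number   = c(1, 5, 2, 12, 0),
  stringsAsFactors = FALSE
)

# Step 1: total Number for each Genus within each Location
props <- aggregate(Number ~ Genus + Location, data = dat, FUN = sum)

# Step 2: ave() returns each row's Location total (repeated per row),
# so the division gives each Genus's share of its Location
props$propn <- props$Number / ave(props$Number, props$Location, FUN = sum)
print(props)
```

Here the CC total is 3 and the N total is 17, so the shares are 2/3, 0, 1/3, 12/17 and 5/17, matching the `x` column above.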
However, if each `Genus` occurs only once per `Location`, you can simplify to:
```r
# One data frame per Location, each with its share of that Location's total
lapply(split(dat, dat$Location, drop = TRUE), function(x)
  transform(x, propn = Number / sum(Number)))
# $CC
#            Genus Location Number     propn
# 2    Terriglobus       CC      1 0.3333333
# 4 Acidobacterium       CC      2 0.6666667
# 6   Edaphobacter       CC      0 0.0000000
#
# $N
#            Genus Location Number     propn
# 3    Terriglobus        N      5 0.2941176
# 5 Acidobacterium        N     12 0.7058824
```
This could then be combined into a single data frame with `do.call(rbind, x)`, where `x` is the list created above.
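Here is a sketch of that combining step, using a hypothetical `dat` reconstructed from the printed output. The list is stored as `lst`, which plays the role of `x` in the sentence above. `do.call(rbind, lst)` stacks the per-location data frames, and resetting the row names discards the composite names that `rbind()` builds from the list names and the original row numbers.

```r
# Hypothetical stand-in for dat, reconstructed from the printed output
dat <- data.frame(
  Genus    = c("Terriglobus", "Terriglobus", "Acidobacterium",
               "Acidobacterium", "Edaphobacter"),
  Location = c("CC", "N", "CC", "N", "CC"),
  Number   = c(1, 5, 2, 12, 0),
  stringsAsFactors = FALSE
)

# One data frame per Location, each with its own proportion column
lst <- lapply(split(dat, dat$Location, drop = TRUE), function(x)
  transform(x, propn = Number / sum(Number)))

# Stack the list back into a single data frame with plain row names
res <- do.call(rbind, lst)
rownames(res) <- NULL
print(res)
```

The rows come out grouped by `Location` (CC first, then N), in their original order within each group.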
Finally, you could use `dplyr` as follows:
```r
library(dplyr)
# %>% replaces the %.% chaining operator used by early dplyr versions
dat %>%
  group_by(Location) %>%
  mutate(total = sum(Number), Propn = Number / total) %>%
  select(-total)
# (newer dplyr versions print this as a grouped tibble)
#            Genus Location Number     Propn
# 1    Terriglobus       CC      1 0.3333333
# 2    Terriglobus        N      5 0.2941176
# 3 Acidobacterium       CC      2 0.6666667
# 4 Acidobacterium        N     12 0.7058824
# 5   Edaphobacter       CC      0 0.0000000
```
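If `dplyr` is not available, base R's `ave()` gives the same per-row proportions in the original row order. This sketch uses a hypothetical `dat` reconstructed from the printed output and assumes there are no missing `Location` values. `ave()` leaves rows with an `NA` group unchanged, so `Propn` would not be a proportion for those rows.

```r
# Hypothetical stand-in for dat, reconstructed from the printed output
dat <- data.frame(
  Genus    = c("Terriglobus", "Terriglobus", "Acidobacterium",
               "Acidobacterium", "Edaphobacter"),
  Location = c("CC", "N", "CC", "N", "CC"),
  Number   = c(1, 5, 2, 12, 0),
  stringsAsFactors = FALSE
)

# ave() applies FUN within each Location and returns a vector aligned with
# the original rows, so no reordering or merging is needed
dat <- transform(dat, Propn = ave(Number, Location, FUN = function(n) n / sum(n)))
print(dat)
```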