Question

I have a data frame in R that looks like this:

species sampletype content
 P1    O1         10
 P1    O2         12
 P1    O3         9
 P1    A          4
 P1    A          3
 P1    A          4
 P2    O1         21 
 P2    O1         12
 P2    O2         4
 P2    O3         6
 P2    A          7
 P2    A          7
 P2    A          3
 P3    O1         15 
 P3    O1         13
 P3    O1         5
 P3    O1         12
 P3    A          5
 P3    A          7
 P3    A          8
 P4    O1         12 
 P4    O1         11
 P4    O2         8
 P4    O2         2
 P4    A          4
 P4    A          3
 P4    A          4

Now I need the average content of the O* samples per species. O1, O2 and O3 are separate samples, but repeated occurrences of the same type (for example, several O1 rows) count as a single sample. In other words, the sum of all O* content values should be divided by the number of distinct O* types. The result should look something like this:

P1 = (10+12+9)/3
P2 = (21+12+4+6)/3   (since there is O1,O2 and O3)
P3 = (15+13+5+12)/1  (since only O1 occurs)
P4 = (12+11+8+2)/2   (since only O1 and O2 occur)

I have tried merge, aggregate and grep, but I am struggling with the syntax and the complexity.


Solution

If I understand you correctly, you don't need the rows where sampletype equals A. If so, you can use ddply from the plyr package, where x is your data frame:

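library(plyr)  # provides ddply() and .()
# Drop the "A" rows, then divide each species' total by its number of distinct sample types
d <- subset(x, sampletype != "A")
ddply(d, .(species), summarise,
      avg = sum(content) / length(unique(sampletype)))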

  species      avg
1      P1 10.33333
2      P2 14.33333
3      P3 45.00000
4      P4 16.50000
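
For anyone who would rather avoid the plyr dependency, the same calculation can be sketched in base R. This is only an illustration under the same assumption as above (rows with sampletype A are ignored); the data frame is rebuilt from the table in the question, and the names total, n_types and result are made up for this example.

```r
# Rebuild the example data from the question (named `x` to match the code above).
x <- data.frame(
  species    = rep(c("P1", "P2", "P3", "P4"), times = c(6, 7, 7, 7)),
  sampletype = c("O1", "O2", "O3", "A", "A", "A",
                 "O1", "O1", "O2", "O3", "A", "A", "A",
                 "O1", "O1", "O1", "O1", "A", "A", "A",
                 "O1", "O1", "O2", "O2", "A", "A", "A"),
  content    = c(10, 12, 9, 4, 3, 4,
                 21, 12, 4, 6, 7, 7, 3,
                 15, 13, 5, 12, 5, 7, 8,
                 12, 11, 8, 2, 4, 3, 4),
  stringsAsFactors = FALSE
)

# Keep only the O* samples (assumption: "A" rows are not wanted).
d <- x[x$sampletype != "A", ]

# Per species: total content, and the number of distinct sample types.
total   <- tapply(d$content, d$species, sum)
n_types <- tapply(d$sampletype, d$species, function(s) length(unique(s)))

# Divide the totals by the distinct-type counts and collect into a data frame.
avg    <- total / n_types
result <- data.frame(species = names(avg), avg = as.vector(avg))
print(result)
```

This should give the same four averages as the ddply call. Computing the total and the distinct-type count separately also makes it easy to check each species against the hand calculation in the question.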
Licensed under: CC-BY-SA with attribution