Question

I want to calculate the mean for each "Day", but only for a portion of the day (Time from 12 to 14). This code works for me, but I have to enter each day as a new line of code, which will amount to hundreds of lines.

This seems like it should be simple to do. I've done this easily when the grouping variables are the same, but I don't know how to do it when I don't want to include all values for the day. Is there a better way to do this?

sapply(sap[sap$Day==165 & sap$Time %in% c(12,12.1,12.2,12.3,12.4,12.5,13,13.1,13.2,13.3,13.4,13.5, 14), ],mean)

sapply(sap[sap$Day==166 & sap$Time %in% c(12,12.1,12.2,12.3,12.4,12.5,13,13.1,13.2,13.3,13.4,13.5, 14), ],mean)

Here's what the data looks like:

Day Time    StomCond_Trunc
165 12      33.57189926
165 12.1    50.29437636
165 12.2    35.59876214
165 12.3    24.39879768

Solution

Try this:

aggregate(StomCond_Trunc~Day,data=subset(sap,Time>=12 & Time<=14),mean)
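
As a quick check of what this returns, here is a minimal sketch on a toy data frame. The first four in-range rows are the sample values from the question; the 11.5 and 15 rows and everything for Day 166 are made-up values added only for illustration.

```r
# Toy data: sample rows from the question plus made-up rows (assumption)
sap <- data.frame(
  Day  = c(165, 165, 165, 165, 165, 166, 166, 166),
  Time = c(11.5, 12, 12.1, 12.2, 12.3, 12, 13, 15),
  StomCond_Trunc = c(10, 33.57189926, 50.29437636, 35.59876214,
                     24.39879768, 20, 40, 99)
)

# Keep only rows with Time in [12, 14], then average per Day
midday <- subset(sap, Time >= 12 & Time <= 14)
result <- aggregate(StomCond_Trunc ~ Day, data = midday, FUN = mean)
print(result)
```

The result has one row per Day, so no per-day line of code is needed. Unlike the sapply() calls in the question, it averages only StomCond_Trunc rather than every column.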

OTHER TIPS

If you have a large dataset, you may also want to look into the data.table package. Converting a data.frame to a data.table is quite easy.

Example:

Large(ish) dataset

df <- data.frame(Day=1:1000000,Time=sample(1:14,1000000,replace=T),StomCond_Trunc=rnorm(100000)*20)

Using aggregate on the data.frame

>system.time(aggregate(StomCond_Trunc~Day,data=subset(df,Time>=12 & Time<=14),mean))
   user  system elapsed 
 16.255   0.377  24.263

Converting it to a data.table

library(data.table)
dt <- data.table(df,key="Time")

>system.time(dt[Time>=12 & Time<=14,mean(StomCond_Trunc),by=Day])
   user  system elapsed 
  9.534   0.178  15.270 

Update from Matthew: this timing has improved dramatically since the original answer, thanks to a new optimization feature in data.table 1.8.2.

Retesting the difference between the two approaches, using data.table 1.8.2 in R 2.15.1:

df <- data.frame(Day=1:1000000,
                 Time=sample(1:14,1000000,replace=T),
                 StomCond_Trunc=rnorm(100000)*20)
system.time(aggregate(StomCond_Trunc~Day,data=subset(df,Time>=12 & Time<=14),mean)) 
#   user  system elapsed 
#  10.19    0.27   10.47

library(data.table)
dt <- data.table(df,key="Time")
system.time(dt[Time>=12 & Time<=14,mean(StomCond_Trunc),by=Day]) 
#   user  system elapsed 
#   0.31    0.00    0.31 
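
To see what the data.table query returns, here is a small sketch on hand-checkable data. It assumes the data.table package is installed; the values and the result column name mean_cond are made up for illustration.

```r
library(data.table)  # assumes the data.table package is installed

# Made-up values chosen so the group means are easy to verify by hand
sap <- data.frame(
  Day  = c(165, 165, 165, 166, 166, 166),
  Time = c(11.5, 12, 13, 12, 14, 15),
  StomCond_Trunc = c(10, 30, 50, 20, 40, 99)
)

dt <- as.data.table(sap)

# Filter rows in i, compute in j, group with by.
# Naming the result (mean_cond) avoids the default column name V1.
res <- dt[Time >= 12 & Time <= 14,
          list(mean_cond = mean(StomCond_Trunc)),
          by = Day]
print(res)
```

Only rows with Time between 12 and 14 contribute, so Day 165 averages 30 and 50, and Day 166 averages 20 and 40.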

Using your original method, but with less typing:

sapply(sap[sap$Day==165 & sap$Time %in% seq(12, 14, 0.1), ],mean)

However, this is only slightly better than your original method. It's not as flexible as the other answers, since it depends on 0.1 increments in your time values. The other methods don't care about the increment size, which makes them more versatile. I'd recommend @Maiasaura's answer with data.table.
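
One further caveat with matching against seq(): %in% compares doubles exactly, and a sequence built from decimal steps does not always reproduce the typed literals bit for bit. The sketch below shows the well-known 0.3 case, then one possible workaround that compares on a whole-number scale of tenths. The times vector is made up for illustration, and the workaround assumes Time values are recorded to one decimal place.

```r
# Well-known case: 0 + 3 * 0.1 is not exactly the double 0.3
exact_demo <- isTRUE(seq(0, 1, 0.1)[4] == 0.3)
print(exact_demo)  # FALSE on IEEE-754 doubles

# Workaround sketch: compare in tenths, where values are whole numbers
times <- c(11.5, 12, 12.1, 12.3, 13.5, 14, 14.1)  # made-up example values
wanted_tenths <- round(seq(12, 14, 0.1) * 10)
keep <- round(times * 10) %in% wanted_tenths
print(times[keep])
```

A range filter such as Time >= 12 & Time <= 14 avoids the issue entirely, which is another reason to prefer it.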

Licensed under: CC-BY-SA with attribution