Compute data.frame column averages by date

https://stackoverflow.com/questions/23179336

06-07-2023
|

質問

I have a data.frame in R where one column is a list of dates (many of which are duplicates), whereas the other column is a temperature recorded on that date. The columns in question look like this (but is several thousand rows and a few other unnecessary cols):

Date    |    Temp
-----------------
1/2/13     34.4
1/2/13     36.4
1/2/13     34.3
1/4/13     45.6
1/4/13     33.5
1/5/13     45.2

I need to find a way of getting a daily average for temperature. So ideally, I could tell R to loop through the data.frame and for every date that matched, give me an average for the temperature that day. I've been googling and I know loops in R are possible, but I can't wrap my head around this conceptually given what little I know about R code.

I know I can pull out a single column and average it (i.e. mean(data.frame[[2]])) but I'm utterly lost on how to tell R to match that mean to a single value located in the first column.

Additionally, how could I generate an average for every seven calendar days (regardless of how many entries exist for a single day)? So, a seven day rolling average, i.e. if my date range starts at 1/1/13 I'd get an average for all temps taken between 1/1/13 and 1/7/13, and then between 1/8/13 and 1/15/13 and so on...

Any assistance helping me grasp R loops is much appreciated. Thank you!

EDIT

Here's the output of dput(head(my.dataframe)) PLEASE NOTE: I edited down both "date" and "timestamp" because they both go on for several thousand entries otherwise:

structure(list(RECID = 579:584, SITEID = c(101L, 101L, 101L, 
101L, 101L, 101L), MONTH = c(6L, 6L, 6L, 6L, 6L, 6L), DAY = c(7L, 
7L, 7L, 7L, 7L, 7L), DATE = structure(c(34L, 34L, 34L, 34L, 34L, 
34L), .Label = c("10/1/2013", "10/10/2013", "10/11/2013", "10/12/2013", 
"10/2/2013", "10/3/2013", "10/4/2013", "10/5/2013", "10/6/2013", 
"10/7/2013", "10/8/2013", "10/9/2013", "6/10/2013", "6/11/2013","9/9/2013"), class = "factor"), TIMESTAMP = structure(784:789, .Label = c("10/1/2013 0:00", 
"10/1/2013 1:00", "10/1/2013 10:00", "10/1/2013 11:00", "10/1/2013 12:00", 
"10/1/2013 13:00", "10/1/2013 14:00", "10/1/2013 15:00", "10/1/2013 16:00", 
"10/1/2013 17:00", "10/1/2013 18:00", "10/1/2013 19:00", "10/1/2013 2:00"), class = "factor"), TEMP = c(23.376, 23.376, 23.833, 24.146, 
24.219, 24.05), X.C = c(NA, NA, NA, NA, NA, NA)), .Names = c("RECID", 
"SITEID", "MONTH", "DAY", "DATE", "TIMESTAMP", "TEMP", "X.C"), row.names = c(NA, 
6L), class = "data.frame")

解決 2

library(plyr)

ddply(df, .(Date), summarize, daily_mean_Temp = mean(Temp))

This is a simple example of the Split-Apply-Combine paradigm.

Alternative #1 as Ananda Mahto mentions, dplyr package is a higher-performance rewrite of plyr. He shows the syntax.

Alternative #2: aggregate() is also functionally equivalent, just has fewer bells-and-whistles than plyr/dplyr.

Additionally 'generate average for every 7 calendar days': do you mean 'average-by-week-of-year', or 'moving 7-day average (trailing/leading/centered)'?

他のヒント

Here are a few options:

aggregate(Temp ~ Date, mydf, mean)
#     Date     Temp
# 1 1/2/13 35.03333
# 2 1/4/13 39.55000
# 3 1/5/13 45.20000

library(dplyr)
mydf %.% group_by(Date) %.% summarise(mean(Temp))
# Source: local data frame [3 x 2]
# 
#     Date mean(Temp)
# 1 1/2/13   35.03333
# 2 1/4/13   39.55000
# 3 1/5/13   45.20000

library(data.table)
DT <- data.table(mydf)
DT[, mean(Temp), by = Date]
#      Date       V1
# 1: 1/2/13 35.03333
# 2: 1/4/13 39.55000
# 3: 1/5/13 45.20000

library(xts)
dfX <- xts(mydf$Temp, as.Date(mydf$Date))
apply.daily(dfX, mean)
#             [,1]
# 1-02-13 35.03333
# 1-04-13 39.55000
# 1-05-13 45.20000

Since you are dealing with dates, you should explore the xts package, which will give you access to functions like apply.daily, apply.weekly, apply.monthly and so on which will let you conveniently aggregate your data.

ライセンス： CC-BY-SA と帰属

所属していません StackOverflow