Pergunta

I have a data frame spanning a couple of months, split into several values per day. I have assigned one column as a date (using as.Date() ), but is it possible to run operations over data for a specific day?

For example, I want to run summary() on 29-04-14 and return only a summary for this particular date.

         Date     Time   col3   col4
6084 28-04-14 23:42:30   0.48 164.00
6085 28-04-14 23:47:00   0.39 150.00
6086 28-04-14 23:51:30   0.41 164.00
6087 28-04-14 23:56:00   0.33 160.00
6088 29-04-14 00:00:00   0.32 138.00
6089 29-04-14 00:04:30   0.15 141.00
6090 29-04-14 00:09:00   0.12 133.00

I have been trying for hours and installed xts, but have had no success :(

Foi útil?

Solução

Here's a possibility to get you started with the "xts" package.

Start with some sample data, as a data.frame:

mydf <- structure(list(Date = c("28-04-14", "28-04-14", "28-04-14", "28-04-14",
                                "29-04-14", "29-04-14", "29-04-14"), 
                       Time = c("23:42:30", "23:47:00", "23:51:30", "23:56:00", 
                                "00:00:00", "00:04:30", "00:09:00"), 
                       col3 = c(0.48, 0.39, 0.41, 0.33, 0.32, 0.15, 0.12), 
                       col4 = c(164, 150, 164, 160, 138, 141, 133)), 
                  .Names = c("Date", "Time", "col3", "col4"), 
                  class = "data.frame", row.names = c(NA, -7L))

Convert it to an "xts" object:

library(xts)
myxts <- xts(mydf[-c(1, 2)], 
             as.POSIXct(paste(as.Date(mydf$Date, format="%d-%m-%y"), 
                              mydf$Time), tz=""))

It now looks like this:

myxts
#                     col3 col4
# 2014-04-28 23:42:30 0.48  164
# 2014-04-28 23:47:00 0.39  150
# 2014-04-28 23:51:30 0.41  164
# 2014-04-28 23:56:00 0.33  160
# 2014-04-29 00:00:00 0.32  138
# 2014-04-29 00:04:30 0.15  141
# 2014-04-29 00:09:00 0.12  133

The "xts" package has a lot of convenient time-based functions, like apply.daily, apply.weekly, and so on.

apply.daily(myxts, mean)
#                          col3     col4
# 2014-04-28 23:56:00 0.4025000 159.5000
# 2014-04-29 00:09:00 0.1966667 137.3333

It also lets you conveniently extract a specific date or time period:

myxts["2014-04-29"]
#                     col3 col4
# 2014-04-29 00:00:00 0.32  138
# 2014-04-29 00:04:30 0.15  141
# 2014-04-29 00:09:00 0.12  133

Here's what summary looks like for that particular date:

summary(myxts["2014-04-29"])
#      Index                          col3             col4      
#  Min.   :2014-04-29 00:00:00   Min.   :0.1200   Min.   :133.0  
#  1st Qu.:2014-04-29 00:02:15   1st Qu.:0.1350   1st Qu.:135.5  
#  Median :2014-04-29 00:04:30   Median :0.1500   Median :138.0  
#  Mean   :2014-04-29 00:04:30   Mean   :0.1967   Mean   :137.3  
#  3rd Qu.:2014-04-29 00:06:45   3rd Qu.:0.2350   3rd Qu.:139.5  
#  Max.   :2014-04-29 00:09:00   Max.   :0.3200   Max.   :141.0

Outras dicas

You can treat your date like any other factor for the most part. If you want to run summary for each date, you can use the standard by function

by(data, data$Date, summary)

Date ranges are a bit more tricky. I might recommend using cut.Date or seq.Date to define data ranges and again using those as factors to collapse into groups you are interested in.

Licenciado em: CC-BY-SA com atribuição
Não afiliado a StackOverflow
scroll top