Create array of start/end of day datetimes in R
28-10-2019
Question
I'm using R to do some time series analysis with zoo and chron. I have a zoo object with lots of data in it, and I need to use the window function to subset the data to one day's worth, then the next day's worth, and so on.
I've tried to find the easiest way to create an array containing the date of each day in a given period, and have come up with the following:
orig = c(month=1, day=1, year=2005)
dates <- chron(1:1825, origin=orig, out.format=c(dates="d/m/y", times="h:m"))
This uses Julian day numbers and covers 1825 days (365*5, so five years), starting with the first day of my period. I then try to run a for loop over each element of this array:
for (date in dates)
{
s = chron(date, "00:00:00", origin=orig)
e = chron(date, "23:59:59", origin=orig)
aeronet_day = window(aeronet, start=s, end=e)
}
However, this gives me a warning saying that I'm using different origins for the aeronet zoo object and the s and e variables, and it doesn't select any data.
Is there a better way to do this, or a way to fix it? Basically, I want to run a for loop in which I can use the code aeronet_day = window(aeronet, start=s, end=e) to produce a zoo object containing the data for one day (e.g. 1st May 2005 from 00:00:00 to 23:59:59).
Solution
Suppose we have this data:
# create test data
library(zoo)
library(chron)
z <- zooreg(1:30, start = chron("2000-01-01"), freq = 2)
1) aggregate
The R aggregate function has a zoo method. The second argument is what we aggregate by. If it is a function, it is applied to the index of the zoo object. For example, here we calculate the mean for each date:
z.ag <- aggregate(z, as.Date, mean)
We can replace mean with a more complex function if we wish.
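A minimal sketch of swapping in a different function, using the same kind of test data as above. Two assumptions: the start date is written as chron("01/01/2000") in chron's default m/d/y format rather than the ISO string, and day.range is a hypothetical helper chosen only for illustration (any function returning one number per group should work the same way).

```r
library(zoo)
library(chron)

# same test data as above: two observations per day for 15 days
z <- zooreg(1:30, start = chron("01/01/2000"), freq = 2)

# daily mean, as in the answer
z.mean <- aggregate(z, as.Date, mean)

# hypothetical "more complex" summary: spread of the values within each day
day.range <- function(x) max(x) - min(x)
z.range <- aggregate(z, as.Date, day.range)

z.mean
z.range
```

Because as.Date is applied to the chron index, the result is a zoo series indexed by Date, one element per calendar day.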
2) split
The R split function has a zoo method. If we really do want to split z by date, then we can do this. Here z.split.list is a list, each of whose components contains the zoo object for one date.
z.split.list <- split(z, as.Date(time(z)))
Now we can (a) sapply or (b) lapply over that list, or (c) use the following loop, replacing print(zc) with whatever processing is desired. Here zc is a component of the list, i.e. the zoo object formed by taking just one particular date:
for(zc in z.split.list) print(zc)
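Options (a) and (b) might look like this; a sketch assuming the same test data, with daily.sum and daily.range as hypothetical names for the per-day results.

```r
library(zoo)
library(chron)

z <- zooreg(1:30, start = chron("01/01/2000"), freq = 2)
z.split.list <- split(z, as.Date(time(z)))

# (a) sapply: one number per date, returned as a vector named by date
daily.sum <- sapply(z.split.list, function(zc) sum(coredata(zc)))

# (b) lapply: a result of any shape per date, returned as a list named by date
daily.range <- lapply(z.split.list, function(zc) range(coredata(zc)))

head(daily.sum)
daily.range[["2000-01-01"]]
```

sapply suits scalar summaries; lapply keeps each per-day result intact when it is not a single number.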
Note that as.Date(time(z)) is a vector with the dates corresponding to the elements of z.
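That date vector also allows an explicit per-day loop close to the one in the question. A sketch follows; note that it subsets with a logical comparison rather than window(), so no chron start/end values (and no origins) are involved. It also loops over positions, because a loop like for (date in dates) receives the bare numbers with the class and origin stripped. The names day.of.obs, days and totals are illustrative.

```r
library(zoo)
library(chron)

z <- zooreg(1:30, start = chron("01/01/2000"), freq = 2)

day.of.obs <- as.Date(time(z))   # calendar date of every observation
days <- unique(day.of.obs)       # one entry per date present in z

totals <- numeric(length(days))
# iterate over positions: `for (d in days)` would hand back plain numbers, not Dates
for (i in seq_along(days)) {
  z.day <- z[day.of.obs == days[i]]   # zoo object holding just this date's data
  totals[i] <- sum(coredata(z.day))   # replace with whatever per-day processing is needed
}
totals
```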
Other tips
I'm not familiar with zoo, but I usually just convert the date to a numeric value, make the sequence, and then convert it back again. For example:
> as.Date(Sys.Date():(Sys.Date()+365), origin='1970-01-01')
[1] "2011-12-06" "2011-12-07" "2011-12-08" "2011-12-09" "2011-12-10" "2011-12-11" "2011-12-12" "2011-12-13"
[9] "2011-12-14" "2011-12-15" "2011-12-16" "2011-12-17" "2011-12-18" "2011-12-19" "2011-12-20" "2011-12-21"
[17] "2011-12-22" "2011-12-23" "2011-12-24" "2011-12-25" "2011-12-26" "2011-12-27" "2011-12-28" "2011-12-29"
[25] "2011-12-30" "2011-12-31" "2012-01-01" "2012-01-02" "2012-01-03" "2012-01-04" "2012-01-05" "2012-01-06"
[33] "2012-01-07" "2012-01-08" "2012-01-09" "2012-01-10" "2012-01-11" "2012-01-12" "2012-01-13" "2012-01-14"
[41] "2012-01-15" "2012-01-16" "2012-01-17" "2012-01-18" "2012-01-19" "2012-01-20" "2012-01-21" "2012-01-22"
...
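Base R's seq() method for Date objects can also build the sequence directly, without the numeric round trip. A sketch for the question's five-year period; the result is a Date vector, not chron, so it would be matched against something like as.Date(time(z)) rather than the chron index itself.

```r
# every day from 1 Jan 2005 to 31 Dec 2009 inclusive
days <- seq(as.Date("2005-01-01"), as.Date("2009-12-31"), by = "day")
length(days)        # 1826, because 2008 is a leap year

# or a fixed number of days from a start date
days.1825 <- seq(as.Date("2005-01-01"), by = "day", length.out = 1825)
tail(days.1825, 1)  # "2009-12-30"
```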
If you want to do something on a per-date basis, then what you have is fine.
Some sample aeronet data:
library(chron)
orig <- c(month = 1, day = 1, year = 2005)   # origin from the question
last_date <- 1825
n <- 10000
aeronet <- data.frame(
  some.value = seq_len(n),
  date = as.chron(
    runif(n, 0, last_date),
    origin = orig,
    out.format = c(dates = "d/m/y", times = "h:m")
  )
)
Now you can split the data by date using split, or apply a function to each date with tapply or ddply from plyr (or use aggregate or whatever).
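# the raw chron values include the time of day, so grouping by them directly
# would not give one group per day; as.Date() keeps only the calendar date
with(aeronet, split(some.value, as.Date(date)))
with(aeronet, tapply(some.value, as.Date(date), sum))
library(plyr)
ddply(transform(aeronet, day = as.Date(date)), .(day), summarise, total = sum(some.value))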
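For the aggregate route, a self-contained sketch. Assumptions: the sample times use chron's default origin (1 Jan 1970) instead of the question's orig, a smaller n and a fixed seed are used, and the calendar date is stored in a hypothetical day column before aggregating.

```r
library(chron)

set.seed(42)                                   # reproducible random sample
n <- 1000
start <- as.numeric(as.Date("2005-01-01"))     # days since 1970-01-01, chron's default origin
when <- chron(runif(n, start, start + 1825))   # hypothetical date-times, like aeronet$date

aeronet <- data.frame(
  some.value = seq_len(n),
  day = as.Date(when)                          # calendar date only; time of day dropped
)

# one row per date, with the sum of some.value on that date
daily <- aggregate(some.value ~ day, data = aeronet, FUN = sum)
head(daily)
```

The formula interface keeps day as a Date column, so the result can be sorted or merged by date directly.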