Question

I'm using R to do some time series analysis using zoo and chron. I've got a zoo object with lots of data in it, and I need to use the window function to subset the data to just one day's worth, then the next day's worth, and so on.

I've tried to find the easiest way to create an array containing the date of each day in a certain period, and have come up with the following:

orig = c(month=1, day=1, year=2005)
dates <- chron(1:1825, origin=orig, out.format=c(dates="d/m/y", times="h:m"))

This uses Julian day notation and has 1825 days (365 * 5, so five years), starting with the first day of my date period. I then try to run a for loop over each element of this array:

for (date in dates)
{
  s = chron(date, "00:00:00", origin=orig)
  e = chron(date, "23:59:59", origin=orig)

  aeronet_day = window(aeronet, start=s, end=e)
}

However, this gives me a warning saying that I'm using different origins for the aeronet zoo object and the s and e variables, and it doesn't select any data.

Is there a better way to do this? Or a way to fix this? Basically, what I want is a for loop in which I can use the aeronet_day = window(aeronet, start=s, end=e) code to produce a zoo object containing the data for one day (e.g. 1st May 2005 from 00:00:00 to 23:59:59).


Solution

Suppose we have this data:

# create test data
library(zoo)
library(chron)
z <- zooreg(1:30, start = chron("2000-01-01"), freq = 2)

1) aggregate. The R aggregate function has a zoo method. The second argument is what we aggregate by. If it is a function, it is applied to the index of the zoo object. For example, here we calculate the mean for each date:

z.ag <- aggregate(z, as.Date, mean)

We can replace mean with a more complex function if we wish.
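A minimal, self-contained sketch of the aggregate approach, including a custom per-day function. It builds the test data with zoo() and an explicit chron index (two observations a day, at 00:00 and 12:00, for 15 days) instead of zooreg(), and day_range is a hypothetical example function, not something provided by zoo:

```r
library(zoo)
library(chron)

# Test data: 30 values, two per day (00:00 and 12:00) starting 1 Jan 2000
idx <- chron(dates. = "01/01/2000", times. = "00:00:00") + (0:29) / 2
z <- zoo(1:30, idx)

# Daily mean: as.Date is applied to the chron index, so each group is one calendar day
z.mean <- aggregate(z, as.Date, mean)

# Hypothetical custom summary: the spread of the values observed within each day
day_range <- function(x) max(x) - min(x)
z.range <- aggregate(z, as.Date, day_range)
```

Any function that takes the values for one day and returns a single number can be dropped in where mean or day_range appear.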

2) split. The R split function has a zoo method. If we really do want to split z by date then we can do this. Here z.split.list is a list, each of whose components contains the zoo object for one date.

z.split.list <- split(z, as.Date(time(z)))

Now we can (a) sapply or (b) lapply over that list, or (c) loop over it as follows, replacing print(zc) with whatever processing is desired. Here zc is a component of the list, i.e. the zoo object containing just the observations for one particular date:

for(zc in z.split.list) print(zc)
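For (a) and (b), a short sketch on the same kind of test data (rebuilt here with zoo() and an explicit chron index so the snippet runs on its own). sapply gives one named value per day; lapply keeps arbitrary per-day results in a list. The per-day summaries chosen here (the sum and the first value) are only illustrations:

```r
library(zoo)
library(chron)

# Two observations per day for 15 days, starting 1 Jan 2000
idx <- chron(dates. = "01/01/2000", times. = "00:00:00") + (0:29) / 2
z <- zoo(1:30, idx)

# One zoo object per calendar day; list names are the dates, e.g. "2000-01-01"
z.split.list <- split(z, as.Date(time(z)))

# (a) sapply: one number per day, returned as a named vector
daily.sum <- sapply(z.split.list, sum)

# (b) lapply: any per-day result, kept in a list
daily.first <- lapply(z.split.list, function(zc) coredata(zc)[1])
```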

Note that as.Date(time(z)) is a vector with the dates corresponding to the elements of z.
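That date vector can also drive a loop shaped like the one in the question, without building new chron start and end values (which is what triggered the origin warning there). This is a sketch: the aeronet series below is a made-up stand-in with one reading every 6 hours over 5 days. It loops over positions because for (d in days) would hand the loop plain numbers, with the Date class dropped:

```r
library(zoo)
library(chron)

# Hypothetical stand-in for the question's 'aeronet' series:
# one reading every 6 hours for 5 days, starting 1 May 2005
idx <- chron(dates. = "05/01/2005", times. = "00:00:00") + (0:19) / 4
aeronet <- zoo(seq_along(idx), idx)

day.of <- as.Date(time(aeronet))  # calendar day of every observation
days <- unique(day.of)            # each day present in the data, in order

n.per.day <- integer(length(days))
for (i in seq_along(days)) {
  # all readings from 00:00:00 to 23:59:59 on days[i]
  aeronet_day <- aeronet[day.of == days[i]]
  # replace this with the real per-day processing
  n.per.day[i] <- length(aeronet_day)
}
```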


OTHER TIPS

I'm not familiar with zoo, but I usually just convert the date to a number, make the sequence, and then convert back again. For example:

> as.Date(Sys.Date():(Sys.Date()+365), origin='1970-01-01')
  [1] "2011-12-06" "2011-12-07" "2011-12-08" "2011-12-09" "2011-12-10" "2011-12-11" "2011-12-12" "2011-12-13"
  [9] "2011-12-14" "2011-12-15" "2011-12-16" "2011-12-17" "2011-12-18" "2011-12-19" "2011-12-20" "2011-12-21"
 [17] "2011-12-22" "2011-12-23" "2011-12-24" "2011-12-25" "2011-12-26" "2011-12-27" "2011-12-28" "2011-12-29"
 [25] "2011-12-30" "2011-12-31" "2012-01-01" "2012-01-02" "2012-01-03" "2012-01-04" "2012-01-05" "2012-01-06"
 [33] "2012-01-07" "2012-01-08" "2012-01-09" "2012-01-10" "2012-01-11" "2012-01-12" "2012-01-13" "2012-01-14"
 [41] "2012-01-15" "2012-01-16" "2012-01-17" "2012-01-18" "2012-01-19" "2012-01-20" "2012-01-21" "2012-01-22"
...
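Base R's seq() method for Date objects can build the same kind of sequence without the round trip through numbers. A sketch for the question's period, assuming it runs from 1 January 2005 to 31 December 2009; giving an explicit end date also sidesteps the leap day that 365 * 5 misses:

```r
# Every calendar day from 1 Jan 2005 to 31 Dec 2009, inclusive
all.days <- seq(as.Date("2005-01-01"), as.Date("2009-12-31"), by = "day")
length(all.days)     # 1826, because 2008 is a leap year

# A fixed count of days, as in the question's 1:1825
first.1825 <- seq(as.Date("2005-01-01"), by = "day", length.out = 1825)
tail(first.1825, 1)  # 2009-12-30, one day short of the end of 2009
```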

If you want to do something on a per-date basis, then what you have is fine.

Here is some sample aeronet data:

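library(chron)
orig <- c(month = 1, day = 1, year = 2005)  # same origin as in the question
last_date <- 1825
n <- 10000
aeronet <- data.frame(
  some.value = seq_len(n),
  # random timestamps (whole days plus fractional times) over the five-year period
  date = as.chron(
    runif(n, 0, last_date),
    origin = orig,
    out.format = c(dates = "d/m/y", times = "h:m")
  )
)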

Now you can split the data by date using split, or apply a function to each date with tapply or with ddply from plyr (or use aggregate, or similar).

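# Group by calendar day (as.Date), not by the full chron timestamp;
# otherwise each distinct timestamp ends up in its own group.
with(aeronet, split(some.value, as.Date(date)))
with(aeronet, tapply(some.value, as.Date(date), sum))

library(plyr)
ddply(transform(aeronet, day = as.Date(date)), .(day), summarise, total = sum(some.value))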
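Base aggregate with a formula is another option. A small sketch with made-up readings (three on each of two days); the timestamps are converted to calendar days with as.Date before grouping, and the names readings and daily are illustrative:

```r
library(chron)

# Hypothetical readings: three on 1 May 2005 and three on 2 May 2005
stamps <- chron(dates. = rep(c("05/01/2005", "05/02/2005"), each = 3),
                times. = rep(c("01:00:00", "12:00:00", "23:30:00"), times = 2))
readings <- data.frame(some.value = 1:6, day = as.Date(stamps))

# One row per calendar day with the sum of that day's values
daily <- aggregate(some.value ~ day, data = readings, FUN = sum)
```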