Question

Is there a good R package that allows subsetting (i.e. indexing into) a time series by times that are not in the series? For example, in financial applications, indexing a price series by a time stamp that is not in the database should return the latest available price before that time stamp.

In code, this is what I would like:

n <- 15
full.dates <- seq(Sys.Date(), by = "day", length.out = n)
series.dates <- full.dates[c(1:10, 12, 15)]
library(zoo)
series <- zoo(rep(1, length(series.dates)), series.dates)
series[full.dates[11]]

This returns:

Data:
numeric(0)

Index:
character(0)

However, I would like this to return the value at the last existing date before full.dates[11], which is full.dates[10]:

series[full.dates[10]]
2014-01-03 
     1 

Thanks


Solution 2

Calling na.locf(x, xout = newdate) directly is not much more work than subscripting. If you do want subscripting syntax, you can define a subclass of "zoo", here called "zoo2", whose [ method uses na.locf. This is an untested minimal implementation, but it could be extended:

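# constructor: put "zoo2" in front of the existing class vector
as.zoo2 <- function(x) UseMethod("as.zoo2")
as.zoo2.zoo <- function(x) structure(x, class = c("zoo2", setdiff(class(x), "zoo2")))
"[.zoo2" <- function(x, i, ...) {
    if (!missing(i) && inherits(i, class(index(x)))) {
        # i has the same class as the index (e.g. Date): carry the last observation forward to i
        zoo:::`[.zoo`(na.locf(x, xout = i),, ...)
    } else as.zoo2(zoo:::`[.zoo`(x, i, ...))  # otherwise ordinary zoo subsetting
}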

This gives:

> series2 <- as.zoo2(series)
> series2[full.dates[11]]
2014-01-04 
         1 
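
If a subclass is more than you need, the same lookup can be wrapped in a small helper. This is only a sketch: value_at is a hypothetical name, and the fixed start date and the values 1 to 12 are made up so that the result is easy to check by eye. It is worth checking how it behaves for a time stamp earlier than the first observation (see the rule argument in ?na.locf).

```r
library(zoo)

# Made-up data: 15 consecutive days, with no observation on days 11, 13 and 14
full.dates <- seq(as.Date("2014-01-01"), by = "day", length.out = 15)
series.dates <- full.dates[c(1:10, 12, 15)]
series <- zoo(seq_along(series.dates), series.dates)

# Hypothetical helper: last observation carried forward to the time(s) in 'when'
value_at <- function(x, when) na.locf(x, xout = when)

# Day 11 is not in the series, so the value observed on day 10 is returned,
# indexed by the requested date
res <- value_at(series, full.dates[11])
res
```

Note that the result is indexed by the date you asked for, not by the date of the observation that supplied the value.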

OTHER TIPS

You can use index to extract the index of the observations in your zoo object. That index can then be used to subset the object. Here is the logic step by step (if I have understood you correctly, you only need the last step):

# the index of the observations, here dates
index(series)

# are the dates smaller than your reference date?
index(series) < full.dates[11]

# subset observations: dates less than reference date
series[index(series) < full.dates[11]]

# select last observation before reference date:
tail(series[index(series) < full.dates[11]], 1)

# 2014-01-03 
#          1
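
The steps above handle one reference date at a time. For several reference dates at once, base R's findInterval can do the same lookup in one call. The following is a sketch under assumptions: last_at_or_before is a hypothetical name, the data are made up, and the helper matches dates at or before the reference date rather than strictly before it (findInterval has a left.open argument that may be used to change this).

```r
library(zoo)

# Made-up data: 15 consecutive days, with no observation on days 11, 13 and 14
full.dates <- seq(as.Date("2014-01-01"), by = "day", length.out = 15)
series.dates <- full.dates[c(1:10, 12, 15)]
series <- zoo(seq_along(series.dates), series.dates)

# Hypothetical helper: for each date in 'when', return the value of the last
# observation at or before that date (NA if there is none)
last_at_or_before <- function(x, when) {
  # position of the last index value <= each element of 'when'; 0 if none
  pos <- findInterval(as.numeric(when), as.numeric(index(x)))
  pos[pos == 0] <- NA
  as.numeric(coredata(x))[pos]
}

# Days 11 and 14 are missing from the series, day 12 is present
vals <- last_at_or_before(series, full.dates[c(11, 12, 14)])
vals
```

This returns a plain numeric vector rather than a zoo object, which is usually what you want when joining prices onto another table of time stamps.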

A possible alternative is to expand your time series and "replac[e] each NA with the most recent non-NA" using na.locf with the xout argument (see also ?na.locf, ?approx and this answer):

# expand time series to the range of dates in 'full.dates'
series2 <- na.locf(series, xout = full.dates)
series2

# select observation at reference date
series2[full.dates[11]]
# 2014-01-04 
#          1

If you would rather have the missing values in your incomplete series replaced by "next observation carried backward", you need to merge your series with a 'dummy' zoo object that contains the desired range of consecutive dates:

series3 <- merge(series, zoo(, full.dates))
na.locf(series3, fromLast = TRUE)
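
To see the backward fill pick up the next available value, it helps to give each observation a distinct value. This is a small sketch with made-up data (a fixed start date and the values 1 to 12); only the merge and na.locf calls come from the answer above.

```r
library(zoo)

# Made-up data: 15 consecutive days, with no observation on days 11, 13 and 14
full.dates <- seq(as.Date("2014-01-01"), by = "day", length.out = 15)
series.dates <- full.dates[c(1:10, 12, 15)]
series <- zoo(seq_along(series.dates), series.dates)

# Pad the series with NA on every date in 'full.dates' that has no observation
series3 <- merge(series, zoo(, full.dates))

# Fill each NA with the next available observation
filled <- na.locf(series3, fromLast = TRUE)

# Day 11 has no observation, so it takes the value observed on day 12
nocb <- filled[full.dates[11]]
nocb
```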

I would strongly argue that subset functions should not return the prior row if the desired index value does not exist. Subset functions should return what the user requested; they should not assume the user wanted something different than what they requested.

If that fallback is nevertheless what you want, you can handle it fairly easily with an if statement:

series.subset <- series[full.dates[11]]
if(NROW(series.subset)==0) {
  # merge series with an empty zoo object
  # that contains the index value you want
  prior <- merge(series, zoo(,full.dates[11]))
  # lag *back* one period so the NA is on the prior value
  prior <- lag(prior, 1)
  # get the index value at the prior value
  prior <- index(prior)[is.na(prior)]
  # subset again
  series.subset <- series[prior]
}
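
If you need this in more than one place, the same "exact match first, otherwise prior observation" logic could be wrapped in a function so that the fallback stays explicit at the call site. This is a sketch only: subset_or_prior is a hypothetical name, the data are made up, and the fallback uses a logical subset plus tail instead of merge and lag.

```r
library(zoo)

# Made-up data: 15 consecutive days, with no observation on days 11, 13 and 14
full.dates <- seq(as.Date("2014-01-01"), by = "day", length.out = 15)
series.dates <- full.dates[c(1:10, 12, 15)]
series <- zoo(seq_along(series.dates), series.dates)

# Hypothetical helper: return the observation at 'when' if it exists,
# otherwise the last observation strictly before 'when'
subset_or_prior <- function(x, when) {
  exact <- x[when]
  if (NROW(exact) > 0) return(exact)
  tail(x[index(x) < when], 1)
}

missing.day <- subset_or_prior(series, full.dates[11])   # falls back to day 10
existing.day <- subset_or_prior(series, full.dates[12])  # exact match on day 12
```

Unlike the na.locf approach, the result keeps the date of the observation that was actually used, so the caller can tell that a fallback happened.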
Licensed under: CC-BY-SA with attribution