Question

When I parse a single date, it comes out correctly:

> ymd("20011001")
[1] "2001-10-01 UTC"

But when I try to create a vector of dates, they all come out one day off:

> b=c(ymd("20111001"),ymd("20101001"),ymd("20091001"),ymd("20081001"),ymd("20071001"),ymd("20061001"),ymd("20051001"),ymd("20041001"),ymd("20031001"),ymd("20021001"),ymd("20011001"))
> b
 [1] "2011-09-30 19:00:00 CDT" "2010-09-30 19:00:00 CDT" "2009-09-30 19:00:00 CDT"
 [4] "2008-09-30 19:00:00 CDT" "2007-09-30 19:00:00 CDT" "2006-09-30 19:00:00 CDT"
 [7] "2005-09-30 19:00:00 CDT" "2004-09-30 19:00:00 CDT" "2003-09-30 19:00:00 CDT"
[10] "2002-09-30 19:00:00 CDT" "2001-09-30 19:00:00 CDT"

How can I fix this? Many thanks.


Answer

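I don't claim to understand exactly what's going on here, but the proximate problem is that c() strips attributes. Using c() on a POSIXct vector drops its time zone ("tzone") attribute, so the result is displayed in the time zone specified by your locale instead of UTC. This happens even if you set the time zone explicitly so that it agrees with your locale. The underlying instants do not change: midnight UTC is simply printed as 20:00 EDT (or 19:00 CDT) on the previous day, which is why the dates look one day off. On my system: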

library(lubridate)
(y1 <- ymd("20011001"))
## [1] "2001-10-01 UTC"
(y2 <- ymd("20011002"))
c(y1,y2)
## now displayed in EDT (4 hours behind UTC, so the evening of the previous day):
## [1] "2001-09-30 20:00:00 EDT" "2001-10-01 20:00:00 EDT"
(y12 <- ymd(c("20011001","20011002")))
## [1] "2001-10-01 UTC" "2001-10-02 UTC"
c(y12)
## back in EDT
## [1] "2001-09-30 20:00:00 EDT" "2001-10-01 20:00:00 EDT"
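To check that nothing is actually shifted, here is a minimal sketch using base R only (no lubridate), assuming your system's time zone database includes America/New_York. The variable names are just for illustration. It compares the stored numbers after c() and formats the result in an explicit zone. Depending on your R version, the "tzone" attribute may or may not survive c() (newer releases may keep it when all inputs share the same zone); the checks below do not depend on either behaviour.

```r
# Base R only: two instants at midnight UTC
y1 <- as.POSIXct("2001-10-01", tz = "UTC")
y2 <- as.POSIXct("2001-10-02", tz = "UTC")
combined <- c(y1, y2)

# The stored numbers (seconds since 1970-01-01 00:00 UTC) are unchanged by c()
as.numeric(combined)
# [1] 1001894400 1001980800

# Formatting with an explicit zone shows the same calendar dates as before
format(combined, "%Y-%m-%d %H:%M", tz = "UTC")
# [1] "2001-10-01 00:00" "2001-10-02 00:00"

# The same instants shown in US Eastern time (needs the system tz database)
format(combined, "%Y-%m-%d %H:%M", tz = "America/New_York")
# [1] "2001-09-30 20:00" "2001-10-01 20:00"

# NULL if c() dropped the attribute; newer R versions may keep a shared "tzone"
attr(combined, "tzone")
```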

You can set the time zone explicitly ...

(y3 <- ymd("20011001", tz = "EDT"))
## [1] "2001-10-01 EDT"

But c() is still problematic.

(y3c <- c(y3))
## [1] "2001-09-30 20:00:00 EDT"

So there are two solutions:

  • convert a character vector in a single call, rather than converting the dates one by one and then combining them with c(); or
  • restore the tzone attribute after combining.
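For the vector in the question, the first solution could look like the sketch below. It uses base R and assumes UTC is the zone you want; date_strings and b are illustrative names, and paste0(2011:2001, "1001") simply rebuilds the eleven strings from the question. lubridate's ymd() also accepts a character vector directly, so passing it the same vector is the lubridate analogue.

```r
# Solution 1: parse all the dates in one vectorised call, so no c() is needed
date_strings <- paste0(2011:2001, "1001")   # "20111001", "20101001", ..., "20011001"
b <- as.POSIXct(date_strings, format = "%Y%m%d", tz = "UTC")

# The "tzone" attribute is set once for the whole vector, so printing stays in UTC
attr(b, "tzone")
b
```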

For example:

 attr(y3c,"tzone") <- attr(y3,"tzone")

@Joran points out that this is almost certainly a general property of applying c() to POSIXct/POSIXlt objects, not something specific to lubridate. I hope someone will chime in and explain whether this is a well-known design decision/infelicity/misfeature.

Update: there was some discussion of this on R-help in 2012, where Brian Ripley commented:

But in any case, the documentation (?c.POSIXct) is clear:

  Using ‘c’ on ‘"POSIXlt"’ objects converts them to the current time
  zone, and on ‘"POSIXct"’ objects drops any ‘"tzone"’ attributes
  (even if they are all marked with the same time zone).

So the recommended way is to add a "tzone" attribute if you know what you want it to be. POSIXct objects are absolute times: the timezone merely affects how they are converted (including to character for printing).

It might be nice if lubridate added a method to do this ...
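In the meantime, a small wrapper can apply the fix Ripley recommends. c_keep_tz() is a hypothetical helper, not part of lubridate or base R: it combines its arguments with c() and then copies the "tzone" attribute of the first argument onto the result. This is a minimal sketch that assumes all inputs are POSIXct objects meant to be displayed in the same zone.

```r
# Hypothetical helper, not part of lubridate or base R:
# combine POSIXct objects, then restore the first argument's "tzone"
c_keep_tz <- function(...) {
  args <- list(...)
  out <- do.call(c, args)            # dispatches to c.POSIXct
  tz <- attr(args[[1]], "tzone")     # zone of the first argument (may be NULL)
  if (!is.null(tz)) attr(out, "tzone") <- tz
  out
}

y1 <- as.POSIXct("2001-10-01", tz = "UTC")
y2 <- as.POSIXct("2001-10-02", tz = "UTC")
y12 <- c_keep_tz(y1, y2)
attr(y12, "tzone")   # "UTC", whatever the installed version's c() does
```

Taking the zone from the first argument is an arbitrary choice: if the inputs carry different zones, the instants are still correct, but all of them will be displayed in the first argument's zone.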

Licensed under CC-BY-SA with attribution