Question

I have a data frame with roughly 8 million rows and 3 columns. I used strptime() in the following manner:

df$date.time <- strptime(df$date.time, "%m/%d/%y %I:%M:%S %p")

This works fine for all but 1104 of the rows, which I checked using

df[is.na(df$date.time), ]

When I look at these "problem" data, the date.time entries seem to be formatted in the way I would expect. For example, here is an observation that comes up as a problem, but doesn't appear to be an NA:

id                date.time              outcome
observation543490 2012-03-11 02:14:01    C

What could possibly be going on here that is.na(df$date.time) returns a TRUE value for this row that has apparently been converted correctly?

Here's a reproducible example (if you're in CST):

is.na(strptime("03/11/12 2:14:01 AM", "%m/%d/%y %I:%M:%S %p", "CST6CDT"))
#[1] TRUE

Solution

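The problem is likely that the times that return NA do not exist in the timezone you're using, because they fall in the hour skipped when daylight saving time starts. In US Central time, for example, clocks jumped from 2:00 AM straight to 3:00 AM on March 11, 2012, so 2:14:01 AM never occurred on that date.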

Check with the data source to determine the timezone the data were recorded in, then set the tz argument to that value in your call to strptime.
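
As a rough sketch of that fix, the snippet below parses the same strings twice: once in CST6CDT, as in the question, and once in a zone without daylight saving time. UTC is used purely as an assumption here; substitute whatever zone the data source confirms. The sample vector x and the names fmt, local_time and utc_time are made up for illustration.

```r
# Three sample timestamps around the 2012 US spring-forward transition.
# The middle one falls in the skipped hour (2:00-3:00 AM local time).
x   <- c("03/11/12 1:59:59 AM", "03/11/12 2:14:01 AM", "03/11/12 3:00:00 AM")
fmt <- "%m/%d/%y %I:%M:%S %p"

# Parsed in a zone that observes DST, as in the question.
# The result for the middle element can depend on the platform and R version;
# the question reports TRUE for it.
local_time <- strptime(x, fmt, tz = "CST6CDT")
is.na(local_time)

# Parsed in a zone with no DST (assumption: the data were recorded in UTC).
# Every wall-clock time exists in UTC, so nothing is lost to a DST gap.
utc_time <- as.POSIXct(strptime(x, fmt, tz = "UTC"))
is.na(utc_time)
```

If the second call reports no NAs while the first does, that supports the daylight saving explanation. Which tz value is actually right still has to come from the data source.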

Licensed under: CC-BY-SA with attribution