Question

I have a character column of dates that I'd like to coerce to Date class:

df$x <- as.Date(df$x)

# Error in charToDate(x)
#   character string is not in a standard unambiguous format

Fine, I'm familiar with this error: I've got something like "" or "90-Smarch-13" in my column. However, head(df$x) looks fine, with normal dates such as 2013-11-04, so it's not a global problem with my column, just a problem with a few rows.

My questions are:

  1. Can I find out how many rows aren't in this standard unambiguous format?
  2. Can I locate the indices (with a view to inspecting them or dropping them)?

My thoughts:

  1. Use try:

for (i in 1:nrow(df)) try(as.Date(df$x[i])) # very slow, doesn't finish for 1M rows

  2. Try to guess what the problem is using nchar:

head(df[nchar(df$x) != 10 & !is.na(df$x), ]$x)

Are there any more systematic methods?


Solution

I would use parse_date_time from the lubridate package, which returns NA for elements it cannot parse instead of throwing an error. For example:

library(lubridate)

dates.toparse <- c("2013-11-04", "", "90-Smarch-13", "2012-05-04")
## parse the dates; I give the correct format here (%Y-%m-%d)
(dates.parsed <- parse_date_time(dates.toparse, orders = "Y-m-d"))
# [1] "2013-11-04 UTC" NA               NA               "2012-05-04 UTC"
## locate the badly formatted elements
dates.toparse[is.na(dates.parsed)]
# [1] ""             "90-Smarch-13"
## or get their indices
which(is.na(dates.parsed))
# [1] 2 3
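
If you'd rather not add a dependency, a similar check can be sketched in base R. This is only a minimal sketch, assuming the column is meant to be in %Y-%m-%d form: when as.Date is given an explicit format it returns NA for strings that don't match rather than raising the error above, and the call is vectorized, so no loop is needed. The vector x below is made-up sample data standing in for df$x, and the extra !is.na(x) condition separates values that were already missing from values that failed to parse.

```r
# Hypothetical sample data standing in for df$x (includes a genuine NA)
x <- c("2013-11-04", "", "90-Smarch-13", "2012-05-04", NA)

# With an explicit format, as.Date() yields NA for non-matching strings
# instead of stopping with "not in a standard unambiguous format"
parsed <- as.Date(x, format = "%Y-%m-%d")

# Flag values that were present in the input but failed to parse
bad <- is.na(parsed) & !is.na(x)

sum(bad)    # how many rows are malformed
which(bad)  # their indices
x[bad]      # the offending strings, for inspection
```

From there, df[!bad, ] would drop the malformed rows, or df[bad, ] would show them in context.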
Licensed under: CC-BY-SA with attribution