Locate indices where as.Date fails / not in standard format

https://stackoverflow.com/questions/19759360

r
coercion

03-07-2022

Question

I have a character column of dates that I'd like to coerce to Date class:

df$x <- as.Date(df$x)

# Error in charToDate(x)
#   character string is not in a standard unambiguous format

Fine, I'm familiar with this error. I've got something like "" or "90-Smarch-13" in my column. The problem is that head(df$x) looks fine, with normal dates such as 2013-11-04, so it's not a problem with the whole column, just with a few rows.

My question is:

Can I find out how many rows aren't in this standard unambiguous format?
Can I locate the indices (with a view to inspecting them or dropping them)?

My thoughts:

Use try:

bad <- logical(nrow(df))
for (i in seq_len(nrow(df))) bad[i] <- inherits(try(as.Date(df$x[i]), silent = TRUE), "try-error")  # very slow, doesn't finish for 1M rows
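The loop above is slow mainly because it calls as.Date() once per row. A minimal sketch of the same try-based idea, assuming the column has far fewer distinct values than rows (typical for dates), tests each distinct string only once and then maps the failures back to row indices. find_unparseable() is a hypothetical helper written for illustration, not a base R or package function.

```r
# Hypothetical helper (not part of base R): returns the row indices of x
# whose value makes as.Date() stop with an error.
find_unparseable <- function(x) {
  u <- unique(x[!is.na(x)])  # distinct non-missing strings only
  fails <- vapply(u, function(s) {
    # tryCatch returns the condition object if as.Date() errors
    inherits(tryCatch(as.Date(s), error = function(e) e), "error")
  }, logical(1), USE.NAMES = FALSE)
  which(x %in% u[fails])     # map failing values back to row positions
}

x <- c("2013-11-04", "90-Smarch-13", "2012-05-04", "90-Smarch-13")
find_unparseable(x)
```

Because no format is given, each string is checked against as.Date()'s default tryFormats on its own, so "2013/11/04" would also be accepted; pass an explicit format if only one layout is valid.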

Try to guess what the problem is using nchar

head(df[nchar(df$x) != 10 & !is.na(df$x), ]$x)
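The nchar() check can be made stricter with a regular expression that tests the whole YYYY-MM-DD shape. This is a sketch assuming the intended format is "%Y-%m-%d"; it only checks the shape, so an impossible date such as "2013-13-45" would still pass.

```r
# Shape check only: four digits, dash, two digits, dash, two digits.
x <- c("2013-11-04", "", "90-Smarch-13", NA, "2012-05-04")
looks_ok <- grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", x)  # grepl() gives FALSE for NA

malformed <- !looks_ok & !is.na(x)  # ignore values that were already NA
sum(malformed)                      # how many rows are malformed
bad_idx <- which(malformed)         # their indices
x[bad_idx]                          # the offending values
```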

Are there any more systematic methods?

Solution

I would use parse_date_time() from the lubridate package, for example:

library(lubridate)
dates.toparse <- c("2013-11-04", "", "90-Smarch-13", "2012-05-04")
## parse the dates, giving the expected order here (Y-m-d, i.e. %Y-%m-%d)
(dates.parsed <- parse_date_time(dates.toparse, orders = "Y-m-d"))
[1] "2013-11-04 UTC" NA               NA               "2012-05-04 UTC"
## to locate the badly formatted elements
dates.toparse[is.na(dates.parsed)]
[1] ""             "90-Smarch-13"
## or by indices
which(is.na(dates.parsed))
[1] 2 3
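The same check works in base R without lubridate: when as.Date() is given an explicit format, strings that do not match become NA instead of triggering the "not in a standard unambiguous format" error. A minimal sketch, assuming "%Y-%m-%d" is the only valid layout and that values which were NA before parsing should not be counted as failures:

```r
# With an explicit format, as.Date() returns NA for non-matching strings.
dates.toparse <- c("2013-11-04", "", "90-Smarch-13", "2012-05-04", NA)
dates.parsed  <- as.Date(dates.toparse, format = "%Y-%m-%d")

# A failure is NA after parsing but not NA before.
failed <- is.na(dates.parsed) & !is.na(dates.toparse)
sum(failed)              # how many rows are not in the expected format
which(failed)            # their indices
dates.toparse[failed]    # the offending values, for inspection
```

If the vector came from df$x, df[!failed, ] drops those rows. Note that strptime(), which as.Date() uses here, ignores trailing characters, so a value like "2013-11-04 junk" still parses.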

Licensed under: CC-BY-SA with attribution
