Question

I have a CSV download of data from a management information system. Some of the variables are dates, written in the CSV as strings of the format "2012/11/16 00:00:00".

After reading in the csv file, I convert the date variables into a date using the function as.Date(). This works fine for all variables that do not contain any blank items.

For those which do contain blank items I get the following error message: "character string is not in a standard unambiguous format"

How can I get R to replace blank items with something like "0000/00/00 00:00:00" so that the as.Date() function does not break? Are there other approaches you might recommend?


Solution

If they're strings, does something as simple as

mystr <- c("2012/11/16 00:00:00","   ","")
mystr[grepl("^ *$",mystr)] <- NA
as.Date(mystr)

work? The regular expression "^ *$" matches strings that consist of the start of the string (^), zero or more spaces ( *), and then the end of the string ($). More generally, I think you could use "^[[:space:]]*$" to capture other kinds of whitespace (tabs, etc.).
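
A minimal sketch of the more general pattern, assuming the dates always start with year/month/day as in the question. The extra tab-only element and the explicit format argument are additions for illustration, not part of the original example. Passing a format also means as.Date() does not have to guess one, so strings that fail to match simply come back as NA.

```r
# Sample input: one real date plus three kinds of "blank" entry
# (the tab-only string is added here for illustration).
mystr <- c("2012/11/16 00:00:00", "   ", "", "\t")

# Treat any empty or whitespace-only string as missing.
mystr[grepl("^[[:space:]]*$", mystr)] <- NA

# Parse with an explicit format; the trailing time part is ignored.
dates <- as.Date(mystr, format = "%Y/%m/%d")
print(dates)
```

The missing entries stay NA in the result, which is usually easier to work with downstream than a placeholder such as "0000/00/00 00:00:00" (that string is not a valid date).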

Other tips

Even better, have the NAs correctly inserted when you read in the CSV:

read.csv(..., na.strings='')

or, to specify a vector of all the values that should be read as NA:

read.csv(..., na.strings=c('','  ','   '))
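
As a rough end-to-end sketch of this approach: the inline CSV text and the column names id and start_date below are made up for illustration, and the text argument stands in for the real file path. Empty fields are read as NA and the column is then converted with an explicit format.

```r
# Hypothetical CSV content; in practice this would come from a file.
csv_text <- "id,start_date
1,2012/11/16 00:00:00
2,
3,2013/01/05 00:00:00"

# Read empty fields as NA instead of as empty strings.
df <- read.csv(text = csv_text, na.strings = "", stringsAsFactors = FALSE)

# Convert the date column; NA entries remain NA.
df$start_date <- as.Date(df$start_date, format = "%Y/%m/%d")
print(df)
```

Note that setting na.strings replaces the default, so the literal string "NA" would no longer be read as missing unless it is included in the vector as well.
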
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow