Question

I am having trouble calculating a date from data imported from a .csv file. I want to take the date stored in the factor DateClosed and subtract the number of days stored in a field (a) to get a new date. For example, if a = 203, I want the result to be DateClosed minus 203 days. However, the code below gives the wrong dates.

DateClosed is a factor.

> head(DateClosed)
[1] 7/30/2007  12/12/2007 5/8/2009   6/24/2009  6/24/2009  2/29/2008 
165 Levels: 1/12/2010 1/15/2011 1/15/2013 1/17/2009 1/18/2008 1/19/2012 1/2/2013 1/21/2013 1/22/2010 1/24/2013 1/26/2014 ... 9/7/2010
> head(as.Date(DateClosed,format="%m/%d/%y"))
[1] "2020-07-30" "2020-12-12" "2020-05-08" "2020-06-24" "2020-06-24" "2020-02-29"

> head(as.Date(DateClosed,format="%m/%d/%y"))-203
[1] "2020-01-09" "2020-05-23" "2019-10-18" "2019-12-04" "2019-12-04" "2019-08-10"

It subtracts 203 days correctly, but for some reason it reads the year wrong: every parsed date lands in 2020 instead of the year in the file.


Solution

DateClosed <- factor(c("7/30/2007","12/12/2007", "5/8/2009"))
as.Date(DateClosed, format="%m/%d/%Y")

Produces:

[1] "2007-07-30" "2007-12-12" "2009-05-08"

Notice the capital "Y" in the format argument. Lower-case "%y" is for two-digit years, so as.Date reads only the first two digits of the year token ("20") and silently ignores the rest ("07"). R maps two-digit years 00 to 68 to 20xx, so "20" becomes 2020 and every date ends up in 2020.
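To see the difference on a single value, here is a minimal base-R sketch that parses the first date from the question both ways and then subtracts the 203 days the question asks about. The variable names x, wrong and right are just for illustration.

```r
# First value from the question, stored as a factor like DateClosed.
x <- factor("7/30/2007")

# Lower-case %y consumes only two digits ("20") as the year, which R
# maps to 2020; the trailing "07" is silently ignored.
wrong <- as.Date(x, format = "%m/%d/%y")

# Upper-case %Y reads the full four-digit year.
right <- as.Date(x, format = "%m/%d/%Y")

print(wrong)        # [1] "2020-07-30"
print(right)        # [1] "2007-07-30"

# Subtracting a number from a Date subtracts that many days.
print(right - 203)  # [1] "2007-01-08"
```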

OTHER TIPS

Manipulating dates is much easier with the lubridate package.

library(lubridate)
mdy(factor(c("7/30/2007","12/12/2007", "5/8/2009")))

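[1] "2007-07-30 UTC" "2007-12-12 UTC" "2009-05-08 UTC"

(That output is from an older lubridate release; current versions of mdy() return a Date, printed without "UTC", unless you pass a tz argument.)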

Or use parse_date_time() from the same package:

parse_date_time(factor(c("7/30/2007","12/12/2007", "5/8/2009")),c('mdY'))
[1] "2007-07-30 UTC" "2007-12-12 UTC" "2009-05-08 UTC"
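If you would rather stay in base R for the original task, where each row carries its own offset a, the following is a minimal sketch. It assumes the .csv has columns named DateClosed (m/d/Y strings) and a (whole days); the inline CSV text and the Target column name are hypothetical.

```r
# Hypothetical CSV contents: DateClosed in m/d/Y form plus a per-row
# day offset column `a` (column names are assumptions).
csv_text <- "DateClosed,a
7/30/2007,203
12/12/2007,10
5/8/2009,0"

# stringsAsFactors = TRUE reproduces the factor column from the question
# (the default became FALSE in R 4.0.0).
df <- read.csv(text = csv_text, stringsAsFactors = TRUE)

# Convert the factor via its labels, using a four-digit-year format.
df$DateClosed <- as.Date(as.character(df$DateClosed), format = "%m/%d/%Y")

# Stop if any value did not match the format at all.
if (anyNA(df$DateClosed)) stop("some DateClosed values did not parse")

# Date minus integer subtracts days element-wise, one offset per row.
df$Target <- df$DateClosed - df$a
print(df)
```

Subtracting a whole-number vector from a Date vector works element-wise, so no loop is needed. The anyNA() check is a cheap guard against a format mismatch, but note it would not have caught the %y mistake, which parses without producing NAs.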
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow