How to replace a column in R? Strange behavior with dates
Question
I am trying to convert an uncommon date format into a standard date. My dataset contains a period with semiannual frequency, formatted like 206 for the second half of 2006, 106 for the first half, and so forth. To turn these into 2006-06-01 and 2006-01-01 respectively, I have written a small function:
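period2date = function(period)
{
  # split e.g. 206 into the characters "2", "0", "6"
  # (only the first element of period is used, so this converts one value at a time)
  check = strsplit(as.character(period), split = "")
  # first digit: 1 = first half, 2 = second half
  x = as.numeric(check[[1]][1])
  # month used for the half-year: June for the second half, January for the first
  p = ifelse(x >= 2, 6, 1)
  # reuse x as the century digit: assumes all years are 20xx
  x = 2
  out = paste(x, "0", check[[1]][2], check[[1]][3], "-", p, "-1", sep = "")
  out = as.Date(out)
  return(out)
}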
You may laugh now :). Anyway, that function works, and here comes the problem: I want to apply it to the period column of my data.frame. I tried the following:
as.data.frame(lapply(mydf$period,period2date))
which returned the result closest to what I want:
  structure.13665..class....Date..
1                       2006-06-01
and so forth. Obviously I'd love to keep the name of my column, or, even better, just add the newly formatted date to my original df. I also tried:
sapply(mydf$period,period2date) # gives the same result as the line below
unlist(lapply(mydf$period,period2date))
[1] 13300 13514 13665
All I want to do is change the uncommon 206 format to 2006-06-01 (which works) and add the result as a column to mydf (which does not work).
Thanks in advance for any suggestions!
Solution
R stores dates as numbers (days since 1970-01-01, tagged with a "Date" class), so I think you're getting some wacky behavior because of what happens to the date output afterwards: sapply() and unlist() collapse the list of results into a plain vector and drop the class, which makes the dates appear as the numbers they really are. Instead, build the data frame explicitly with data.frame(). Also, you may save some time if you use vectorized operations: the apply family still calls your function once per element, whereas substr(), paste(), and ifelse() work on the whole column at once:
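period2date <- function(period) {
  period <- as.character(period)
  half <- substr(period, 1, 1)  # first digit: "1" = first half, "2" = second half
  year <- substr(period, 2, 3)  # last two digits: two-digit year, e.g. "06"
  # first half -> January 1, second half -> July 1
  # (use "0601" instead of "0701" to get June 1 as in the question)
  dates <- as.Date(ifelse(half == "1", paste(year, "0101", sep = ""), paste(year, "0701", sep = "")),
                   format = "%y%m%d")  # %y maps 00-68 to 2000-2068
  return(dates)
}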
mydf <- data.frame(mydf, date = period2date(mydf$period))
You can also make this cleaner by replacing the period column instead of appending a new one: mydf$period <- period2date(mydf$period)
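A minimal sketch putting the pieces together, assuming a hypothetical mydf with an id column and the periods 206, 107 and 207 (these happen to reproduce the day counts 13300 13514 13665 printed in the question). It follows the vectorized approach from the answer, but uses June 1 for the second half to match the question's convention; that switch is an assumption about which convention you want.

```r
# Hypothetical sample data: periods 206, 107 and 207 reproduce the
# day counts 13300 13514 13665 shown in the question.
mydf <- data.frame(id = 1:3, period = c(206, 107, 207))

# Vectorized conversion in the style of the answer, but with June 1 for
# the second half (assumption: the question's convention is wanted).
period2date <- function(period) {
  period <- as.character(period)
  half <- substr(period, 1, 1)  # "1" = first half, "2" = second half
  year <- substr(period, 2, 3)  # two-digit year, e.g. "06"
  as.Date(ifelse(half == "1", paste0(year, "0101"), paste0(year, "0601")),
          format = "%y%m%d")
}

# Replace the column in place: the whole column is converted in one call,
# so no sapply()/unlist() step strips the "Date" class.
mydf$period <- period2date(mydf$period)
str(mydf)
```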
OTHER TIPS
This is strange:
as.Date(sapply(mydf$period,period2date))
returns "2006-06-01" "2006-01-01" etc. I am stunned, because the period2date function already contains as.Date(). This is a solution to my problem, but I don't fully understand it...
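The behavior comes from sapply(), not from period2date(). Here is a minimal base-R sketch, in which the scalar helper one_period2date() and the sample periods are hypothetical stand-ins for the question's function and data. sapply() simplifies the list of one-element results with unlist(), which drops the "Date" class and leaves the day counts since 1970-01-01. as.Date() then turns those numbers back into dates. Combining the list with c() keeps the class without the round trip.

```r
# Hypothetical scalar converter standing in for the question's period2date():
# it turns one period code such as 206 into one Date.
one_period2date <- function(p) {
  half <- p %/% 100                    # leading digit: 1 or 2
  year <- as.integer(2000 + p %% 100)  # assumes every year is 20xx
  month <- if (half == 1) 1L else 6L   # January or June, as in the question
  as.Date(sprintf("%d-%02d-01", year, month))
}

periods <- c(206, 107, 207)

# sapply() simplifies the list of results with unlist(), which drops the
# "Date" class and leaves the underlying day counts since 1970-01-01.
raw <- sapply(periods, one_period2date)
print(class(raw))  # "numeric"
print(raw)         # 13300 13514 13665

# as.Date() on numbers reattaches the class; origin is given explicitly
# because older versions of R require it for numeric input.
restored <- as.Date(raw, origin = "1970-01-01")
print(restored)

# Combining the list with c() keeps the class without the round trip.
kept <- do.call(c, lapply(periods, one_period2date))
print(kept)
```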