Question

I have a dataset with a date column stored as a factor.

I tried using the lubridate package to extract the year and the month in order to create a new column in my data.frame, but it doesn't work.

    # Load packages
    library(lubridate)

    # Create dataset
    Data <- read.csv("C:/Users/TheKaspa/Dropbox/Bocconi/LM - Management/Tesi/WIP/Database/Elab.csv", header=TRUE)

    # Get the year
    Y <- year(Data$Activity_close)
    Y

The result is:

      [1]  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
     [26]  1  1  1  1  1  1  1  1  1  1  1  1 31  1  1  1  1  1  1  1  1  1  1  1  1
     [51]  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
     [76]  1  1  1  1  1  1 31 31  1  1  1  1 31 31 31 31 31  1  1  1  1  1  1  1  1
    [101]  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1 31 31 31 31
    [126] 31 31 31  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
    [151]  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
    [176]  1  1  1  1  1  1  1  1  1 31 31  1  1  1  1  1  1  1  1  1  1  1  1  1  1
    [201]  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
    [226]  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
    [251]  1  1 31  1 31  1 31  1  1  1  1 31  1  1  1 31 31 31 31 31 31 31  1  1  1
    [276]  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1 31 31
    [301] 31 31  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
    [326]  1  1  1  1  1  1

but the data look like this (sample from summary):

    1/12/2010 00:00:00
    1/5/2010 00:00:00
    1/6/2010 00:00:00
    1/12/2011 00:00:00
    1/5/2011 00:00:00
    1/10/2010 00:00:00

What can I do?


Solution

Convert the string to a Date with as.Date (giving it the day/month/year format), then use strftime to extract only the year.

    time <- "1/12/2010 00:00:00"

    timeformatted <- as.Date(time,"%d/%m/%Y %H:%M:%S")

    strftime(
        timeformatted,
        "%Y"
    )
    #[1] "2010"

Other answers

You should convert your factor to a proper date before extracting date elements. For example, here I use dmy_hms:

    library(lubridate)
    year(dmy_hms('1/12/2010 00:00:00'))
    # [1] 2010
    month(dmy_hms('1/12/2010 00:00:00'))
    # [1] 12

Note also that lubridate is not required; you can achieve the same thing in base R:

    as.POSIXlt('1/12/2010 00:00:00', format='%d/%m/%Y %H:%M:%S')
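as.POSIXlt only parses the string; the year and month still have to be read from its components. In a POSIXlt object, $year counts years since 1900 and $mon is zero-based (0 = January), so both need an offset. A minimal sketch:

```r
# Parse the day/month/year string into a broken-down time (POSIXlt)
lt <- as.POSIXlt("1/12/2010 00:00:00", format = "%d/%m/%Y %H:%M:%S")

# $year is years since 1900 and $mon is 0-based, so add the offsets
yr <- lt$year + 1900
mo <- lt$mon + 1

print(c(year = yr, month = mo))
```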

You need to convert your dates into proper date-time objects first.

    Dates <- readLines(textConnection("1/12/2010 00:00:00
    1/5/2010 00:00:00
    1/6/2010 00:00:00
    1/12/2011 00:00:00
    1/5/2011 00:00:00
    1/10/2010 00:00:00"))

    library(lubridate)

    year(Dates)
    # [1] 1 1 1 1 1 1
    ProperDates <- as.POSIXct(Dates, format="%d/%m/%Y %H:%M:%S")
    year(ProperDates)
    # [1] 2010 2010 2010 2011 2011 2010

See also:

    ?strptime
    ?as.POSIXct

I was about to add that you probably need to make sure the input is a character vector rather than a factor, but as.POSIXct seems to work with factors too. Nevertheless, it makes little sense to store date/time information as factors.
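Following that advice, one option is to replace the factor column with a proper date-time column once, right after reading the data, so every later step works on real dates. The sketch assumes a factor column named Activity_close as in the question; the small data frame is hypothetical, and tz = "UTC" is an arbitrary choice that keeps the result independent of the local time zone.

```r
# Hypothetical data frame with the date-times stored as a factor
Data <- data.frame(
  Activity_close = factor(c("1/12/2010 00:00:00", "1/5/2011 00:00:00"))
)

# Convert via character so the factor labels (not the level codes) are parsed
Data$Activity_close <- as.POSIXct(as.character(Data$Activity_close),
                                  format = "%d/%m/%Y %H:%M:%S", tz = "UTC")

str(Data)
```

Alternatively, read.csv(..., stringsAsFactors = FALSE) keeps the column as character from the start; this is already the default since R 4.0.0.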

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow