Question

How can you isolate the year from a vector of dates? Or, more generally, how can you extract part of a string (here, the last four characters)?

date <- c("05.06.2001","02.10.2003","06.12.2004","01.01.2001","01.04.2003")
company <- c(1,1,1,2,2)

mydf <- data.frame(date, company)
mydf

#         date company
# 1 05.06.2001       1
# 2 02.10.2003       1
# 3 06.12.2004       1
# 4 01.01.2001       2
# 5 01.04.2003       2

The output should look like this:

#         date company year
# 1 05.06.2001       1 2001
# 2 02.10.2003       1 2003
# 3 06.12.2004       1 2004
# 4 01.01.2001       2 2001
# 5 01.04.2003       2 2003

I've tried using the lubridate package:

library(lubridate)
# dmy() parses the day.month.year strings into dates before year() is applied
mydf$year <- year(dmy(mydf$date))

But I want to be able to do this in general, not with a package that works only for dates.


Solution

Here are two approaches, one character-based and one date-based:

with(mydf, substr(date, nchar(as.character(date)) - 3, 
                  nchar(as.character(date))))
# [1] "2001" "2003" "2004" "2001" "2003"

format(as.Date(mydf$date, "%d.%m.%Y"), "%Y")
# [1] "2001" "2003" "2004" "2001" "2003"

nchar is overkill in this case, since the strings have a fixed width and substr(date, 7, 10) would be enough, but it shows how to count back four characters from the end of a string of any length.
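As a minimal sketch of both variants, the snippet below adds the year column using fixed positions and then wraps the nchar approach in a small helper. The name last_n is hypothetical, introduced only for illustration; everything else is base R.

```r
date <- c("05.06.2001", "02.10.2003", "06.12.2004", "01.01.2001", "01.04.2003")
company <- c(1, 1, 1, 2, 2)
mydf <- data.frame(date, company)

# Fixed-width shortcut: every string is "dd.mm.yyyy", so the year is characters 7 to 10.
# as.character() guards against the column having been stored as a factor.
mydf$year <- substr(as.character(mydf$date), 7, 10)

# General case (hypothetical helper): last n characters of each string, any length.
last_n <- function(x, n) {
  x <- as.character(x)
  len <- nchar(x)
  substr(x, len - n + 1, len)
}

last_n(c("apple", "kiwi", "05.06.2001"), 4)
# [1] "pple" "kiwi" "2001"
```

The year column here is character; wrap the result in as.integer() if you need to do arithmetic on it.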

Other suggestions

Use stri_sub from the stringi package to get the last 4 characters, like this:

library(stringi)
stri_sub(mydf$date, from=-4)
## [1] "2001" "2003" "2004" "2001" "2003"

A negative value for the from parameter means that characters are counted from the end of the string. The default value of the to parameter is -1, which means 'to the end', so there is no need to change it.
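To illustrate the negative indexing, here is a short sketch that spells out the to argument explicitly. It assumes the stringi package is installed; the vector x is just sample data.

```r
library(stringi)

x <- c("05.06.2001", "02.10.2003")

# Spelling out the default: from the 4th-last character to the last one
stri_sub(x, from = -4, to = -1)
# [1] "2001" "2003"

# Both ends can be negative: 4th-last to 3rd-last character (the century digits)
stri_sub(x, from = -4, to = -3)
# [1] "20" "20"
```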

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow