Extract dates that may or may not include a time, using regex [duplicate]

StackOverflow https://stackoverflow.com/questions/20211852

  •  05-08-2022

Question

Please consider the following:

library(stringr)

text <- c("blabla bla blabla bla 6:05, 15 July 2005, blabla bla", 
          "blabla bla bla 7:06, 3 November 2006, blabla bla",
          "blabla bla 24 November 2006, blabla bla",
          "blabla bla blabla bla bla blabla bla")

dates <- str_extract_all(text, ???)

I am trying to extract all dates from the vector and, where a date is preceded by a time, the time as well.


Solution

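Next time, please show what you have attempted. The following works, although a more efficient regex pattern may be possible. It matches an optional "H:MM, " time prefix, followed by a one- or two-digit day, a full English month name, and a four-digit year.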

pat <- paste0("([0-9]{1,2}:[0-9]{2}, )*[0-9]{1,2} (", paste(month.name, collapse = "|"), ") [0-9]{4}")

pat
## [1] "([0-9]{1,2}:[0-9]{2}, )*[0-9]{1,2} (January|February|March|April|May|June|July|August|September|October|November|December) [0-9]{4}"


regmatches(text, gregexpr(pat, text = text))
## [[1]]
## [1] "6:05, 15 July 2005"
## 
## [[2]]
## [1] "7:06, 3 November 2006"
## 
## [[3]]
## [1] "24 November 2006"
## 
## [[4]]
## character(0)
## 


# or using stringr package

str_extract_all(text, pat)
## [[1]]
## [1] "6:05, 15 July 2005"
## 
## [[2]]
## [1] "7:06, 3 November 2006"
## 
## [[3]]
## [1] "24 November 2006"
## 
## [[4]]
## character(0)
## 
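
As a possible refinement of the pattern above, here is a minimal sketch that is not part of the original answer. It makes the time prefix match at most once (`?` instead of `*`), uses non-capturing groups, and adds word boundaries so that digits inside longer numbers are not picked up. The name `pat2` is made up for illustration, and `perl = TRUE` is assumed so that `(?:...)` and `\\b` are interpreted with PCRE semantics.

```r
# Same sample input as in the question.
text <- c("blabla bla blabla bla 6:05, 15 July 2005, blabla bla",
          "blabla bla bla 7:06, 3 November 2006, blabla bla",
          "blabla bla 24 November 2006, blabla bla",
          "blabla bla blabla bla bla blabla bla")

# Hypothetical variant of the answer's pattern:
#   (?:...)?  optional, non-capturing time prefix, matched at most once
#   \\b       word boundaries around the whole match
months <- paste(month.name, collapse = "|")
pat2 <- paste0("\\b(?:[0-9]{1,2}:[0-9]{2}, )?[0-9]{1,2} (?:",
               months, ") [0-9]{4}\\b")

# One list element per input string; character(0) where nothing matches.
dates <- regmatches(text, gregexpr(pat2, text, perl = TRUE))

# Flatten to a plain character vector; elements without a match drop out.
unlist(dates)
## [1] "6:05, 15 July 2005"    "7:06, 3 November 2006" "24 November 2006"
```

The same pattern should also work with `str_extract_all(text, pat2)`, since stringr's regex engine supports non-capturing groups and `\\b`, but that call has not been checked here.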
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow