Question

I have many filenames which look like:

txt <- "MA0051_IRF2.xml"

I want to extract IRF2, which sits between the "_" and the ".". How do I do this in R?

Solution

To achieve this, you need a regexp that

  • matches an (optional) arbitrary string in front of the _ : .*
  • matches a literal _ : [_]
  • matches everything up to (but not including) the next . and stores it in capturing group no. 1 : ([^.]+)
  • matches a literal . : [.]
  • matches an (optional) arbitrary string after the . : .*

In your call to gsub, you then

  • use the regular expression we built in the previous step
  • replace the whole string with the contents of the first capturing group: \\1 (we need to escape the backslash, hence the double backslash)

Example:

gsub(".*[_]([^.]+)[.].*", "\\1", "MA0051_IRF2.xml")
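To see what the capturing group actually holds, the same pattern can be run through base R's regexec and regmatches, which return the full match followed by each group. This is only an inspection sketch, not part of the original answer:

```r
txt <- "MA0051_IRF2.xml"

# regexec() records the positions of the full match and of each capturing group;
# regmatches() turns those positions back into substrings
m <- regmatches(txt, regexec(".*[_]([^.]+)[.].*", txt))[[1]]

full_match <- m[1]  # the whole string, because the pattern consumes everything
group_1    <- m[2]  # the text captured by ([^.]+)

print(group_1)
```

Because the pattern matches the whole string, replacing the match with \\1 leaves only the captured part.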

Other tips

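Another possibility with the stringr package:

library(stringr)
# lookbehind (?<=_) and lookahead (?=\\.); recent stringr versions need no perl() wrapper
str_extract(txt, "(?<=_)(.+)(?=\\.)")

A more compact gsub call:

gsub(".*_(.*)\\..*", "\\1", txt)
## "IRF2"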
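Since the question mentions many filenames, it may help to note that sub and gsub are vectorized, so the same call works on a whole character vector. A minimal sketch, where every filename except the first is made up for illustration:

```r
# Hypothetical filenames; only the first one comes from the question
files <- c("MA0051_IRF2.xml", "MA0052_MEF2A.xml", "MA0003_TFAP2A.xml")

# sub() is enough here: the pattern consumes the whole string, so there is
# at most one match per element
ids <- sub(".*_([^.]+)[.].*", "\\1", files)

# Names that do not match the pattern are returned unchanged
unmatched <- sub(".*_([^.]+)[.].*", "\\1", "README")

print(ids)
```

Elements that do not match come back unmodified, so it can be worth checking for them (for example with grepl) before relying on the result.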

Here's a possible solution that doesn't require regex knowledge:

txt <- "MA0051_IRF2.xml"

library(qdap)
genXtract(txt, "_", ".")

## _  :  . 
##  "IRF2" 
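If installing an extra package is not an option, a similar regex-free result can be sketched in base R by dropping the extension and splitting on the underscore. This assumes the part you want is the last "_"-separated piece of the name and that the file has a single extension:

```r
txt <- "MA0051_IRF2.xml"

# Drop the final extension: "MA0051_IRF2"
stem <- tools::file_path_sans_ext(txt)

# Split on a literal underscore (fixed = TRUE disables regex interpretation)
parts <- strsplit(stem, "_", fixed = TRUE)[[1]]

# Keep the last piece
id <- parts[length(parts)]
print(id)
```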
License: CC-BY-SA with attribution
Not affiliated with StackOverflow