String rearrangement in R

https://stackoverflow.com/questions/20904539

23-09-2022

Question

I am looking for two R functions that perform the following string rearrangements: (1) move the parts following each ", " in a string to the start of the string, e.g.

name="2,6-Octadien-1-ol, 3,7-dimethyl-, (E)-"

should yield

"(E)-3,7-dimethyl-2,6-Octadien-1-ol"

Note that there could be any number of ", " in a string, or none at all, and that the parts after each ", " should be placed at the start of the string successively, starting from the end of the string. What would be the most efficient way of achieving this in R (without using explicit loops etc.)?

(2) place the parts between "<" and ">" at the start of a string and remove any ", ". E.g.

name="Pyrazine <2-acetyl-, 3-ethyl->"

should yield

"2-acetyl-3-ethyl-Pyrazine"

(This is a simpler gsub problem, right?) The part between "<" and ">" could be anywhere in the string, though. E.g.

name="Cyclohexanol <4-tertbutyl-> acetate"

should yield

"4-tertbutyl-Cyclohexanol acetate"

Any thoughts would be welcome!

cheers, Tom

Solution

For the first problem:

name <- c("2,6-Octadien-1-ol, 3,7-dimethyl-, (E)-",
  "2,6-Octadien-1-ol,3,7-dimethyl-,(E)-")

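# Split on commas that are not preceded by a digit (so locants such as "2,6"
# stay intact), then reverse the pieces of each name and paste them together.
sapply(strsplit(name, "(?<!\\d), ?", perl = TRUE), function(x)
  paste(rev(x), collapse = ""))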
# [1] "(E)-3,7-dimethyl-2,6-Octadien-1-ol" "(E)-3,7-dimethyl-2,6-Octadien-1-ol"
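Since the question asks for a function, the same idea could be wrapped up roughly as follows. This is only a sketch: the name move_suffixes_first is made up for illustration, and it assumes that a separating comma never directly follows a digit (otherwise the lookbehind would treat it as part of a locant such as "2,6"). It uses vapply instead of sapply so the result is always a character vector. A name without any ", " comes back unchanged.

```r
# Hypothetical helper (the name is not from the original answer).
# Assumes separating commas never directly follow a digit.
move_suffixes_first <- function(x) {
  # Split on commas not preceded by a digit; the space after the comma is optional.
  parts <- strsplit(x, "(?<!\\d), ?", perl = TRUE)
  # Reverse the pieces of each name and glue them back together.
  vapply(parts, function(p) paste(rev(p), collapse = ""), character(1))
}

move_suffixes_first(c("2,6-Octadien-1-ol, 3,7-dimethyl-, (E)-", "Ethanol"))
# [1] "(E)-3,7-dimethyl-2,6-Octadien-1-ol" "Ethanol"
```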

For the second problem:

name <- c("Pyrazine <2-acetyl-, 3-ethyl->", 
          "Cyclohexanol <4-tertbutyl-> acetate")

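# Text between "<" and ">", with any ", " removed
inside <- gsub(", ", "", sub("^.*<(.+)>.*$", "\\1", name))
# Everything outside the brackets (the space before "<" is dropped)
outside <- sub("^(.*) <.*>(.*)$", "\\1\\2", name)
paste0(inside, outside)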
# [1] "2-acetyl-3-ethyl-Pyrazine"        "4-tertbutyl-Cyclohexanol acetate"
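One thing to watch out for: if a name contains no "<...>" at all, neither sub() call matches, so both return the whole string and the name ends up pasted twice. A possible function version that leaves such names alone is sketched below. The name move_bracketed_first is made up for illustration, and like the code above it assumes at most one bracketed part per name, preceded by a space.

```r
# Hypothetical helper (the name is not from the original answer).
# Assumes at most one "<...>" part per name, preceded by a space.
move_bracketed_first <- function(x) {
  # Only names that actually contain a bracketed part are rewritten.
  has_brackets <- grepl("<.+>", x)
  # Text between "<" and ">", with any ", " removed.
  inside <- gsub(", ", "", sub("^.*<(.+)>.*$", "\\1", x))
  # Everything outside the brackets.
  outside <- sub("^(.*) <.*>(.*)$", "\\1\\2", x)
  out <- x
  out[has_brackets] <- paste0(inside[has_brackets], outside[has_brackets])
  out
}

move_bracketed_first(c("Pyrazine <2-acetyl-, 3-ethyl->",
                       "Cyclohexanol <4-tertbutyl-> acetate",
                       "Ethanol"))
# [1] "2-acetyl-3-ethyl-Pyrazine"        "4-tertbutyl-Cyclohexanol acetate"
# [3] "Ethanol"
```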

Licensed under: CC-BY-SA with attribution

Not affiliated with Stack Overflow