Question

I am looking for two R functions that perform the following string rearrangements. (1) Move the parts that follow a ", " to the start of the string, e.g.

name="2,6-Octadien-1-ol, 3,7-dimethyl-, (E)-"

should yield

"(E)-3,7-dimethyl-2,6-Octadien-1-ol"

(Note that a string may contain any number of ", ", or none at all, and that the parts after each ", " should be moved to the start of the string successively, starting from the end of the string.) What would be the most efficient way of achieving this in R, without using loops etc.?

(2) Move the part between "<" and ">" to the start of the string and remove any ", ", e.g.

name="Pyrazine <2-acetyl-, 3-ethyl->"

should yield

"2-acetyl-3-ethyl-Pyrazine"

(This is a simpler gsub problem, right?) The part between "<" and ">" could be anywhere in the string, though, e.g.

name="Cyclohexanol <4-tertbutyl-> acetate"

should yield

"4-tertbutyl-Cyclohexanol acetate"

Any thoughts would be welcome!

cheers, Tom

Solution

  1. For the first problem:

    name <- c("2,6-Octadien-1-ol, 3,7-dimethyl-, (E)-",
      "2,6-Octadien-1-ol,3,7-dimethyl-,(E)-")
    
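    # Split on commas that do not follow a digit, so locants such as "2,6"
    # stay intact; then reverse the pieces and paste them back together.
    sapply(strsplit(name, "(?<!\\d), ?", perl = TRUE), function(x) 
      paste(rev(x), collapse = ""))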
    # [1] "(E)-3,7-dimethyl-2,6-Octadien-1-ol" "(E)-3,7-dimethyl-2,6-Octadien-1-ol"
    
  2. For the second problem:

    name <- c("Pyrazine <2-acetyl-, 3-ethyl->", 
              "Cyclohexanol <4-tertbutyl-> acetate")
    
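    # Text between "<" and ">", with any ", " removed
    inside <- gsub(", ", "", sub("^.*<(.+)>.*$", "\\1", name))
    # Everything outside the brackets (the space before "<" is dropped too)
    outside <- sub("^(.*) <.*>(.*)$", "\\1\\2", name)
    paste0(inside, outside)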
    # [1] "2-acetyl-3-ethyl-Pyrazine"        "4-tertbutyl-Cyclohexanol acetate"
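
Since the question asks for functions, the two snippets above could be wrapped roughly as follows. This is only a sketch: the names `move_suffixes` and `move_bracketed` are made up for illustration, and the `grepl()` guard is an addition that is not part of the answer above. It is there because, for a name without a "<...>" part, both `sub()` calls return the input unchanged and `paste0()` would duplicate the name. The sketch also assumes at most one "<...>" pair per name.

```r
# Hypothetical helper names; they do not come from the original answer.

# Move the parts after each ", " to the front, last part first.
move_suffixes <- function(name) {
  # Split on a comma (plus optional space) that does not follow a digit,
  # so locants such as "2,6" or "3,7" are kept together.
  parts <- strsplit(name, "(?<!\\d), ?", perl = TRUE)
  vapply(parts, function(x) paste(rev(x), collapse = ""), character(1))
}

# Move the "<...>" part to the front and drop any ", " inside it.
# Assumes at most one "<...>" pair per name.
move_bracketed <- function(name) {
  has_brackets <- grepl("<.+>", name)
  inside  <- gsub(", ", "", sub("^.*<(.+)>.*$", "\\1", name))
  outside <- sub(" ?<.+>", "", name)
  # Names without brackets are returned unchanged.
  ifelse(has_brackets, paste0(inside, outside), name)
}

move_suffixes(c("2,6-Octadien-1-ol, 3,7-dimethyl-, (E)-", "Ethanol"))
# [1] "(E)-3,7-dimethyl-2,6-Octadien-1-ol" "Ethanol"

move_bracketed(c("Pyrazine <2-acetyl-, 3-ethyl->",
                 "Cyclohexanol <4-tertbutyl-> acetate",
                 "Ethanol"))
# [1] "2-acetyl-3-ethyl-Pyrazine"        "4-tertbutyl-Cyclohexanol acetate"
# [3] "Ethanol"
```

Both functions are vectorised over `name`, so no explicit loop is needed. Note that the digit lookbehind in `move_suffixes` would also refuse to split at a ", " that directly follows a digit, which may or may not be what you want for your data.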
    
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow