"invalid regular expression...reason 'Trailing backslash''' error with gsub in R

StackOverflow https://stackoverflow.com/questions/22755876

  •  24-06-2023

Question

I am getting an error message while replacing text in R.

 x
 [1] "Easy bruising and bleeding.\\"

gsub(as.character(x), "\\", "")
Error in gsub(as.character(x), "\\", "") : 
   invalid regular expression 'Easy bruising and bleeding.\', reason 'Trailing backslash'

Solution

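The arguments are in the wrong order: gsub expects the pattern first, then the replacement, then the string to search, i.e. gsub(pattern, replacement, x). In your call, x was used as the pattern, so the regex engine tried to compile a pattern ending in a lone backslash, hence 'Trailing backslash'. Study help("gsub").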

gsub( "\\", "", "Easy bruising and bleeding.\\", fixed=TRUE)
#[1] "Easy bruising and bleeding."
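
Naming the arguments makes the order explicit and guards against this kind of mix-up. A minimal sketch, assuming the input is stored in x as in the question (the name cleaned is introduced here only for illustration):

```r
# The string from the question; "\\" in source code is one literal backslash.
x <- "Easy bruising and bleeding.\\"

# gsub's signature is gsub(pattern, replacement, x, ...).
# fixed = TRUE treats the pattern as a plain string, not a regular expression.
cleaned <- gsub(pattern = "\\", replacement = "", x = x, fixed = TRUE)
print(cleaned)
```

With named arguments, the call still works if the arguments are written in a different order.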

OTHER TIPS

tl;dr: You need 4 \s (i.e. \\\\) in the first argument of gsub in order to find one literal \ in the third argument of gsub. The overall process is:

  • gsub receives \\\\, passes \\
  • regex receives \\, searches for a literal \

To avoid fixed = TRUE, which precludes doing more complex searches, your code should be:

> gsub( "\\\\", "", "Easy bruising and bleeding.\\")
[1] "Easy bruising and bleeding."

Explanation: The reason you need 4 \ is that \ is a special character for the regex engine, so in order for the regex engine to find a literal \ it needs to be passed \\; the first \ indicates that the second \ is not a special character but a \ that should be matched literally. Thus regex receives \\ and searches for \ in the string.

\ is also a special character in R string literals, so in order for gsub to pass \\ to the regex engine, gsub needs to receive \\\\. The first \ indicates that the second \ is a literal \ and not a special character; the third \ does the same thing for the fourth \. Thus gsub receives \\\\ and passes \\ to the regex engine.

Again, the overall process is: gsub receives \\\\ and passes \\; regex receives \\ and searches for one literal backslash.
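
The two escaping layers can be inspected directly. The following is a rough sketch, not part of the original answer; the variable names are hypothetical, and the raw-string syntax r"(...)" on the last lines assumes R 4.0.0 or later (on older versions those lines fail to parse and should be dropped):

```r
# Layer 1: R's parser turns the four typed backslashes into a 2-character string.
pattern <- "\\\\"
nchar(pattern)        # number of characters the regex engine actually receives
writeLines(pattern)   # prints the pattern as the regex engine sees it

# Layer 2: the regex engine reads those two characters as "one literal backslash".
x <- "Easy bruising and bleeding.\\"
via_regex <- gsub(pattern, "", x)
via_fixed <- gsub("\\", "", x, fixed = TRUE)
identical(via_regex, via_fixed)

# Assumption: R >= 4.0.0. A raw string skips layer 1, so only the regex
# escaping remains and two backslashes are enough.
raw_pattern <- r"(\\)"
identical(pattern, raw_pattern)
```

Both routes give the same result; the regex form is only needed when the pattern also uses regex features.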

Note: while the string that you gave us prints to the screen as "Easy bruising and bleeding.\\", the actual string ends in a single backslash. In the printed form, the first \ is just an escape for the second \. You can verify this with this code:

> cat("Easy bruising and bleeding.\\")
Easy bruising and bleeding.\

That's why the code I suggest has 4 \s and not 8 \s.
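
Besides cat, a couple of base R checks can confirm that the string holds only one trailing backslash. A small sketch, assuming the same string as above stored in a variable x (endsWith requires R 3.3.0 or later):

```r
x <- "Easy bruising and bleeding.\\"

print(x)                        # print() shows the escaped, quoted form
nchar(x)                        # counts actual characters, not escapes
endsWith(x, "\\")               # TRUE if the last character is a backslash
substr(x, nchar(x) - 1, nchar(x) - 1)   # the character just before it
```

If the string really contained two backslashes, the last substr call would return a backslash rather than the full stop.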

Licensed under: CC-BY-SA with attribution