Regex to remove words that contain a letter or special character repeated more than twice in a row, in R

StackOverflow https://stackoverflow.com/questions/22888528

  •  28-06-2023

Question

I want to remove every word in which the same letter or special character occurs more than twice in a row.

For example, the input is

"Google in theee lland of whhhat c#, c++ and e###"

and the output should be

"Google in lland of c#, c++ and"

Solution

x <- "Google in theee lland of whhhat c#, c++ and e###"
gsub("\\S*(\\S)\\1\\1\\S*\\s?", "", x)
# [1] "Google in lland of c#, c++ and "

(\\S)\\1\\1 matches three consecutive occurrences of the same non-space character: (\\S) captures one character, and each \\1 requires that same character again.
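
To see which words that core pattern flags on its own, here is a minimal sketch using grepl on the words from the question. The variable names words and has_triple are illustrative and not part of the original answer.

```r
# Illustrative only: test each word for a character repeated three times in a row.
words <- c("theee", "lland", "whhhat", "c++", "e###")
has_triple <- grepl("(\\S)\\1\\1", words)
has_triple
# [1]  TRUE FALSE  TRUE FALSE  TRUE
```

Words such as "lland" and "c++" repeat a character only twice, so they are not flagged.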

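The surrounding \\S* and \\S*\\s? match the rest of the word before and after the repeated run, plus at most one whitespace character immediately following the word, so the whole word is removed along with its trailing space. When the removed word is the last one in the string, the space before it is left behind, which is why the result above ends with a trailing space.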
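
If the trailing space matters, as in the expected output from the question, one option is to trim the result afterwards. The sketch below wraps the same gsub call in a hypothetical helper, remove_repeated_words (a name introduced here, not from the original answer), and assumes base R's trimws is available (R 3.2.0 or later).

```r
# Hypothetical helper: drop words containing a character repeated three or
# more times in a row, then strip leading and trailing whitespace.
remove_repeated_words <- function(text) {
  cleaned <- gsub("\\S*(\\S)\\1\\1\\S*\\s?", "", text)
  trimws(cleaned)
}

x <- "Google in theee lland of whhhat c#, c++ and e###"
remove_repeated_words(x)
# [1] "Google in lland of c#, c++ and"
```

Trimming after the substitution keeps the regular expression itself unchanged, at the cost of also removing any leading or trailing whitespace that was in the original string.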

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow