Python's re module lacks some of the abilities of Perl's regex engine. To make up for that, you can pass a lambda expression to re.sub as the replacement argument to build the replacement dynamically:
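import re

string = "It was never going to work, he thought. He did not play so well, so he had to practice some more. Not foobar !"
# Outer pattern: a negation keyword, then words/spaces, up to the first punctuation mark.
# The lambda rewrites each matched span, prefixing every word after whitespace with NEG_.
transformed = re.sub(r'\b(?:not|never|no)\b[\w\s]+[^\w\s]',
                     lambda match: re.sub(r'(\s+)(\w+)', r'\1NEG_\2', match.group(0)),
                     string,
                     flags=re.IGNORECASE)
print(transformed)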
This prints:
It was never NEG_going NEG_to NEG_work, he thought. He did not NEG_play NEG_so NEG_well, so he had to practice some more. Not NEG_foobar !
Explanation
The first step is to select the parts of your string you're interested in. This is done with
\b(?:not|never|no)\b[\w\s]+[^\w\s]
That is your negative keyword (\b is a word boundary, (?:...) is a non-capturing group), followed by alphanumerics and spaces (\w is [0-9a-zA-Z_], plus Unicode word characters for str patterns unless re.ASCII is set; \s is any kind of whitespace), up to something that is neither an alphanumeric nor a space (acting as punctuation).
Note that the punctuation is mandatory here, but you could safely remove [^\w\s] so that a span can also run to the end of the string.
Now you're dealing with strings like
never going to work,
Just select the words preceded by spaces with
(\s+)(\w+)
and replace them with what you want:
\1NEG_\2
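To see the two steps separately, here is a minimal sketch that runs each pattern on its own, using the same sample sentence. The names text, scope and mark_negation are made up for this illustration, and the last part is only one way of writing the variant without the mandatory punctuation.

```python
import re

text = ("It was never going to work, he thought. He did not play so well, "
        "so he had to practice some more. Not foobar !")

# Step 1: the outer pattern selects each negated span, from the keyword
# up to and including the first punctuation character.
scope = re.compile(r'\b(?:not|never|no)\b[\w\s]+[^\w\s]', re.IGNORECASE)
spans = scope.findall(text)
print(spans)
# ['never going to work,', 'not play so well,', 'Not foobar !']

# Step 2: inside a single span, prefix every word that follows whitespace.
# The keyword itself is untouched because no whitespace precedes it in the span.
print(re.sub(r'(\s+)(\w+)', r'\1NEG_\2', spans[0]))
# never NEG_going NEG_to NEG_work,

# Variant (illustrative helper): without the trailing [^\w\s], a span may
# also end at the end of the string, so unpunctuated input still matches.
def mark_negation(s):
    return re.sub(r'\b(?:not|never|no)\b[\w\s]+',
                  lambda m: re.sub(r'(\s+)(\w+)', r'\1NEG_\2', m.group(0)),
                  s, flags=re.IGNORECASE)

print(mark_negation("I do not like it"))
# I do not NEG_like NEG_it
```

With the original pattern, "I do not like it" would come back unchanged, since there is no punctuation to close the span.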