Question

I am trying to write a generic replace function for a regex sub operation in Python (I have tried both 2 and 3), where the user provides a regex pattern and a replacement for the match. The replacement can be anything from a plain string to one that uses the groups from the match.

In the end, I get from the user a dictionary in this form:

regex_dict = {pattern:replacement}

When I replace all occurrences of a pattern with the following call, replacements that refer to a group number (such as \1) work:

re.sub(pattern, regex_dict[pattern], text)

This works as expected, but I need to do some additional work whenever a match is found. Basically, what I am trying to achieve is this:

def replace_function(matchobj):
    result = regex_dict[matchobj.re]
    ##
    ## Do some other things
    ##
    return result

re.sub(pattern, replace_function, text)

This works for plain string replacements, but when a function is used, re.sub does not expand group references such as \1 in the value the function returns.

I also tried to convert the \1 pattern to \g<1>, hoping that the re.sub would understand it, but to no avail.
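
To illustrate the difference, here is a minimal reproduction. The pattern, text, and template are made up for the example and are not my real data:

```python
import re

text = "abc abc"
pattern = r"a(\w)c"
template = r"\1\1"  # hypothetical user-supplied replacement

# String replacement: re.sub expands the backreference itself.
as_string = re.sub(pattern, template, text)

# Function replacement: whatever the function returns is inserted as-is,
# so the backslash sequences end up literally in the output.
as_function = re.sub(pattern, lambda m: template, text)

print(as_string)
print(as_function)
```

The first call substitutes the captured group, while the second leaves the template text untouched in the result.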

Am I missing something vital?

Thanks in advance!

Additional notes: I compile the patterns from byte strings, and the replacements are byte strings as well. My patterns contain non-Latin characters, but I read everything as bytes, including the text that the regex substitution operates on.

EDIT Just to clarify, I do not know in advance what kind of replacement the user will provide. It could be some combination of normal strings and groups, or just a string replacement.

SOLUTION

def replace_function(matchobj):
    repl = regex_dict[matchobj.re]
    ##
    ## Do some other things
    ##
    return matchobj.expand(repl)

re.sub(pattern, replace_function, text)
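
For completeness, a self-contained sketch of the same idea. It assumes the dictionary keys are compiled patterns (so that matchobj.re can be used directly as the lookup key); the sample patterns, the text, and the apply_all helper are hypothetical:

```python
import re

# Hypothetical user-supplied mapping of compiled pattern -> replacement template.
regex_dict = {
    re.compile(r"(\d+)-(\d+)"): r"\2-\1",  # replacement that uses groups
    re.compile(r"foo"): "bar",             # plain string replacement
}

def replace_function(matchobj):
    # matchobj.re is the compiled pattern that produced this match.
    repl = regex_dict[matchobj.re]
    # ... any additional per-match work would go here ...
    # expand() applies the template's group references to this match.
    return matchobj.expand(repl)

def apply_all(text):
    # Run every registered pattern over the text in turn.
    for pattern in regex_dict:
        text = pattern.sub(replace_function, text)
    return text

result = apply_all("foo 12-34")
print(result)
```

If the keys were pattern strings rather than compiled objects, the lookup would presumably need matchobj.re.pattern instead.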

Solution

I suspect you're after .expand. If you have a match object produced by a compiled regex, you can pass it a replacement template and it will fill in the group references for you, e.g.:

import re

text = 'abc'
# This would be your key in the dict
rx = re.compile(r'a(\w)c')
# This would be the value for the key (the replacement string, e.g. `\1\1\1`)
res = rx.match(text).expand(r'\1\1\1')
# res == 'bbb'
Licensed under: CC-BY-SA with attribution