Regex pattern for illegal regex groups `\g<...>`

Question 1

Thanks to all the comments, I have this solution.

import re

# Good uses: capture the name in \g<name> when the backslash is not itself escaped.
p = re.compile(r"(?:[^\\])\\g<(\w+)>")

for m in p.finditer(r"</\g\<at__tribut1>\\g<notattribut>>"):
    print(m.group(1))

# Bad uses.
p = re.compile(r"(?:[^\\])\\g(?!<\w+>)")

if p.search(r"</\g\<at__tribut1>\\g<notattribut>>"):
    print("Wrong use !")
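
One thing to watch with the (?:[^\\]) prefix is that it needs a character in front of the \g, so a reference at the very start of the string is missed. Here is a minimal sketch of the same two checks using a negative lookbehind instead. The names GOOD, BAD and check_template are made up for illustration, and I am assuming that a single backslash directly before \g always means "escaped" (a run of three backslashes would be treated as escaped too, which may not be what you want).

```python
import re

# Hypothetical names; (?<!\\) replaces the (?:[^\\]) prefix so that a
# reference at position 0 is also seen.
GOOD = re.compile(r"(?<!\\)\\g<(\w+)>")
BAD = re.compile(r"(?<!\\)\\g(?!<\w+>)")


def check_template(text):
    """Return (names used correctly, True if a malformed reference is found)."""
    return GOOD.findall(text), BAD.search(text) is not None


print(check_template(r"\g<tag> and \\g<escaped> and \g<other>"))
# (['tag', 'other'], False)
print(check_template(r"\g tag"))
# ([], True)
```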

Question 2

As far as I am aware, the only restriction on named capture groups is that the name must be a valid Python identifier, so you can't put metacharacters such as . or \ in it.

Have you come across some kind of problem with named capture groups?

The regex you used, r"illegal|(\g<NAME>\w+)", is illegal because \g<NAME> is a reference to a group, not a definition of one, and in Python that syntax is only accepted in a replacement string. If you want to make a named capture group, it is (?P<NAME>regex).

Like this:

>>> import re
>>> string = "sup bro"
>>> re.sub(r"(?P<greeting>sup) bro", r"\g<greeting> mate", string)
'sup mate'
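
To make the difference concrete, here is a small sketch of how a named group is referred back to inside a pattern, with (?P=name), and of what happens when \g<...> is put in a pattern instead. The sample strings and the variable names doubled and compiled are only for illustration.

```python
import re

# Inside a pattern, a named group is referred back to with (?P=name).
doubled = re.compile(r"(?P<word>\w+) (?P=word)")
print(doubled.search("it is is fine").group("word"))  # is

# \g<name> belongs to replacement strings, so the pattern compiler rejects it.
try:
    re.compile(r"illegal|(\g<NAME>\w+)")
    compiled = True
except re.error:
    compiled = False
print(compiled)  # False
```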

If you wanted to do some kind of analysis on the actual regex string in use, I don't think there is anything inside the re module which can do this natively.

You would need to run another match on the regex string itself: put the regex into a string variable and then match it against something like \(\?P<(\w+)> which would give you the named capture group's name.
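
A rough sketch of that idea, with the caveat that it is a purely textual scan: it does not parse the regex, so an escaped parenthesis or a character class containing "(?P<" would fool it. NAMED_GROUP and named_groups are names I made up for the example.

```python
import re

# Hypothetical helper: find the names of (?P<name>...) groups in a pattern string.
NAMED_GROUP = re.compile(r"\(\?P<(\w+)>")


def named_groups(pattern_text):
    """Return the group names in the order they appear in the pattern text."""
    return NAMED_GROUP.findall(pattern_text)


print(named_groups(r"illegal|(?P<group_name>\w+)"))       # ['group_name']
print(named_groups(r"(?P<year>\d{4})-(?P<month>\d{2})"))  # ['year', 'month']
```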

I hope that is what you are asking for... Let me know.

Question 3

So, what you want is to get the string of the group name, right?

Maybe you can get it by doing this:

>>> import re
>>> regex = re.compile(r"illegal|(?P<group_name>\w+)")
>>> dict(regex.groupindex)
{'group_name': 1}

As you see, groupindex is a mapping from the group names to their positions in the regex. Having that, it is easy to retrieve the string:

>>> # A list of the group names in your regex:
... list(regex.groupindex)
['group_name']

>>> # The string of your group name:
... list(regex.groupindex)[0]
'group_name'
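
If you need to go the other way, from a group's position back to its name, the mapping can simply be inverted. This is only a sketch; the date pattern and the name names_by_index are invented for the example.

```python
import re

regex = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})")

# Invert groupindex so that each position maps to its group name.
names_by_index = {index: name for name, index in regex.groupindex.items()}
print(names_by_index)  # {1: 'year', 2: 'month'}

m = regex.match("2024-05")
for index in sorted(names_by_index):
    # Prints "year 2024" and then "month 05".
    print(names_by_index[index], m.group(index))
```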

Don't know if that is what you were looking for...

Question 4

Use a negative lookahead?

\\g(?!<\w+>)

This searches for any \g not followed by <…>, thus a "wrong use".