Question

I'd like to split a regex onto multiple lines for clarity, but I'm not sure what the best way to do this is with raw strings.

SECT_EXP = (
    r'^(?P<number>.+?[.]? {1,2}'  # Begin number pattern match
    r'(?P<sect_num>'  # Begin section number match
    r'(?P<full_num>'  # Begin full number match
    r'(?P<title>\d{1,2}?)'  # Match title substring
    r'(?P<chapter>\d{2})'  # Match chapter substring
    r')'  # End full number match
    r'[.]'
    r'(?P<section>\d+)'  # Match section substring
    r')'  # End section number match
    r')'  # End number pattern match
    r'([.]?)[ ]*$'  # Lazy matching end of strings
)

But do I need to prefix each string with r to make sure that the whole thing is processed as a raw string when implicit line joining is utilized?


Solution

From the Python documentation for the re module:

re.X
re.VERBOSE

This flag allows you to write regular expressions that look nicer. Whitespace within the pattern is ignored, except when in a character class or preceded by an unescaped backslash, and, when a line contains a '#' neither in a character class or preceded by an unescaped backslash, all characters from the leftmost such '#' through the end of the line are ignored.

That means that the two following regular expression objects that match a decimal number are functionally equal:

a = re.compile(r"""\d +  # the integral part
                   \.    # the decimal point
                   \d *  # some fractional digits""", re.X)

b = re.compile(r"\d+\.\d*")
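A minimal sketch to check the quoted claim and to show the one caveat it implies: under re.X a bare space or '#' is dropped from the pattern, so a literal one has to sit in a character class or follow a backslash. The sample strings and the name `spaced` are illustrative assumptions, not part of the documentation example.

```python
import re

# The two patterns from the documentation example.
a = re.compile(r"""\d +  # the integral part
                   \.    # the decimal point
                   \d *  # some fractional digits""", re.X)
b = re.compile(r"\d+\.\d*")

# Both patterns accept and reject the same sample inputs (samples are illustrative).
samples = ["3.14", "10.", "abc", ".5"]
same = all(bool(a.fullmatch(s)) == bool(b.fullmatch(s)) for s in samples)

# A literal space or '#' must be protected in verbose mode.
spaced = re.compile(r"""
    \d+ [ ] \d+   # digits, one literal space (character class), digits
    \  [#] \d+    # one literal space (escaped), a literal hash, digits
""", re.X)

print(same)
print(bool(spaced.fullmatch("12 34 #5")))
```

The character-class form `[ ]` tends to be easier to read than the backslash-space form, since a trailing escaped space is easy to miss.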

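As the first example shows, a triple-quoted string can carry the 'r' prefix, so the whole pattern can be written as a single raw literal. With implicit line joining, by contrast, each string literal needs its own 'r' prefix, because the prefix applies only to the literal it is attached to.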
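Applied to the pattern in the question, this could look like the sketch below. It assumes the intent of the original is unchanged; the only substantive difference is that the literal space before `{1,2}` becomes `[ ]`, since re.VERBOSE would otherwise discard it. The input string "Section 1234.5" and the names `mixed` and `m` are hypothetical, chosen only to exercise the code.

```python
import re

# Each adjacent literal is lexed on its own, so r covers only the literal it prefixes.
mixed = r'\b' '\b'  # raw backslash + 'b', followed by a single backspace character

# The question's pattern as one raw, triple-quoted, verbose literal.
SECT_EXP = re.compile(
    r"""
    ^(?P<number>.+?[.]?[ ]{1,2}     # number pattern; [ ] keeps the literal space
        (?P<sect_num>               # section number
            (?P<full_num>           # full number
                (?P<title>\d{1,2}?) # title substring
                (?P<chapter>\d{2})  # chapter substring
            )
            [.]
            (?P<section>\d+)        # section substring
        )
    )
    ([.]?)[ ]*$                     # optional trailing period and spaces
    """,
    re.VERBOSE,
)

m = SECT_EXP.match("Section 1234.5")  # hypothetical sample input
print(len(mixed))
print(m.groupdict())
```

Indenting the nested groups is optional, but it makes the group structure visible in a way the concatenated one-literal-per-line version cannot.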

Licensed under: CC-BY-SA with attribution