Question

I have the following regex pattern:

pattern = r'''
        (?P<name>.+?)\n
        SKU\s#\s+(?P<sku_hidden>\d+)\n
        Quantity:\s+(?P<quantity>\d+)\n
        Gift\sWrap:\s+(?P<gift_wrap>.+?)\n
        Shipping\sMethod:.+?\n
        Price:.+?\n
        Total:\s+(?P<total_price>\$[\d.]+)
        '''  

I retrieve the matches using:

re.finditer(pattern, plain, re.M | re.X)

Using re.findall instead yields the same result.

It should match text like this:

Red Retro Citrus Juicer
SKU # 403109
Quantity: 1
Gift Wrap: No
Shipping Method:Standard
Price: $24.99
Total: $24.99

There are two problems. First, with re.M and re.X the pattern doesn't match at all, but if I put it all on one line it does. Second, when it does match, only the first group is captured and the rest are ignored. Any thoughts?

ADDITIONAL INFORMATION:

If I change my pattern to be just:

pattern = r'''
        (?P<name>.+?)\n
        SKU\s#\s+(?P<sku_hidden>\d+)\n
        '''

My output comes out like this: [u'Red Retro Citrus Juicer']. It matches, yet the SKU does not appear. If I put everything on the same line, like so:

pattern = r'(?P<name>.+?)\nSKU\s#\s+(?P<sku_hidden>\d+)\n' 

It does match and grab everything.


Solution

When using the X flag, you need to escape the #, because an unescaped # starts a comment that runs to the end of the line.
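A minimal sketch (Python 3; the one-line sample string and the group name sku are illustrative, not from the question) comparing an unescaped #, an escaped \#, and a character class [#] under re.X. Per the Python re documentation, a # inside a character class does not start a comment, so [#] is an alternative to \#.

```python
import re

text = "SKU # 403109\n"

# Unescaped '#': in verbose mode everything from '#' to the end of the line
# is a comment, so this pattern is effectively just r"SKU\s".
commented = re.compile(r"SKU\s#\s+(?P<sku>\d+)", re.X)

# An escaped '\#' and a character class '[#]' both match a literal '#' under re.X.
escaped = re.compile(r"SKU\s\#\s+(?P<sku>\d+)", re.X)
char_class = re.compile(r"SKU\s[#]\s+(?P<sku>\d+)", re.X)

print(dict(commented.groupindex))         # {} : the 'sku' group was swallowed by the comment
print(escaped.search(text).group("sku"))  # 403109
print(char_class.search(text).group("sku"))  # 403109
```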

Right now your two-line regex is equivalent to

(?P<name>.+?)\n
SKU\s
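A small Python 3 check of that equivalence, using the two-line sample from the question (the variable name compact is only a label for the hand-stripped pattern). Both patterns give the same findall result, and only the name group survives compilation, which is why findall returns plain strings rather than tuples.

```python
import re

plain = (
    "Red Retro Citrus Juicer\n"
    "SKU # 403109\n"
)

# The question's two-line pattern, with the '#' left unescaped.
verbose = r'''
        (?P<name>.+?)\n
        SKU\s#\s+(?P<sku_hidden>\d+)\n
        '''

# What re.X leaves after dropping whitespace and the '#...' comment.
compact = r'(?P<name>.+?)\nSKU\s'

print(re.findall(verbose, plain, re.M | re.X))   # ['Red Retro Citrus Juicer']
print(re.findall(compact, plain))                # ['Red Retro Citrus Juicer']
print(list(re.compile(verbose, re.X).groupindex))  # ['name']
```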

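In the full pattern, the comment only swallows the rest of the SKU line, so the regex requires SKU, one whitespace character, and then immediately Quantity:, which never occurs in your text. That is why nothing matches at all. What you want is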

pattern = r'''
    (?P<name>.+?)\n
    SKU\s\#\s+(?P<sku_hidden>\d+)\n
    Quantity:\s+(?P<quantity>\d+)\n
    Gift\sWrap:\s+(?P<gift_wrap>.+?)\n
    Shipping\sMethod:.+?\n
    Price:.+?\n
    Total:\s+(?P<total_price>\$[\d.]+)
    '''  

Notice the \# on the SKU line.
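Put together, a minimal runnable sketch (Python 3, using the sample text from the question) might look like this. Because the pattern has several named groups, match.groupdict() is a convenient way to get every captured field per match; re.findall would instead return one tuple per match.

```python
import re

pattern = r'''
    (?P<name>.+?)\n
    SKU\s\#\s+(?P<sku_hidden>\d+)\n
    Quantity:\s+(?P<quantity>\d+)\n
    Gift\sWrap:\s+(?P<gift_wrap>.+?)\n
    Shipping\sMethod:.+?\n
    Price:.+?\n
    Total:\s+(?P<total_price>\$[\d.]+)
    '''

plain = """Red Retro Citrus Juicer
SKU # 403109
Quantity: 1
Gift Wrap: No
Shipping Method:Standard
Price: $24.99
Total: $24.99
"""

# re.X is what makes the whitespace and line breaks in the pattern insignificant;
# re.M only changes how ^ and $ behave, and this pattern uses neither.
for match in re.finditer(pattern, plain, re.M | re.X):
    print(match.groupdict())
```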

Licensed under: CC-BY-SA with attribution