How to use a negative lookbehind

https://stackoverflow.com/questions/7051749

26-12-2020

Question

Basically, I am changing every hexadecimal color value with a blue hue to its red-hue counterpart in a given stylesheet (e.g. #00f is changed to #ff0000; my function outputs six-character hexadecimal values, excluding the #).

Creating a regular expression to match hexadecimal colors was not a problem (I'm not concerned about HTML color names, although I may eventually care about rgb, rgba, hsb, etc. values). This is what I ended up with: #(([0-9A-z]{3}){1,2}). It works, but I want it to be foolproof. For example, if somebody happens to set a background image with a URL fragment (e.g. #top) that is also a valid hexadecimal value, I don't want to change it. I tried a negative lookbehind, but it doesn't seem to work. I was using \B#(([0-9A-z]{3}){1,2}), but if there is a non-word character (such as a space) before the '#', it matches the URL fragment. This is what I thought should do the trick, but it doesn't: (?<!url\([^#)]*)#(([0-9A-z]{3}){1,2}).

I am using the desktop version of RegExr to test with the following stylesheet:

body {
    background: #f09 url('images#06F');
}
span {
    background=#00f url('images#889');
}
div {
    background:#E4aaa0 url('images#889');
}
h1 {
    background: #fff #dddddd;
}

Whenever I hover over the (?<! substring, RegExr identifies it as a "Negative lookahead matching 'url\([^#)]*'." Could there be a bug, or am I just having a bad regex day? And while we're at it, are there any other contexts in which a '#' is used for non-hexadecimal purposes?

EDIT: Alright, I can't program early in the morning. That hexadecimal regex should be #(([0-9A-Fa-f]{3}){1,2})
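
To illustrate why the correction in this EDIT matters, here is a minimal Python sketch (the sample string is made up for illustration). The range A-z spans every letter plus the punctuation that sits between 'Z' and 'a' in ASCII, so the first pattern accepts things that are not hexadecimal at all:

```python
import re

# Original, too-permissive character class versus the corrected one.
loose = re.compile(r'#(([0-9A-z]{3}){1,2})')
strict = re.compile(r'#(([0-9A-Fa-f]{3}){1,2})')

# Hypothetical sample text: a URL fragment, a real color, and a junk token.
sample = "url('page#top') #00f #g_z"

loose_hits = [m.group(0) for m in loose.finditer(sample)]
strict_hits = [m.group(0) for m in strict.finditer(sample)]

print(loose_hits)   # ['#top', '#00f', '#g_z']
print(strict_hits)  # ['#00f']
```

The stricter class rejects #top here, but a fragment such as #06F is still valid hexadecimal, so the context problem described above remains.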

EDIT 2: Alright, so I missed the detail that most languages require fixed-length lookbehinds.
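
As a rough illustration of the restriction mentioned in EDIT 2, the sketch below uses Python's re module (an assumption; RegExr's engine may behave differently). The variable-width lookbehind is rejected at compile time, while a fixed-width one compiles but can only inspect a known number of characters. The literal prefix "images" is taken from the test stylesheet and is not a general solution.

```python
import re

HEX = r'#((?:[0-9A-Fa-f]{3}){1,2})'

# Variable-width lookbehind: Python's re refuses to compile it.
try:
    re.compile(r'(?<!url\([^#)]*)' + HEX)
    variable_width_ok = True
except re.error:
    variable_width_ok = False

# Fixed-width lookbehind: compiles, but only checks the 6 characters before '#'.
fixed = re.compile(r'(?<!images)' + HEX)
sample = "background: #f09 url('images#06F');"
matches = fixed.findall(sample)

print(variable_width_ok)  # False
print(matches)            # ['f09']
```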

Solution

I think what you need is one of the two following solutions:

ss = '''    background: #f09 url('images#06F'); 
    background=#00f url('images #889'); 
    background:#E4aaa0 url('images#890'); 
    background: #fff #dddddd; '''

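import re

print(ss)

# Three hexadecimal digits; repeated once or twice it matches #rgb or #rrggbb.
three = '(?:[0-9A-Fa-f]{3})'

# First pattern: only the color that directly follows "background".
regx = re.compile('^ *background[ =:]*#(%s{1,2})' % three, re.MULTILINE)
print(regx.findall(ss))

print('-----------------------------------------------------')

# Second pattern: also accept a color that follows another color
# (each lookbehind has a fixed width, as Python's re module requires).
regx = re.compile('(?:(?:^ *background[ =:]*)|(?:(?<=#%s)|(?<=#%s%s)) +)'
                  '#(%s{1,2})' % (three, three, three, three),
                  re.MULTILINE)
print(regx.findall(ss))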

result

    background: #f09 url('images#06F'); 
    background=#00f url('images #889'); 
    background:#E4aaa0 url('images#890'); 
    background: #fff #dddddd; 
['f09', '00f', 'E4aaa0', 'fff']
-----------------------------------------------------
['f09', '00f', 'E4aaa0', 'fff', 'dddddd']
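
The second pattern above uses two separate lookbehinds joined by | rather than a single lookbehind containing an alternation. The sketch below suggests why, assuming Python's re module: a lookbehind whose alternatives have different widths does not compile, while two fixed-width lookbehinds do. The helper name compiles is made up for this illustration.

```python
import re

three = '(?:[0-9A-Fa-f]{3})'

def compiles(pattern):
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# One lookbehind with alternatives of width 4 and 7: not fixed-width.
single = '(?<=#%s|#%s%s) +#' % (three, three, three)
# Two lookbehinds, each with its own fixed width, combined by alternation.
split = '(?:(?<=#%s)|(?<=#%s%s)) +#' % (three, three, three)

print(compiles(single))  # False
print(compiles(split))   # True
```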

Edit 1

ss = '''    background: #f09 url('images#06F'); 
    background=#00f url('images #889'); 
    color:#E4aaa0 url('images#890'); 
    background: #fff #dddddd#125e88    #ae3;
    Walter (Elias) Disney: #f51f51 '''

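import re

print(ss + '\n')

# Three hexadecimal digits; repeated once or twice it matches #rgb or #rrggbb.
three = '(?:[0-9A-Fa-f]{3})'

# First pattern: the first color after any property name on a line.
regx = re.compile('^ *[^=:]+[ =:]*#(%s{1,2})' % three, re.MULTILINE)
print(regx.findall(ss))

print('-----------------------------------------------------')

# Second pattern: also accept colors that follow another color,
# with or without spaces in between.
regx = re.compile('(?:(?:^ *[^=:]+[ =:]*)|(?:(?<=#%s)|(?<=#%s%s)) *)'
                  '#(%s{1,2})' % (three, three, three, three),
                  re.MULTILINE)
print(regx.findall(ss))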

result

    background: #f09 url('images#06F'); 
    background=#00f url('images #889'); 
    color:#E4aaa0 url('images#890'); 
    background: #fff #dddddd#125e88    #ae3;
    Walter (Elias) Disney: #f51f51 

['f09', '00f', 'E4aaa0', 'fff', 'f51f51']
-----------------------------------------------------
['f09', '00f', 'E4aaa0', 'fff', 'dddddd', '125e88', 'ae3', 'f51f51']

Edit 2

ss = '''    background: #f09 url('images#06F'); 
    background=#00f url('images #889'); 
    color:#E4aaa0 url('images#890'); 
    background: #fff #dddddd#125e88    #ae3;
    Walter (Elias) Disney: #f51f51
    background: -webkit-gradient(linear, from(#000000), to(#ffffff));. '''

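import re

print(ss + '\n')

# Three hexadecimal digits; repeated once or twice it matches #rgb or #rrggbb.
three = '(?:[0-9A-Fa-f]{3})'

# What may come before a color: either the start of a line up to its first '#',
# or (checked by fixed-width lookbehinds) a previous color or the text " to(".
preceding = ('(?:(?:^[^#]*)'
                 '|'
                 '(?:(?<=#%s)'
                     '|'
                     '(?<=#%s%s)'
                     '|'
                     r'(?<= to\()'
                     ')'
                 ')') % (three, three, three)

regx = re.compile('%s *#(%s{1,2})' % (preceding, three), re.MULTILINE)
print(regx.findall(ss))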

result

    background: #f09 url('images#06F'); 
    background=#00f url('images #889'); 
    color:#E4aaa0 url('images#890'); 
    background: #fff #dddddd#125e88    #ae3;
    Walter (Elias) Disney: #f51f51
    background: -webkit-gradient(linear, from(#000000), to(#ffffff));. 

['f09', '00f', 'E4aaa0', 'fff', 'dddddd', '125e88', 'ae3', 'f51f51', '000000', 'ffffff']

Regexes are extremely powerful, on the condition that the text contains enough portions with a stable, predictable organisation surrounding the variable portions you want to capture. If the analyzed text becomes too loose in its structure, it becomes impossible to write a regex.

Are there many other "Harlequin-like patchwork" structures possible for your strings?
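
A different way to cope with loosely structured input, not taken from the patterns above, is to let the regex consume whole url(...) tokens and leave them untouched, so that no lookbehind is needed at all. The following is only a sketch under stated assumptions: swap_blue_and_red is a hypothetical stand-in for the asker's conversion function (here it simply swaps the red and blue channels), and the sample line is made up.

```python
import re

# Either a whole url(...) token (nothing captured) or a hex color (captured).
TOKEN = re.compile(r'url\([^)]*\)|#((?:[0-9A-Fa-f]{3}){1,2})')

def swap_blue_and_red(hex_digits):
    """Hypothetical conversion: return six hex digits with red and blue swapped."""
    if len(hex_digits) == 3:
        # Expand shorthand such as 00f to 0000ff.
        hex_digits = ''.join(ch * 2 for ch in hex_digits)
    r, g, b = hex_digits[0:2], hex_digits[2:4], hex_digits[4:6]
    return b + g + r

def recolor(css):
    def repl(m):
        if m.group(1) is None:
            # A url(...) token matched: return it unchanged.
            return m.group(0)
        return '#' + swap_blue_and_red(m.group(1))
    return TOKEN.sub(repl, css)

css = "background: #00f url('images#06F') #dddddd;"
print(recolor(css))  # background: #ff0000 url('images#06F') #dddddd;
```

Because the url(...) alternative is tried at every position before the color alternative, a fragment inside the parentheses is never seen as a color. Other contexts where '#' appears (for example ID selectors) would need their own alternatives.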

Licensed under: CC-BY-SA with attribution
