Question

My regex is below:

(?<![\s]*?(\"|&quot;)")WORD(?![\s]*?(\"|&quot;))

As you can see, I am trying to match all instances of WORD unless they are inside "quotation marks". So...

WORD <- Find this
"WORD" <- Don't find this
"   WORD   " <- Also don't find this, even though not touching against marks
&quot;WORD&quot;  <- Don't find this (I check for both &quot; and " so it still works after htmlspecialchars)

I believe my regex would work perfectly if I did not receive this error:

Compilation failed: lookbehind assertion is not fixed length

Is there any way to do what I intend, considering the limitations of lookbehind?

If you can think of any other way, let me know.

Many many thanks,

Matthew

P.S. The WORD section will actually contain John Gruber's URL detector.


Solution

I would suggest a different approach. It works as long as the quotes are correctly balanced: in that case, WORD is inside a quoted string if and only if the number of quotes that follow it is odd, which makes the lookbehind unnecessary:

if (preg_match(
'/WORD             # Match WORD
(?!                # unless it\'s possible to match the following here:
 (?:               # a string of characters
  (?!&quot;)       # that contains neither &quot;
  [^"]             # nor "
 )*                # (any length),
 ("|&quot;)        # followed by either " or &quot; (remember which in \1)
 (?:               # Then match
  (?:(?!\1).)*\1   # any string except our quote char(s), followed by that quote char(s)
  (?:(?!\1).)*\1   # twice,
 )*                # repeated any number of times --> even number
 (?:(?!\1).)*      # followed only by strings that don\'t contain our quote char(s)
 $                 # until the end of the string
)                  # End of lookahead/sx', 
$subject)) {
    // WORD was found outside any quoted string
}
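To check the idea against the examples from the question, here is a minimal sketch. It uses the same pattern condensed onto one line (without the x flag and comments); the function name countUnquoted is made up for this illustration and is not part of the answer above.

```php
<?php
// Hypothetical helper: counts the WORDs that are followed by an even
// number of quote delimiters, i.e. that lie outside quoted text.
// Assumes the quotes in $subject are balanced.
function countUnquoted(string $subject): int
{
    // Same lookahead as above, condensed onto one line.
    $pattern = '/WORD(?!(?:(?!&quot;)[^"])*("|&quot;)(?:(?:(?!\1).)*\1(?:(?!\1).)*\1)*(?:(?!\1).)*$)/s';
    return (int) preg_match_all($pattern, $subject);
}

echo countUnquoted('WORD'), "\n";              // 1
echo countUnquoted('"WORD"'), "\n";            // 0
echo countUnquoted('"   WORD   "'), "\n";      // 0
echo countUnquoted('&quot;WORD&quot;'), "\n";  // 0
echo countUnquoted('WORD "WORD" WORD'), "\n";  // 2
```

Note that the backreference \1 only counts the delimiter type that appears first after WORD, so text that mixes " and &quot; around the same string may need extra care.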

OTHER TIPS

I would suggest removing quoted strings, then searching through what remains.

$noSubs = preg_replace('/(["\']|&quot;)(\\\\\1|(?!\1).)*\1/', '', $target);
$n = preg_match_all('/\bWORD\b/', $noSubs, $matches);
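As a rough usage sketch, the two calls can be wrapped in a function and tried on a few inputs. The name countOutsideQuotes is hypothetical, and the null fallback is an addition for the case where preg_replace reports an error.

```php
<?php
// Hypothetical wrapper around the two calls above.
function countOutsideQuotes(string $target): int
{
    // Remove anything delimited by ", ' or &quot;; a backslash-escaped
    // delimiter inside the string does not end it.
    $noSubs = preg_replace('/(["\']|&quot;)(\\\\\1|(?!\1).)*\1/', '', $target) ?? '';
    // Count WORD in whatever is left.
    return (int) preg_match_all('/\bWORD\b/', $noSubs);
}

echo countOutsideQuotes('WORD "WORD" &quot; WORD &quot; WORD'), "\n"; // 2
echo countOutsideQuotes('"a \\" WORD" WORD'), "\n";                    // 1
```

Because the pattern has no s flag, a quoted string that spans several lines would not be removed by this sketch.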

The regex I used to replace quoted strings above treats &quot;, " and ' as separate string delimiters. For any single delimiter, the regex looks more like this:

/"(\\"|[^"])*"/

So, if you want to treat &quot; as equivalent to ":

/("|&quot;)(\\("|&quot;)|(?!&quot;)[^"])*("|&quot;)/i

If you then also want to handle single-quoted strings (assuming there are no words with apostrophes):

/("|&quot;)(\\("|&quot;)|(?!&quot;)[^"])*("|&quot;)|'(\\'|[^'])*'/i

Be careful when escaping these patterns for use in PHP strings.
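To illustrate the escaping issue, here is a small sketch (the variable names are made up for the example). Inside a single-quoted PHP string, \\ collapses to a single backslash, so the \\ in the patterns above has to be written as \\\\ and each ' as \'. Pasted in unchanged, \\( would reach PCRE as \(, a literal parenthesis. A nowdoc takes the pattern verbatim and avoids the problem:

```php
<?php
// Single-quoted literal: every regex backslash pair and every ' is escaped.
$fromQuoted = '/("|&quot;)(\\\\("|&quot;)|(?!&quot;)[^"])*("|&quot;)|\'(\\\\\'|[^\'])*\'/i';

// Nowdoc: the pattern is written exactly as it appears in the text above.
$fromNowdoc = <<<'REGEX'
/("|&quot;)(\\("|&quot;)|(?!&quot;)[^"])*("|&quot;)|'(\\'|[^'])*'/i
REGEX;

var_dump($fromQuoted === $fromNowdoc); // bool(true)

// Either form removes a quoted string; here "b" is dropped from the input.
$stripped = preg_replace($fromNowdoc, '', 'a "b" c');
```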

EDIT

Qtax mentioned you may be trying to replace the matched WORD data. In that case, you could easily tokenize the string with a regex like this one:

/("|&quot;)(\\("|&quot;)|(?!&quot;)[^"])*("|&quot;)|((?!"|&quot;).)+/i

That splits the input into quoted strings and unquoted segments. You can then build a new string, applying your replacement only to the unquoted segments:

// The s flag lets . match newlines, so unquoted text that spans lines is not dropped.
$tokenizer = '/("|&quot;)(\\\\("|&quot;)|(?!&quot;)[^"])*("|&quot;)|((?!"|&quot;).)+/is';
$hasQuote = '/"|&quot;/i';
$word = '/\bWORD\b/';
$replacement = 'REPLACEMENT';
$n = preg_match_all($tokenizer, $target, $matches, PREG_SET_ORDER);
$newStr = '';
if ($n === false) {
    /* Print error Message */
    die();
}
foreach($matches as $match){
    if(preg_match($hasQuote, $match[0])){
        //If it has a quote, it's a quoted string.
        $newStr .= $match[0];
    } else {
        //Otherwise, run the replace.
        $newStr .= preg_replace($word, $replacement, $match[0]);
    }
}

//Now $newStr has your replaced String.  Return it from your function, or print it to
//your page.
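
A more compact variant of the same idea, sketched with preg_replace_callback instead of the explicit loop. This is an alternative, not the code above: the function name replaceOutsideQuotes is hypothetical, and the s flag on the tokenizer is an addition so that . also matches newlines.

```php
<?php
// Hypothetical helper: replaces WORD only in segments outside quotes.
function replaceOutsideQuotes(string $target, string $replacement): string
{
    // Same tokenizer as above, plus the s flag (an assumption of this sketch).
    $tokenizer = '/("|&quot;)(\\\\("|&quot;)|(?!&quot;)[^"])*("|&quot;)|((?!"|&quot;).)+/is';

    return preg_replace_callback(
        $tokenizer,
        function (array $m) use ($replacement): string {
            // Group 1 is the opening delimiter; it is non-empty only
            // when the token is a quoted string, which is kept as-is.
            if ($m[1] !== '') {
                return $m[0];
            }
            // Unquoted segment: run the replacement.
            return preg_replace('/\bWORD\b/', $replacement, $m[0]);
        },
        $target
    );
}

echo replaceOutsideQuotes('WORD "WORD" WORD', 'X'), "\n"; // X "WORD" X
```

One difference in behaviour: text that matches neither branch of the tokenizer, such as a stray unbalanced quote, is left in place here, whereas the loop above only copies matched tokens into $newStr.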
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow