preg_match() with htmlentities() - skip backslash before any quote ending on matching quote - not greedy - including new line

StackOverflow https://stackoverflow.com/questions/20940483

  •  24-09-2022

Question

What I'm trying to do is use preg_replace() to wrap anything that sits between htmlentities()-escaped quotes in a string. The match should not be greedy: if the string contains several quoted runs, each run should be replaced separately, from one quote to the next unescaped quote of the same kind, including any backslash-escaped quotes of that kind in between.

Experts only please:

$r = '"first
quote set begin capture for replacement

  \"these escaped quotes should be included for replacement\"

first quote set - end first capture for replacement here"

more stuff - should not be captured
\'second quote set begin capture for replacement

  \\\'these escaped quotes should be included for replacement\\\'

second quote set - end second capture for replacement here\'
`this would also be captured \` `
" this should be separate from first replacement "';
$strA = array('`', "'", '"');
foreach($strA as $v){
  $ste[] = htmlentities($v, ENT_QUOTES, 'UTF-8');
}
$r = preg_replace('/(('.implode('|', $ste).').*(\\\2)*.*\2)/Us', "<span class='sE'>$1</span>", $r);

Of course, the above pattern does not work, but it shows the concept. $r should end up in <pre> tags like this:

<span class='sE'>&quot;first
quote set begin capture for replacement

  \&quot;these escaped quotes should be included for replacement\&quot;

first quote set - end first capture for replacement here&quot;</span>

more stuff - should not be captured
<span class='sE'>&#039;second quote set begin capture for replacement

  \&#039;these escaped quotes should be included for replacement\&#039;

second quote set - end second capture for replacement here&#039;</span>
<span class='sE'>`this would also be captured \` `</span>
<span class='sE'>&quot; this should be separate from first replacement &quot;</span>

Any help would be appreciated.

Solution 2

I figured it out on my own, I think:

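$strA = array('`', "'", '"');
$ste = array();
foreach ($strA as $v) {
  // Escape each quote the same way the text was escaped, then make it regex-safe.
  $ste[] = preg_quote(htmlentities($v, ENT_QUOTES, 'UTF-8'), '/');
}
// (?<!\\) rejects a quote preceded by a backslash; U makes .* lazy; s lets . match newlines.
$r = preg_replace('/((?<!\\\\)('.implode('|', $ste).').*(?<!\\\\)\2)/Us', "<span class='sE'>$1</span>", $r);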

I still have to do a bunch of testing but I think this works.
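
As a quick check of the lookbehind idea, here is a minimal sketch run on a short, made-up input (the `$raw` string and the variable names are assumptions for illustration, not part of the original question). It escapes the text with htmlentities() first, then applies the same kind of pattern:

```php
<?php
// Hypothetical sample: two quoted runs, the first one containing escaped quotes.
$raw = 'say "a \"b\" c" then "d"';
$escaped = htmlentities($raw, ENT_QUOTES, 'UTF-8');
// $escaped is now: say &quot;a \&quot;b\&quot; c&quot; then &quot;d&quot;

// Build the delimiter alternation from the entity-escaped quote characters.
$delims = array();
foreach (array('`', "'", '"') as $q) {
    $delims[] = preg_quote(htmlentities($q, ENT_QUOTES, 'UTF-8'), '/');
}

// Opening and closing delimiter must not be preceded by a backslash.
$pattern = '/((?<!\\\\)(' . implode('|', $delims) . ').*(?<!\\\\)\2)/Us';
$result = preg_replace($pattern, '<span class="sE">$1</span>', $escaped);

echo $result, "\n";
// say <span class="sE">&quot;a \&quot;b\&quot; c&quot;</span> then <span class="sE">&quot;d&quot;</span>
```

One caveat worth testing for: the lookbehind rejects every quote that follows a backslash, so a closing quote that comes right after an escaped backslash (`\\`) is also skipped.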

OTHER TIPS

You can use this (to illustrate Jack's idea):

$pattern = <<<'LOD'
~
    (['"`])
    (?> [^`"'\\]++ | \\{2} | \\. | (?!\1)["'`] )*
    \1
~xs
LOD;
$result = preg_replace_callback($pattern, function($m) {
    return '<span class="sE">'
         . str_replace(array('"', "'"), array('&quot;', '&#039;'), $m[0])
         . '</span>';
   }, $r);
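
To see why consuming `\\.` pairs inside the loop behaves differently from a `(?<!\\)` lookbehind, the following sketch compares both on a made-up input where the first string ends with an escaped backslash. The sample text and variable names are assumptions for illustration only:

```php
<?php
// Hypothetical input: "a\\" ends with an escaped backslash, followed by "c".
$input = <<<'TXT'
"a\\" b "c"
TXT;

// Escape-consuming pattern, as in the example above.
$consuming = <<<'LOD'
~
    (['"`])
    (?> [^`"'\\]++ | \\{2} | \\. | (?!\1)["'`] )*
    \1
~xs
LOD;

// Lookbehind pattern on raw quotes, for comparison.
$lookbehind = <<<'LOD'
~(?<!\\)(["'`]).*(?<!\\)\1~Us
LOD;

preg_match_all($consuming, $input, $consumed);
preg_match_all($lookbehind, $input, $lookedBehind);

print_r($consumed[0]);     // two matches: "a\\" and "c"
print_r($lookedBehind[0]); // one match:   "a\\" b "
```

The consuming pattern reads backslashes in pairs, so it knows the quote after `\\` is a real closing quote; the lookbehind only sees that a backslash precedes it and runs on to the next quote.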

Another way is to perform the quote replacement first and apply the pattern afterwards:

$pattern = <<<'LOD'
~
    (&(?>quot|\#039);|`)
    (?> [^&`\\]++ | \\{2} | \\. | (?!\1)[&`] )*
    \1
~xs
LOD;
$result = preg_replace($pattern,
                  '<span class="sE">$0</span>',
                  str_replace(array('"', "'"), array('&quot;', '&#039;'), $r));

You can use htmlentities() instead of str_replace() in both examples.
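
A minimal sketch of that variant, assuming htmlentities() is called with ENT_QUOTES so that single quotes become `&#039;`. The sample input is made up for illustration. Note that under the `x` modifier an unescaped `#` starts a comment, so the entity is written as `&\#039;` in the pattern:

```php
<?php
// Hypothetical input containing markup, an ampersand, and escaped quotes.
$r = <<<'TXT'
<b> "a \"b\" & c" 'd'
TXT;

// Escape everything first; quotes become &quot; and &#039;.
$escaped = htmlentities($r, ENT_QUOTES, 'UTF-8');

$pattern = <<<'LOD'
~
    (&(?>quot|\#039);|`)                               # opening delimiter
    (?> [^&`\\]++ | \\{2} | \\. | (?!\1)[&`] )*        # body, skipping escapes
    \1                                                 # same delimiter closes
~xs
LOD;

$result = preg_replace($pattern, '<span class="sE">$0</span>', $escaped);

echo $result, "\n";
// &lt;b&gt; <span class="sE">&quot;a \&quot;b\&quot; &amp; c&quot;</span> <span class="sE">&#039;d&#039;</span>
```

Other entities produced by htmlentities(), such as `&amp;`, are passed over by the `(?!\1)[&`]` branch because they do not equal the opening delimiter.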

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow