Question

I have the following string:

OPEN someone said hello CLOSE im saying hello people OPEN some said hello OPEN they said hello again CLOSE i have to go now though CLOSE hello again!

I'm trying to match all occurrences of hello that are not enclosed between the OPEN and CLOSE words and replace them with another word, possibly with a regex and PHP's preg_replace function (although I'm open to other methods, as I can't think of any).

So from the above string, the following occurrences should match (I've placed them in parentheses to help you distinguish them):

OPEN someone said hello CLOSE im saying (hello) people OPEN some said hello OPEN they said hello again CLOSE i have to go now though CLOSE (hello) again!

I'm not really sure how to go about doing this.

Edit: perhaps this will clarify the nesting structure a bit better:

OPEN
text
CLOSE

OPEN 
text
  OPEN
   text
  CLOSE
text
CLOSE

As you can see from above, a hello is ignored when it is within OPEN...CLOSE, whereas the others, which aren't, are going to be replaced.

Solution

Alan's answer works great. However, since I already took the time to compose it, here is another way to do it using a callback function and the PCRE (?R) recursive construct, which PHP's preg functions support:

function highlightNonNestedHello($str) {
    $re = '/# Two global alternatives. Either...
          (                          # $1: Non-O..C stuff.
            (?:                      # Step through non-O..C chars.
              (?!\b(?:OPEN|CLOSE)\b) # If not start of OPEN or CLOSE,
              .                      # then match next char.
            )+                       # One or more non-O..C chars.
          )                          # End $1:
        |                            # Or...
          (                          # $2: O..C stuff.
            \bOPEN\b                 # Open literal delimiter.
            (?R)+                    # Recurse overall regex.
            \bCLOSE\b                # Close literal delimiter.
          )                          # End $2:
    /sx';
    return preg_replace_callback($re, '_highlightNonNestedHello_cb', $str);
}
function _highlightNonNestedHello_cb($matches) {
    // Case 1: Non-O...C stuff. Highlight all "hello".
    if ($matches[1] !== '') {  // Strict test, so a chunk that is just "0" still counts.
        return preg_replace('/\bhello\b/', '(HELLO)', $matches[1]);
    }
    // Case 2: O...C stuff. Preserve as-is.
    return $matches[2];
}
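
For anyone who has not met (?R) before, here is a minimal sketch of the same recursion idea on a simpler problem: balanced parentheses instead of OPEN/CLOSE words. The function name topLevelGroups is made up for this illustration and is not part of the answer above.

```php
<?php
// Hypothetical helper for illustration only: returns every outermost
// parenthesised group in $s, with nested groups kept inside their parent.
function topLevelGroups(string $s): array {
    // \(            an opening parenthesis
    // [^()]++       a run of characters that are not parentheses, or
    // (?R)          the whole pattern again, i.e. one nested group
    // \)            the matching closing parenthesis
    preg_match_all('/\((?:[^()]++|(?R))*\)/', $s, $m);
    return $m[0];
}

print_r(topLevelGroups('a (b (c) d) e (f)'));
```

This should print the two outermost groups, (b (c) d) and (f). The inner (c) is consumed by the recursive call, so it never shows up as a separate match; the OPEN...CLOSE regex above relies on the same behaviour.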

OTHER TIPS

I numbered the hellos, so hello2 and hello5 are the ones that should get replaced.

$s0 = 'OPEN someone said hello1 CLOSE im saying hello2 people OPEN some said hello3 OPEN they said hello4 again CLOSE i have to go now though CLOSE hello5 again!';

$regex='~
hello\d
(?=
  (?:(?!OPEN|CLOSE).)*+
  (?:
    ( 
      OPEN
      (?:
        (?:(?!OPEN|CLOSE).)*+
        |
        (?1)
      )*
      CLOSE
    )
    (?:(?!OPEN|CLOSE).)*+
  )*
  $
)
~x';

$s1=preg_replace($regex, 'goodbye', $s0);
print($s1);

output:

OPEN someone said hello1 CLOSE im saying goodbye people OPEN some said hello3 OPEN they said hello4 again CLOSE i have to go now though CLOSE goodbye again!

The lookahead uses the recursive subpattern construct, (?1) to try and match zero or more complete, nested OPEN...CLOSE structures between the currently-matched word and the end of the string. Assuming all the OPENs and CLOSEs are properly balanced, that means the hello\d it just matched is not inside such a structure.
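
Since the question is open to other methods, the same "not inside a structure" test can also be sketched without recursion by counting nesting depth. This is only an illustration under assumptions: the function name replaceOutsideBlocks is hypothetical, and a stray CLOSE is simply ignored instead of being reported.

```php
<?php
// Hypothetical helper, not taken from any of the answers: replaces $word with
// $replacement only in text that sits outside every OPEN ... CLOSE block.
function replaceOutsideBlocks(string $text, string $word, string $replacement): string {
    // Split on the delimiter words but keep them in the resulting list.
    $parts = preg_split('/\b(OPEN|CLOSE)\b/', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
    $pattern = '/\b' . preg_quote($word, '/') . '\b/';
    $depth = 0;
    $out = '';
    foreach ($parts as $part) {
        if ($part === 'OPEN') {
            $depth++;
        } elseif ($part === 'CLOSE') {
            // Assumption: an unbalanced CLOSE does not push the depth below zero.
            $depth = max(0, $depth - 1);
        } elseif ($depth === 0) {
            // Plain text at depth 0: this is the only place where we replace.
            $part = preg_replace_callback($pattern, function () use ($replacement) {
                return $replacement;
            }, $part);
        }
        $out .= $part;
    }
    return $out;
}

$s = 'OPEN someone said hello CLOSE im saying hello people OPEN some said hello OPEN they said hello again CLOSE i have to go now though CLOSE hello again!';
echo replaceOutsideBlocks($s, 'hello', 'goodbye'), "\n";
```

On the string from the question, only the hello after "im saying" and the final hello become goodbye. Unlike the lookahead, this makes a single pass over the text and does not rescan to the end of the string for every candidate match.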

Here is my attempt; tell me whether it works for you:

<?php

$str = 'OPEN someone said hello CLOSE im saying hello people OPEN some said hello OPEN they said hello again CLOSE i have to go now though CLOSE hello again!';
echo "<p>$str</p>"; //before

//first replace all of them
$str = str_replace('hello', '(hello)', $str);
//then replace back only those within OPEN CLOSE
function replace_back($match){return str_replace('(hello)', 'hello', $match[0]);}
$str = preg_replace_callback('/OPEN.*?\(hello\).*?CLOSE/', 'replace_back', $str); 

echo "<p>$str</p>"; //after

?>
<style>p{width:500px;background:#F1F1F1;padding:10px;font:13px Arial;}</style>
Licensed under: CC-BY-SA with attribution