Question

I want to replace newlines in just a part of a string. Suppose I have the following:

foo bar __level [
$save = 123,
Info = '1234'
]
{Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut 
labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco 
laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate 
velit esse cillum dolore eu fugiat nulla pariatur.}

I want to turn that into this:

foo bar __level [$save = 123,Info = '1234']
{Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut 
labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco 
laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate 
velit esse cillum dolore eu fugiat nulla pariatur.}

So basically, the newlines should be removed up to the first { character. The rest should keep its newlines.

I know I can replace all newlines using preg_replace with \s+. But I don't know how to do it in this case, since I only need to replace them in a small part of the string.

So how can this be done with a preg_replace?

Solution

Assuming that all square brackets are balanced and not nested, you can use this code:

$pattern = '~(?:\[|(?!\A)\G)[^]\r\n]*\K\R+~';

$txt = preg_replace($pattern, '', $txt);

pattern details:

(?:           # open a non capturing group
    \[        # a literal opening square bracket
  |           # or
    (?!\A)\G  # the position in the string after the last match
)             # close the non capturing group
[^]\r\n]*     # zero or more characters that are not ] or CR or LF
\K            # discards everything matched so far from the match result
\R+           # any type of newline one or more times

The pattern above assumes that there is always a closing square bracket. If it is missing, newlines are removed from the opening square bracket all the way to the end of the string.

If you want to change this behavior, you must add a lookahead assertion to check the presence of the closing square bracket (but note that this makes the pattern slower):

(?:\[|(?!\A)\G)[^]\r\n]*\K\R+(?=[^]]*])
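
To make the difference between the two variants concrete, here is a small sketch. The input string and the variable names are made up for illustration: the first bracket is closed, the second one is not.

```php
<?php
// Hypothetical sample input: "[ ... ]" is closed, the second "[" never closes.
$txt = "a [\nb,\nc\n]\nd [\ne\nf";

// Base pattern: strips newlines after "[" until "]" or the end of the string.
$base = '~(?:\[|(?!\A)\G)[^]\r\n]*\K\R+~';
// Lookahead variant: strips a newline only if a "]" still follows somewhere.
$strict = '~(?:\[|(?!\A)\G)[^]\r\n]*\K\R+(?=[^]]*])~';

$looseResult  = preg_replace($base, '', $txt);
$strictResult = preg_replace($strict, '', $txt);

// json_encode is used only to make the remaining newlines visible.
echo json_encode($looseResult), PHP_EOL;  // "a [b,c]\nd [ef"
echo json_encode($strictResult), PHP_EOL; // "a [b,c]\nd [\ne\nf"
```

With the base pattern the unclosed bracket swallows every newline up to the end of the string; with the lookahead the text after the unclosed bracket is left alone.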

About \G:
\G is an anchor (like ^, $, \A and \z) that matches the position where the previous match ended. On the first attempt there is no previous match, so \G matches at the start of the string, just as \A does. To rule that case out, put a negative lookahead (or lookbehind) for \A right before or after \G: (?!\A). The order makes no difference, because both are zero-width assertions.
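
As a rough illustration of why the (?!\A) guard matters, the sketch below runs the pattern with and without it. The input is a made-up string with one newline before the bracket and two inside it.

```php
<?php
// Hypothetical input: the first newline sits outside the square brackets.
$txt = "a\nb [\nc\n]\n";

// Without (?!\A): \G also matches at offset 0, so stripping starts at the very beginning.
$unguarded = '~(?:\[|\G)[^]\r\n]*\K\R+~';
// With (?!\A): \G can only continue from the end of a previous match.
$guarded = '~(?:\[|(?!\A)\G)[^]\r\n]*\K\R+~';

$unguardedResult = preg_replace($unguarded, '', $txt);
$guardedResult   = preg_replace($guarded, '', $txt);

// json_encode is used only to make the remaining newlines visible.
echo json_encode($unguardedResult), PHP_EOL; // "ab [c]\n"
echo json_encode($guardedResult), PHP_EOL;   // "a\nb [c]\n"
```

The unguarded version also removes the newline that comes before the opening bracket, which is not what the question asks for.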


If you don't care about square brackets and only want to skip content between curly brackets, you can do this:

$pattern = '~(\R?\h*{[^}]*})|\R+~';

$txt = preg_replace($pattern, '$1', $txt); 

Here the curly-bracket parts (including the leading newline, as in your example) are replaced by themselves. Alternatively, use this:

$pattern = '~\R?\h*{[^}]*}(*SKIP)(*FAIL)|\R+~';

$txt = preg_replace($pattern, '', $txt);

Here the same parts are skipped: (*FAIL) forces the subpattern to fail, and (*SKIP) prevents the regex engine from retrying at any position inside the text that the subpattern has already consumed, so the search resumes after the closing curly bracket.
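
The two curly-bracket variants should give the same result. A quick way to check that is sketched below, using a shortened, made-up version of the input from the question.

```php
<?php
// Shortened, hypothetical version of the question's input ("\$" avoids interpolation).
$txt = "foo [\n\$save = 123,\nInfo = '1234'\n]\n{Lorem ipsum\ndolor sit amet.}";

// Variant 1: capture the curly-bracket part and put it back via $1.
$viaCapture = preg_replace('~(\R?\h*{[^}]*})|\R+~', '$1', $txt);

// Variant 2: skip the curly-bracket part with the backtracking control verbs.
$viaSkip = preg_replace('~\R?\h*{[^}]*}(*SKIP)(*FAIL)|\R+~', '', $txt);

var_dump($viaCapture === $viaSkip); // bool(true)
echo $viaCapture, PHP_EOL;
```

In both cases the newlines inside the square brackets are removed, while the newline in front of the { and the newlines inside the curly brackets are kept.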

OTHER TIPS

Dunno if it's more efficient than Casimir's regex but here's an alternate method that's perhaps a little easier to swallow:

$content = <<<'EOC'
foo bar __level [
$save = 123,
Info = '1234'
]
{Lorem ipsum dolor sit 
amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut 
labore et dolore magna aliqua. 
Ut enim ad minim veniam, quis nostrud exercitation ullamco 
laboris nisi ut aliquip 
ex ea commodo consequat. Duis aute irure dolor in 
reprehenderit in voluptate 
velit esse cillum dolore eu fugiat nulla pariatur.}
EOC;

$content = preg_replace_callback(
  '~^([^{]*)~',
  function ($m) {
    return str_replace(array("\r","\n"),'',$m[1]);
  },
  $content
);

echo "<pre>".$content;

output:

foo bar __level [$save = 123,Info = '1234']{Lorem ipsum dolor sit 
amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut 
labore et dolore magna aliqua. 
Ut enim ad minim veniam, quis nostrud exercitation ullamco 
laboris nisi ut aliquip 
ex ea commodo consequat. Duis aute irure dolor in 
reprehenderit in voluptate 
velit esse cillum dolore eu fugiat nulla pariatur.}

Simple pattern:

(?=\R)\R+(?=.*\R{)

Explanations:

(?=         # a Positive Lookahead
    \R      # for a new line
)           # Lookahead end
    \R+     # match the new line(s)
(?=         # another Positive Lookahead
    .*      # match any characters (newlines too, thanks to the s modifier) until
    \R      # another new line
    {       # followed by a curly bracket
)           # Lookahead end

Using:

$string = preg_replace("/(?=\R)\R+(?=.*\R{)/s", "", $string);

Licensed under: CC-BY-SA with attribution