Question

This regex should match lists just like in Markdown:

/((?:(?:(?:^[\+\*\-] )(?:[^\r\n]+))(?:\r|\n?))+)/m

It works in JavaScript (with the g flag added), but I have problems porting it to PHP: it does not behave greedily. Here's my example code:

$string = preg_replace_callback('`((?:(?:(?:^\* )(?:[^\r\n]+))(?:\r|\n?))+)`m', array(&$this, 'bullet_list'), $string);

function bullet_list($matches) { var_dump($matches); }

When I feed it a list of three lines, it displays this:

array(2) { [0]=> string(6) "* one " [1]=> string(6) "* one " } array(2) { [0]=> string(6) "* two " [1]=> string(6) "* two " } array(2) { [0]=> string(8) "* three " [1]=> string(8) "* three " } 

Apparently var_dump is being called three times instead of just once. I expected a single call, since the regex is greedy and should match as many lines as possible. I have tested it on regex101.com. How do I make it work properly?


Solution 2

This regex won't work correctly if you have \r\n newlines in your input text.

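The part (?:\r|\n?) matches either a \r or a \n, but not both. With \r\n input it consumes the \r, the next character is then \n, and ^ cannot match there, so the repetition stops after a single line. (regex101 treats newlines as \n only, so it works there.)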
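
To see the difference for yourself, here is a minimal sketch that runs the pattern from the question against the same three-line list with \n and with \r\n line endings. The variable names and sample strings are made up for illustration, and it assumes PHP's PCRE uses its usual default of treating only \n as a newline for ^.

```php
<?php
// Pattern from the question's PHP code (bullet character limited to "*").
$pattern = '`((?:(?:(?:^\* )(?:[^\r\n]+))(?:\r|\n?))+)`m';

// Hypothetical sample input: the same list with LF and with CRLF line endings.
$unixInput    = "* one\n* two\n* three\n";
$windowsInput = "* one\r\n* two\r\n* three\r\n";

$unixCount    = preg_match_all($pattern, $unixInput, $unixMatches);
$windowsCount = preg_match_all($pattern, $windowsInput, $windowsMatches);

// With LF the whole list is one match; with CRLF every line is its own match,
// because the \r is consumed and ^ cannot match in front of the remaining \n.
echo "LF input:   $unixCount match(es)\n";
echo "CRLF input: $windowsCount match(es)\n";
```

The first CRLF match is "* one" plus the trailing \r, which is six characters and lines up with the string(6) in the var_dump output from the question.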

Does the following work?

/(?:(?:(?:^[+*-] )(?:[^\r\n]+))[\r\n]*)+/m

(or, after removal of all the unnecessary non-capturing groups - thanks @M42!)

/(?:^[+*-] [^\r\n]+[\r\n]*)+/m
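
As a rough check, the simplified pattern can be dropped into preg_replace_callback the way the question does. This is only a sketch: the sample text and the $seen array are invented for the demonstration, and the callback returns the match unchanged instead of building real list markup.

```php
<?php
// Simplified pattern from this answer.
$pattern = '/(?:^[+*-] [^\r\n]+[\r\n]*)+/m';

// Hypothetical input with CRLF line endings and text around the list.
$input = "Intro\r\n* one\r\n* two\r\n* three\r\nOutro\r\n";

$seen = []; // collects every match handed to the callback

$output = preg_replace_callback(
    $pattern,
    function (array $matches) use (&$seen): string {
        $seen[] = $matches[0];
        return $matches[0]; // leave the text untouched; we only count calls
    },
    $input
);

// The callback should now run once, with all three bullet lines in one match.
echo count($seen) . " callback call(s)\n";
var_dump($seen[0]);
```

Note that [\r\n]* also swallows blank lines, so two lists separated only by empty lines would come through as a single match.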

OTHER TIPS

Your regex can be reduced to:

(?:^[+*-] [^\r\n]+\R*)+

There is no need for all these groups.
\R matches any kind of line break: \n, \r, or \r\n.

Edit: \R loses its special meaning in a character class. Depending on the PCRE version, [\R] either matches a literal R or is rejected as an invalid escape.
Thanks to HamZa
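
A small sketch of the \R version in PHP, assuming the m flag is added as in the question. The input string is hypothetical and deliberately mixes \r\n and \n endings.

```php
<?php
// Reduced pattern from above, with delimiters and the m flag added.
$pattern = '/(?:^[+*-] [^\r\n]+\R*)+/m';

// Hypothetical input mixing CRLF and LF line endings, followed by a non-list line.
$input = "* one\r\n* two\n* three\r\n\r\nNot a list\n";

$count = preg_match_all($pattern, $input, $matches);

// \R consumes \r\n as a single line break, so ^ can match at the next bullet
// and the three lines end up in one match.
echo "$count match(es)\n";
var_dump($matches[0][0]);
```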

This will match all bulleted lines until it gets to the first line that is not bulleted.

(?<=^|\R)\*[\s\S]+?(?=$|\R[^*])
  • \* match a bullet where:
    • (?<=^|\R) it is preceded by the start of the string or a newline.
  • [\s\S]+? match any character non-greedily where
    • (?=$|\R[^*]) the matched sequence is followed by the end of the string or by a newline followed by a character other than *. Essentially, the match is complete when a non-bullet line is found or when the end of the string is reached.

Results:

The resulting matches are shown in the RegexBuddy output below (Regex 101 can't handle it):

[Image: regex result]

Licensed under: CC-BY-SA with attribution