<?php
$text = "1. dog
1. cat
1. fish
1. horse
1. duck
1. goose
1. swan
1. monkey
1. chimpanzee
1. orangutan
1. whale
1. pig
";
function callback($match) {
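    // Recurse into any run of lines indented deeper than this level.
    $out = preg_replace_callback("/(^($match[2] +)1\. .+(\\n|$))(?1)*/m", 'callback', $match[0]);
    // Wrap the lines sitting at exactly this level's indentation.
    $out = preg_replace("/^$match[2]1\. (.+)$/m", "<li>$1</li>", $out);
    return "<ol>\n$out</ol>\n";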
}
$html = preg_replace_callback("/(^( *)1\. .+(\\n|$))(?1)*/m", 'callback', $text);
echo $html;
?>
Here's an ideone demo.
That's a pretty neat idea you had, using preg_replace_callback recursively. Also, you're right about $-strings not interpolating within double quotes unless they form a valid variable name (so $1 and a bare $ stay literal); I always forget that. And you were right to use /m, since you want ^ to match the beginning of each line (not just the beginning of the entire string). You were also right to use (\n|$), even though $ already matches the end of each line in /m mode: otherwise the quantifier + wouldn't work, because $ wouldn't actually consume the \n. I didn't see these facts when I first read your question.
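To make the point about $ not consuming the newline concrete, here is a minimal sketch. The three-line sample string and the variable names $bare and $consuming are my own, chosen only for illustration.

```php
<?php
// Hypothetical sample input: three list lines, each ending in "\n".
$text = "1. dog\n1. cat\n1. fish\n";

// Bare $: it asserts end-of-line but leaves the "\n" unconsumed, so ^ cannot
// match again right away and the group never repeats. Each line matches alone.
preg_match_all('/(^1\. .+$)+/m', $text, $bare);

// (\n|$): each repetition consumes its own newline, so + spans all the lines.
preg_match_all('/(^1\. .+(\n|$))+/m', $text, $consuming);

echo count($bare[0]), "\n";      // 3 separate one-line matches
echo count($consuming[0]), "\n"; // 1 match covering the whole block
```

The second pattern is the one you want here, since the callback needs the entire block of lines in a single match.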
Now, let's start with the first expression:
/(^( *)1\. .+(\\n|$))(?1)*/m
Actually, the recursive subexpression, (?1), isn't necessary except as shorthand. Let's expand that:
/(^( *)1\. .+(\\n|$))(^( *)1\. .+(\\n|$))*/m
 |                  ||                  |
 +------------------++------------------+
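So we have two identical halves. Why not just use + as you did? Because I want to capture the number of spaces indenting the first line only. With +, group 2 would be overwritten on every repetition and end up holding the last line's indentation, whereas captures set inside the (?1) call are discarded when the call returns. Those first-line spaces get stored in $match[2].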
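Here is a small sketch of that difference, assuming a made-up two-line input where the second line is indented by two spaces. The names $first and $last are mine.

```php
<?php
// Hypothetical input: an unindented line followed by an indented one.
$text = "1. dog\n  1. cat\n";

// Group 1 once, then (?1)*: group 2 keeps the first line's indentation,
// because captures made inside a subroutine call are reverted afterwards.
preg_match('/(^( *)1\. .+(\n|$))(?1)*/m', $text, $first);

// Group 1 repeated with +: group 2 is overwritten on each repetition and
// ends up holding the last line's indentation.
preg_match('/(^( *)1\. .+(\n|$))+/m', $text, $last);

echo strlen($first[2]), "\n"; // 0 (indentation of "1. dog")
echo strlen($last[2]), "\n";  // 2 (indentation of "  1. cat")
```

Both patterns match the same two lines overall; only the content of group 2 differs.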
Within the callback, we bring those spaces back, plus one or more spaces:
/(^($match[2] +)1\. .+(\\n|$))(?1)*/m
That way, we only ever look at levels beneath the current level of indentation (more spaces), on each level of preg_replace_callback recursion. And as the recursions unwind, only the lines indented by exactly that level's number of spaces, $match[2], are wrapped in <li></li>,
/^$match[2]1\. (.+)$/m
before returning the whole wrapped in <ol></ol>.
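To see the unwinding on nested input, here is the same approach run on a small sample of my own with two indentation levels. The function name wrapLevel and the variable $indent are illustrative renames, not part of your code.

```php
<?php
// Same logic as the callback above, renamed for this sketch.
function wrapLevel(array $match): string {
    $indent = $match[2]; // spaces in front of the first line of this block
    // Recurse into runs of lines indented by more than $indent.
    $out = preg_replace_callback("/(^($indent +)1\. .+(\\n|$))(?1)*/m", 'wrapLevel', $match[0]);
    // Wrap the lines indented by exactly $indent.
    $out = preg_replace("/^{$indent}1\. (.+)$/m", "<li>$1</li>", $out);
    return "<ol>\n$out</ol>\n";
}

// Hypothetical input: cat and fish are nested one level under dog.
$text = "1. dog\n  1. cat\n  1. fish\n1. horse\n";
$html = preg_replace_callback("/(^( *)1\. .+(\\n|$))(?1)*/m", 'wrapLevel', $text);
echo $html;
// <ol>
// <li>dog</li>
// <ol>
// <li>cat</li>
// <li>fish</li>
// </ol>
// <li>horse</li>
// </ol>
```

The inner call sees only the two indented lines, finds nothing deeper, wraps them, and returns; the outer call then wraps dog and horse around the already converted inner list.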