Question

I'm looking for a PHP function that puts every paragraph-level element, such as <p>, <ul> and <ol>, into an array, so that I can manipulate them, for example by displaying the first two paragraphs and hiding the others.

This function does the trick for the p element. How can I adjust the regexp to also match ul and ol? My attempt below gives an error complaining that < is not an operator...

function aantalP($in){
    preg_match_all("|<p>(.*)</p>|U",
        $in,
        $out, PREG_PATTERN_ORDER);
    return $out;
}

//tryout:
    function aantalPT($in){
        preg_match_all("|(<p> | <ol>)(.*)(</p>|</o>)|U",
            $in,
            $out, PREG_PATTERN_ORDER);
        return $out;
    }

Can anyone help me?


Solution

You can't do this reliably with regular expressions. Paragraphs are mostly OK because they generally aren't nested (valid HTML doesn't allow a <p> inside a <p>, although sloppy markup can still contain one). Lists, however, are routinely nested, and that's one area where regular expressions fall down.

PHP has multiple ways of parsing HTML and retrieving selected elements. Just use one of those. It'll be far more robust.

Start with Parse HTML With PHP And DOM.
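
As a rough illustration of the DOM route, here is a minimal sketch using PHP's bundled DOMDocument and DOMXPath classes. The function name blockElements and the wrapper id are made up for this example, and it assumes the <p>, <ul> and <ol> elements sit at the top level of the HTML fragment you pass in.

```php
<?php
// Hypothetical helper (not from the original answer): collect top-level
// <p>, <ul> and <ol> elements of an HTML fragment as HTML strings.
function blockElements(string $html): array
{
    $doc = new DOMDocument();

    // Collect parser warnings about sloppy markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment so there is a known parent node to query from.
    $doc->loadHTML('<div id="__fragment_root">' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Direct children only, so a nested list stays inside its parent list.
    $nodes = $xpath->query(
        '//div[@id="__fragment_root"]/*[self::p or self::ul or self::ol]'
    );

    $out = [];
    foreach ($nodes as $node) {
        $out[] = $doc->saveHTML($node); // serialise the element and its children
    }
    return $out;
}
```

With the result in $blocks, array_slice($blocks, 0, 2) gives the first two blocks to display and array_slice($blocks, 2) the ones to hide. Selecting only direct children is a design choice for this sketch: a query such as //p | //ul | //ol would also return nested lists and paragraphs inside list items as separate entries.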

If you really want to go down the regex route, start with:

function aantalPT($in){
  preg_match_all('!<(p|ul|ol)>(.*)</\1>!Us', $in, $out);
  return $out;
}

Note: PREG_PATTERN_ORDER is not required as it is the default value.

Basically, use a backreference to find the matching tag. That will fail for many reasons such as nested lists and paragraphs nested within lists. And no, those problems are not solvable (reliably) with regular expressions.
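
To make the nested-list failure concrete, here is a small sketch that runs the backreference pattern, with p, ul and ol in the alternation, against a made-up input containing a list inside a list. The sample HTML is an assumption chosen only to trigger the problem.

```php
<?php
// Made-up sample: an outer <ul> whose first item contains an inner <ul>.
$html = '<ul><li>a<ul><li>b</li></ul></li></ul><p>tail</p>';

// Backreference pattern: \1 must repeat whichever tag name group 1 matched.
preg_match_all('!<(p|ul|ol)>(.*)</\1>!Us', $html, $out);

// The ungreedy match for the outer <ul> stops at the FIRST </ul> it sees,
// which is the closing tag of the inner list, so the outer list is cut short.
$firstMatch = $out[0][0]; // '<ul><li>a<ul><li>b</li></ul>'
```

Making the quantifier greedy does not fix this: it would instead run on to the last closing tag in the input and swallow unrelated sibling lists.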

Edit: as (correctly) pointed out, the regex in the question is also flawed in that it uses a pipe as the delimiter while also using a pipe character inside the pattern. I generally use ! as that doesn't normally occur in the pattern (not in my patterns anyway). Some use forward slashes, but they appear in this pattern too. Tilde (~) is another reasonably common choice.

OTHER TIPS

  • First of all, you use | as delimiter to mark the beginning and end of the regular expression. But you also use | as the or sign. I suggest you replace the first and last | with #.
  • Secondly, you should capture the start tag and use a backreference to match the end tag, like so: <(p|ul)>(.*?)</\1>
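
Putting the two tips together, a minimal sketch could look like the following. The function name matchBlocks, the extra ol alternative and the s modifier are assumptions added for this example; like any regex approach, it only handles tags without attributes and breaks on nested lists.

```php
<?php
// Hypothetical helper combining both tips above.
function matchBlocks(string $in): array
{
    // '#' delimits the pattern, so '|' inside it is free to mean "or".
    // (.*?) is a lazy match; the s modifier lets '.' also match newlines.
    preg_match_all('#<(p|ul|ol)>(.*?)</\1>#s', $in, $out);
    return $out[0]; // the full matches, opening and closing tags included
}

$blocks = matchBlocks("<p>one</p>\n<ol><li>x</li></ol>\n<p>two\nlines</p>");
```
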
Licensed under: CC-BY-SA with attribution