Regular Expression: Is there a way to tell preg_match_all to use the third match it finds skipping the first two?

StackOverflow https://stackoverflow.com/questions/1653032

  •  22-07-2019

Question

Is there a way to tell preg_match_all to use the third match it finds, skipping the first two? For example, I have the following HTML:

<div class="entry">
    <div class="text">BlaBlaBla</div>
    <div class="date">2009-10-31</div>
</div>

I need preg_match_all to get the contents of the outermost div, and not stop at the first </div> it encounters.

Solution

This is the class of problem that regular expressions, in the formal sense, cannot handle: recursively defined structures such as arbitrarily nested tags. Extended REs might be able to sort of do it, but (to mix metaphors) it's better to punt and pick up a different tool.

Having said that, PCRE specifically has a recursive pattern feature. The typical demonstration is \((a*|(?R))*\), which matches any combination of balanced parentheses and a's. So you can probably adapt that, but you are trying to do something that I wouldn't try to do with REs.
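
To make the recursion idea concrete, here is a minimal sketch of that demonstration in PHP. It is an adaptation, not the exact pattern above: it anchors the match and recurses into group 1 with (?1) rather than into the whole pattern with (?R), so the anchors are not repeated on each level. The helper name is_balanced() is made up for illustration.

```php
<?php
// Sketch: test whether a string is one balanced group of parentheses
// containing only the letter "a".
// is_balanced() is a hypothetical helper, not part of PHP or PCRE.
function is_balanced(string $s): bool
{
    // Group 1 is one balanced pair: "(", then any mix of runs of "a"
    // and nested pairs (recursion into group 1 via (?1)), then ")".
    // a++ is possessive, which keeps backtracking cheap on bad input.
    return preg_match('/^(\((?:a++|(?1))*\))$/', $s) === 1;
}

var_dump(is_balanced('(aa(a)())')); // bool(true)
var_dump(is_balanced('(a(a)'));     // bool(false), one ")" is missing
```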

Update: I'm not sure how useful this will be, but:

php > $t = "<div> how <div> now is the time </div>  now </div>";
php > preg_match('/<div>(.*|(?R))*<\/div>/',$t,$m); print_r($m);
Array
(
    [0] => <div> how <div> now is the time </div>  now </div>
    [1] => 
)
php > 
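
Note that in the session above the greedy .* does most of the work: it runs to the end of the line and backtracks to the last </div>, so the recursion is barely exercised and group 1 comes back empty. A variation that leans on (?R) might look like the sketch below. It assumes reasonably well-formed markup with no stray "<div" text inside comments, scripts, or attributes, and outermost_div_contents() is a hypothetical helper name.

```php
<?php
// Sketch: capture the contents of each outermost <div>...</div> pair.
// outermost_div_contents() is a hypothetical helper for illustration.
function outermost_div_contents(string $html): array
{
    // <div\b[^>]*>          opening tag, with optional attributes
    // (?:(?!</?div\b).)++   run of characters that does not start a div tag
    // (?R)                  a complete nested <div>...</div>, matched recursively
    // </div>                the closing tag that pairs with the opening one
    $pattern = '~<div\b[^>]*>((?:(?:(?!</?div\b).)++|(?R))*)</div>~s';
    preg_match_all($pattern, $html, $m);
    return $m[1]; // group 1 of every top-level match
}

$html = <<<'HTML'
<div class="entry">
    <div class="text">BlaBlaBla</div>
    <div class="date">2009-10-31</div>
</div>
HTML;

// One match is expected here: the body of the "entry" div,
// including both inner divs.
print_r(outermost_div_contents($html));
```

Because each match consumes a whole balanced pair, two sibling divs such as <div>a</div><div>b</div> come back as two separate matches instead of being swallowed as one.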

OTHER TIPS

You would be much better served by something like an XML/HTML parser.
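
As a rough illustration of the parser route, here is a sketch using PHP's bundled DOM extension. It assumes ext-dom is enabled and that the first div in document order is the one you want; inner_html() is a hypothetical helper, since DOMDocument has no built-in innerHTML accessor.

```php
<?php
// Sketch: read the contents of the outermost div with DOMDocument.
// inner_html() is a hypothetical helper for illustration.
function inner_html(DOMNode $node): string
{
    $out = '';
    // Serialize each child node, so nested tags are kept in the result.
    foreach ($node->childNodes as $child) {
        $out .= $node->ownerDocument->saveHTML($child);
    }
    return $out;
}

$html = <<<'HTML'
<div class="entry">
    <div class="text">BlaBlaBla</div>
    <div class="date">2009-10-31</div>
</div>
HTML;

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);

// getElementsByTagName() returns nodes in document order,
// so item(0) is the outermost div of this fragment.
$outer = $doc->getElementsByTagName('div')->item(0);
$inner = inner_html($outer);

echo $inner;
```

The parser pairs opening and closing tags for you, so nesting depth stops being a concern.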

You can use XPath's "axis specifiers" and "node set functions".
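
For instance, with DOMXPath (assuming the DOM extension is available), the ancestor axis can pick out divs that are not nested inside another div, and position() can pick the third div in document order. The expressions below are one possible way to phrase this against the HTML from the question, not the only one.

```php
<?php
// Sketch: select divs with XPath axes and node-set functions.
$html = <<<'HTML'
<div class="entry">
    <div class="text">BlaBlaBla</div>
    <div class="date">2009-10-31</div>
</div>
HTML;

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Axis specifier: keep only divs that have no div ancestor (the outermost ones).
$outermost = $xpath->query('//div[not(ancestor::div)]');

// Node-set function: the third div in document order, skipping the first two.
$third = $xpath->query('(//div)[position() = 3]');

// count() evaluates to a number rather than a node list.
$total = $xpath->evaluate('count(//div)');

echo $outermost->item(0)->getAttribute('class'), "\n"; // entry
echo $third->item(0)->textContent, "\n";               // 2009-10-31
echo (int) $total, "\n";                               // 3
```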

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow