Parsing content in html tags using regex

https://stackoverflow.com/questions/2001152

18-09-2019

Question

I want to parse content from

<td>content</td>
and
<td *?*>content</td>
and 
<td *specific td class*>content</td>

How can I do this with a regex in PHP, using preg_match?

Solution

I think this sums it up pretty well.

In short, don't use regular expressions to parse HTML. Instead, look at PHP's DOM classes, especially DOMDocument::loadHTML.
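As a rough illustration of the DOM approach, here is a minimal sketch that collects the text of every td cell. The function name tdContents and the sample markup are made up for this example. The cells are wrapped in a table so the parser keeps them in place.

```php
<?php
// Hypothetical helper (not from the original answer): returns the trimmed
// text of every <td> found in an HTML string.
function tdContents(string $html): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally instead of printing them,
    // since real-world HTML is rarely perfectly valid.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $cells = [];
    foreach ($doc->getElementsByTagName('td') as $td) {
        // textContent also includes the text of any nested tags.
        $cells[] = trim($td->textContent);
    }
    return $cells;
}

// Sample markup, assumed for illustration only.
$html = '<table><tr><td>first</td><td class="price">second</td></tr></table>';
print_r(tdContents($html));
```

This handles both the plain td and the td with arbitrary attributes, because the lookup is by tag name rather than by the exact text of the tag.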

OTHER TIPS

If you have an HTML document, you really shouldn't use regular expressions to parse it: HTML is just not "regular" enough for that.

A far better solution is to load your HTML document with a DOM parser. For instance, DOMDocument::loadHTML combined with XPath queries often does a really great job!
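For the "specific td class" case, an XPath query could look roughly like the sketch below. The helper name tdContentsByClass and the sample markup are assumptions for illustration. The class name is inserted into the query as-is, so this assumes it contains no quote characters.

```php
<?php
// Hypothetical helper (not from the original answer): returns the trimmed
// text of every <td> whose class attribute contains $class as a whole word.
function tdContentsByClass(string $html, string $class): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Pad the class attribute with spaces so "price" does not match "pricey".
    // Assumption: $class contains no double quotes.
    $query = sprintf(
        '//td[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $class
    );

    $cells = [];
    foreach ($xpath->query($query) as $td) {
        $cells[] = trim($td->textContent);
    }
    return $cells;
}

// Sample markup, assumed for illustration only.
$html = '<table><tr><td>a</td><td class="price big">b</td><td class="pricey">c</td></tr></table>';
print_r(tdContentsByClass($html, 'price'));
```

Unlike a pattern that looks for class="..." literally, this also works when the cell carries several classes or other attributes in any order.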

<td>content</td>: <td>([^<]*)</td>

<td *specific td class*>content</td>: <td[^>]*class="specific_class"[^>]*>([^<]*)</td>
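If you do go the regex route, the patterns above could be applied with preg_match_all along these lines. This is only a sketch: the function name tdContentsRegex is invented here, and it shares the patterns' limitations. It assumes the cell contains no nested tags and that the class attribute is written exactly as class="...".

```php
<?php
// Hypothetical helper (not from the original answer): extracts td contents
// with a regex. With $class set, only cells whose class attribute is exactly
// class="$class" are matched.
function tdContentsRegex(string $html, ?string $class = null): array
{
    $attrs = $class === null
        ? '[^>]*'
        : '[^>]*\bclass="' . preg_quote($class, '~') . '"[^>]*';

    // [^<]* stops at the first "<", so cells with nested tags are skipped.
    // A negated class also matches newlines, so multiline content works.
    $pattern = '~<td\b' . $attrs . '>([^<]*)</td>~i';

    preg_match_all($pattern, $html, $matches);
    return array_map('trim', $matches[1]);
}

// Sample input, assumed for illustration only.
$html = "<td>one</td>\n<td id=\"x\">two</td>\n<td class=\"specific_class\"> multi\nline </td>";
print_r(tdContentsRegex($html));                   // every cell
print_r(tdContentsRegex($html, 'specific_class')); // only the classed cell
```

Leaving out the class argument covers both the plain td and the td with arbitrary attributes, since [^>]* also matches an empty attribute list.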

@OP, here's one way

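<?php
// Sample input, including a cell whose content spans two lines.
$str = <<<A
<td>content</td>
<td *?*>content</td>
<td *specific td class*>content</td>
<td *?*> multiline
content </td>
A;

// Split on the closing tag so that each piece ends with one cell's content.
$s = explode("</td>", $str);
foreach ($s as $b) {
    // Strip everything up to and including the opening <td ...> tag.
    // [^>]* stops at the tag's own ">", so a ">" inside the content is kept.
    // The newline before each tag and the empty last piece are left as-is,
    // which is why the output contains blank lines.
    $b = preg_replace("/.*<td[^>]*>/", "", $b);
    print $b . "\n";
}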

output

$ php test.php
content

content

content

 multiline
content

Licensed under: CC-BY-SA with attribution
