Use a DOM parser and never use regular expressions to parse HTML.
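// Parse the markup into a DOM tree ($html holds the HTML to parse).
$dom = new DOMDocument();
$dom->loadHTML($html);

// Print the text of every <strong> element.
foreach ($dom->getElementsByTagName('strong') as $tag) {
    echo $tag->nodeValue . "<br>";
}

// Print the text of every <span> element.
foreach ($dom->getElementsByTagName('span') as $tag) {
    echo $tag->nodeValue . "<br>";
}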
Output:
this one
this two
this three
test one
test two
test three
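The snippet above assumes $html is already defined. For a version that runs on its own, here is a minimal sketch. The helper name textByTag and the sample markup are assumptions made for illustration, not part of the original question. It also uses libxml_use_internal_errors() so that malformed real-world HTML does not flood the output with warnings.

```php
<?php
// Hypothetical helper: returns the trimmed text of every <$tagName> element in $html.
function textByTag(string $html, string $tagName): array
{
    // Collect libxml parse warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    // Discard the collected warnings and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $texts = [];
    foreach ($dom->getElementsByTagName($tagName) as $node) {
        $texts[] = trim($node->textContent);
    }
    return $texts;
}

// Assumed sample input; the original $html is not shown in this answer.
$html = '<p><strong>this one</strong> <span>test one</span></p>'
      . '<p><strong>this two</strong> <span>test two</span></p>';

print_r(textByTag($html, 'strong'));
print_r(textByTag($html, 'span'));
```

Returning arrays instead of echoing keeps the extraction separate from the presentation, so the caller decides how to join or print the values.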
Why shouldn't I use regular expressions to parse HTML content?
HTML is not a regular language and hence cannot be parsed by regular expressions. Regex queries are not equipped to break down HTML into its meaningful parts. Even enhanced irregular regular expressions as used by Perl are not up to the task of parsing HTML.
That passage is from an article by our own Jeff Atwood. Read more here.
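To make the "not a regular language" point concrete, here is a small sketch with nested tags. The sample markup and the pattern are illustrative assumptions. A typical non-greedy pattern stops at the first closing tag it sees, while the DOM parser tracks nesting correctly.

```php
<?php
// Assumed sample: a <div> nested inside another <div>.
$html = '<div>outer <div>inner</div> tail</div>';

// Regex attempt: the lazy match ends at the FIRST </div>, which belongs
// to the inner element, so the captured content of the outer <div> is cut short.
preg_match('~<div>(.*?)</div>~s', $html, $m);
echo $m[1], "\n";               // outer <div>inner

// DOM attempt: the parser builds a tree, so the outer element is intact.
$dom = new DOMDocument();
$dom->loadHTML($html);
$outer = $dom->getElementsByTagName('div')->item(0);
echo $outer->textContent, "\n"; // outer inner tail
```

Switching to a greedy pattern only moves the problem: it then over-matches as soon as the input contains two sibling elements of the same type.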