PHP, preg_match, Regular Expression: What am I doing wrong?
19-09-2019
Question
Here is the HTML that I want to match:
<div class="class">
<a href="http://www.example.com/something"> I want to be able to capture this text</a>
<span class="ptBrand">
This is what I am doing:
$pattern='{<div class="productTitle">[\n]<((https?|ftp|gopher|telnet|file|notes|ms-help):((//)|(\\\\))+[\w\d:#@%/;$()~_?\+-=\\\.&]*)>([^\n]*)</a>[\n]<span class="ptBrand">}';
preg_match($pattern, $data, $matches,PREG_OFFSET_CAPTURE);
print_r($matches);
It prints:
Array ( )
Solution
As a general rule, regular expressions are a poor tool for parsing HTML. They are brittle (a change in whitespace, attribute order, or quoting breaks them) and tend to become very complicated. A far more robust solution is to use an HTML parser. See Parse HTML With PHP And DOM.
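As a rough illustration of the parser approach, here is a minimal sketch using PHP's built-in DOM extension (DOMDocument plus DOMXPath). It assumes ext-dom is enabled and that the markup arrives as a string; the helper name extractLink is hypothetical. Note that the XPath test @class="class" only matches when the class attribute is exactly "class".

```php
<?php
// Hypothetical helper: return the href and trimmed text of the first <a>
// directly inside <div class="class">, or null if there is none.
function extractLink(string $html): ?array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely well-formed; collect parser warnings
    // internally instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query('//div[@class="class"]/a');
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }

    $a = $nodes->item(0);
    return [
        'href' => $a->getAttribute('href'),
        'text' => trim($a->textContent),
    ];
}

// The snippet from the question (the unclosed tags are auto-closed by libxml).
$html = <<<HTML
<div class="class">
<a href="http://www.example.com/something"> I want to be able to capture this text</a>
<span class="ptBrand">
HTML;

print_r(extractLink($html));
```

Unlike a regex, this keeps working if the whitespace between the tags changes or extra attributes are added to the anchor.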
As for your expression: it starts with <div class="productTitle">, but I don't see that anywhere in the source, so I'd start there. Likewise, you're trying to match a URL, but the pattern never mentions the anchor tag (either directly, as <a href=", or through a sufficiently broad wildcard), so it will fail there too. It also demands exactly one newline ([\n]) between tags, so any indentation would break it. Basically, that expression doesn't look anything like the HTML you're trying to parse.
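For comparison, here is a sketch of an expression that follows the markup actually shown in the question, keeping the original preg_match call with PREG_OFFSET_CAPTURE. It assumes the HTML is exactly that snippet, only accepts scheme:// URLs from the question's scheme list, and uses # as the delimiter so the slashes in URLs and in </a> need no escaping.

```php
<?php
// Starts at the div that really exists, matches <a href="..."> explicitly,
// and allows any run of whitespace (\s*) between tags instead of one "\n".
$pattern = '#<div class="class">\s*<a href="((?:https?|ftp|gopher|telnet|file|notes|ms-help)://[^"]*)">([^<]*)</a>\s*<span class="ptBrand">#';

$data = <<<HTML
<div class="class">
<a href="http://www.example.com/something"> I want to be able to capture this text</a>
<span class="ptBrand">
HTML;

$found = preg_match($pattern, $data, $matches, PREG_OFFSET_CAPTURE);
if ($found === 1) {
    // With PREG_OFFSET_CAPTURE every entry is [matched text, byte offset].
    echo $matches[1][0], "\n";       // the URL
    echo trim($matches[2][0]), "\n"; // the link text, leading space removed
}
```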
OTHER TIPS
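Or, to capture just the link text:
preg_match('/\s*([^>]+?)\s*<\/a/', $string, $match);
This trims the text too: the lazy +? lets the surrounding \s* absorb leading and trailing whitespace, so $match[1] holds only the text.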
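A quick check of this tip, wrapped in a hypothetical helper called linkText. It uses the lazy [^>]+?, which is what lets the second \s* (rather than the capture group) consume whitespace before </a; with a greedy [^>]+ any trailing space would end up inside the group.

```php
<?php
// Hypothetical helper: return the whitespace-trimmed text before "</a",
// or null when the string contains no closing anchor.
function linkText(string $string): ?string
{
    if (preg_match('/\s*([^>]+?)\s*<\/a/', $string, $match)) {
        return $match[1];
    }
    return null;
}

echo linkText('<a href="http://www.example.com/something"> I want to be able to capture this text </a>'), "\n";
```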
The pattern:
/<div class="class">\s*<a href="([^"]+)">([^<]+)<\/a>/m
would roughly get the link (group 1) and the text (group 2); the / in </a> has to be escaped because / is the delimiter. Using the DOM library would still be a much better method.
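A sketch of a pattern like this one used with preg_match_all to collect every such link on a page. The two-product HTML below is made up for illustration, and PREG_SET_ORDER is used so each match's href and text stay together.

```php
<?php
// "/" is the delimiter, so the "/" in "</a>" is escaped.
$pattern = '/<div class="class">\s*<a href="([^"]+)">([^<]+)<\/a>/';

// Hypothetical page with two product blocks.
$html = <<<HTML
<div class="class">
<a href="http://www.example.com/something"> I want to be able to capture this text</a>
<span class="ptBrand">
<div class="class">
<a href="http://www.example.com/other">Another title</a>
<span class="ptBrand">
HTML;

$links = [];
preg_match_all($pattern, $html, $sets, PREG_SET_ORDER);
foreach ($sets as $set) {
    // $set[1] is the href, $set[2] the link text (may carry surrounding spaces).
    $links[] = ['href' => $set[1], 'text' => trim($set[2])];
}
print_r($links);
```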
You can try this (written with # delimiters, so the / in </a> needs no escaping):
#<a href=".*?">([\s\S]*?)</a>#
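The point of [\s\S]*? here is that it matches any character, newlines included, so link text that wraps across lines is still captured; a plain .*? stops at a newline unless the s modifier is added. A small sketch comparing the two on a made-up anchor whose text spans two lines:

```php
<?php
// Made-up anchor whose text contains a line break.
$html = "<a href=\"http://www.example.com/something\">first line\nsecond line</a>";

// [\s\S] matches every character, including "\n".
$withClass = preg_match('#<a href=".*?">([\s\S]*?)</a>#', $html, $classMatch);

// Without the s modifier, "." does not match "\n", so this finds nothing.
$withDot = preg_match('#<a href=".*?">(.*?)</a>#', $html, $dotMatch);

var_dump($withClass, $withDot);
```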