Question

Hey guys, I need help with a regex.

I'm using file_get_contents() to get the source of a page. I then want to loop through the source, find all the <a> tags, and extract their href values into an array.

Thanks


Solution

You'd be better off using a real parser such as SimpleXML or DOMDocument instead of regular expressions. Here's an example with DOMDocument that gives you a list of all A elements (a DOMNodeList) to loop over:

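$doc = new DOMDocument();
// $str holds the page source, e.g. the string returned by file_get_contents()
$doc->loadHTML($str);
// getElementsByTagName() returns a DOMNodeList, which foreach can iterate
$aElements = $doc->getElementsByTagName("a");
foreach ($aElements as $aElement) {
    if ($aElement->hasAttribute("href")) {
        // link; use $aElement->getAttribute("href") to retrieve the value
    } else {
        // not a link (e.g. a named anchor without an href)
    }
}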
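
To get the href values into an array, as the question asks, the loop can be wrapped in a small function. This is only a sketch: the function name extractHrefs is made up for illustration and is not part of DOMDocument, and it assumes the page source is already in a string. Real-world pages are often not valid HTML, so the sketch also silences libxml's parse warnings while loading.

```php
<?php
// Hypothetical helper (name chosen for illustration): returns the href
// value of every <a> element in the given HTML string.
function extractHrefs(string $html): array
{
    $hrefs = [];
    if ($html === '') {
        return $hrefs; // loadHTML() does not accept an empty string
    }

    // Collect libxml parse warnings internally instead of emitting them,
    // and remember the previous setting so it can be restored.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('a') as $a) {
        if ($a->hasAttribute('href')) {
            $hrefs[] = $a->getAttribute('href');
        }
    }

    return $hrefs;
}
```

You would call it on the string returned by file_get_contents(). Note that file_get_contents() returns false on failure, so check for that before passing the result in. The values come back exactly as written in the page, so relative links stay relative.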
Licensed under: CC-BY-SA with attribution