Question

What is a general way to grab the href values from all link tags using a regex and preg_match_all, given that the attributes are not always in the same order?

Example:

<link href="foo.css" rel="stylesheet" type="text/css"/>
<link type="text/css" href="bar.css" rel="stylesheet"/>
<link rel="stylesheet" type="text/css" href="bar1.css"/>
<link type="text/css" href="bar2.css" rel="stylesheet"></link>
<link href="path/foo.css" rel="stylesheet" type="text/css"/>

Should result in:

Array(
'foo.css',
'bar.css',
'bar1.css',
'bar2.css',
'path/foo.css',
)

Solution 2

The regular expression you're looking for is something like this, though it will need some further refinement:

<link\s+(?:[^>]*?\s+)?href="([^"]*)"

Testing against

<link href="foo.css" rel="stylesheet" type="text/css"/>

The full match returned is

<link href="foo.css"

while capture group 1 (the parenthesized part) holds just foo.css.
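To get the bare values, run the pattern through preg_match_all and read capture group 1 instead of the full match. A minimal sketch, assuming the markup sits in a string (named $head here purely for illustration) and using the sample tags from the question:

```php
<?php
// Sample input taken from the question.
$head = <<<'HTML'
<link href="foo.css" rel="stylesheet" type="text/css"/>
<link type="text/css" href="bar.css" rel="stylesheet"/>
<link rel="stylesheet" type="text/css" href="bar1.css"/>
<link type="text/css" href="bar2.css" rel="stylesheet"></link>
<link href="path/foo.css" rel="stylesheet" type="text/css"/>
HTML;

// The pattern from this answer, wrapped in "/" delimiters.
// (?:[^>]*?\s+)? lazily skips any attributes that come before href,
// and [^>] keeps the match from running past the end of the tag.
preg_match_all('/<link\s+(?:[^>]*?\s+)?href="([^"]*)"/', $head, $matches);

// $matches[0] holds the full matches (e.g. '<link href="foo.css"'),
// $matches[1] holds capture group 1: just the href values.
$hrefs = $matches[1];
print_r($hrefs);
```

Because the optional non-capturing group absorbs whatever precedes href inside the tag, attribute order does not matter; it still assumes double-quoted values.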

Here's a good place to test out your expressions: http://regexpal.com/

OTHER TIPS

Parsing is the way to go:

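$x = file_get_contents("foo.txt");   // raw <link> tags
// Wrap the tags in one root element so the fragment parses as XML.
// This only works if every tag is well-formed (closed or self-closed).
$xml = simplexml_load_string("<links>$x</links>");
$results = array();

foreach ($xml->link as $link) {
    // Attribute lookup by name, so attribute order does not matter.
    $results[] = (string)$link['href'];
}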

See it working: https://eval.in/132898
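The SimpleXML approach only works when the tags form well-formed XML once wrapped in a root element. A sketch of an alternative (swapped in here, not part of the original answer) uses PHP's DOM extension, whose loadHTML() accepts ordinary HTML; it assumes the dom extension is enabled and reuses the question's sample tags:

```php
<?php
// Sample input from the question (an HTML fragment, not necessarily valid XML).
$html = <<<'HTML'
<link href="foo.css" rel="stylesheet" type="text/css"/>
<link type="text/css" href="bar.css" rel="stylesheet"/>
<link rel="stylesheet" type="text/css" href="bar1.css"/>
<link type="text/css" href="bar2.css" rel="stylesheet"></link>
<link href="path/foo.css" rel="stylesheet" type="text/css"/>
HTML;

$doc = new DOMDocument();
// Collect libxml warnings internally (e.g. about the stray </link>)
// instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$results = array();
foreach ($doc->getElementsByTagName('link') as $link) {
    // getAttribute() looks the attribute up by name, so order is irrelevant.
    $results[] = $link->getAttribute('href');
}
print_r($results);
```

The HTML parser tolerates unclosed and stray tags that would make simplexml_load_string() fail, at the cost of needing the dom extension.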

I would say:

preg_match_all('/href="([^"]+)"/i', $head, $matches); // $head holds the markup; $matches[1] holds the href values (PHP has no "g" modifier)
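A simple href regex can be made a little more tolerant of real-world markup. A minimal sketch, assuming values are quoted with either single or double quotes; the sample input is hypothetical. It accepts whitespace around "=", ignores case, and skips attributes such as data-href. It is still not a real parser: it would also match href text inside comments or scripts.

```php
<?php
// Hypothetical input: mixed quote styles, spacing around "=", upper-case
// attribute name, and a data-href attribute that should NOT be picked up.
$head = <<<'HTML'
<link rel="stylesheet" HREF='a.css'/>
<link href = "b-1_x.css" rel="stylesheet"/>
<link data-href="skip.css" href="c.css"/>
HTML;

// (?<![\w-])  href must not follow a word char or "-" (skips data-href)
// \s*=\s*     allow whitespace around "="
// (["'])      group 1: the opening quote, either " or '
// (.*?)       group 2: the value, lazily up to...
// \1          ...the same quote character that opened it
// i           case-insensitive, so HREF also matches
$pattern = '/(?<![\w-])href\s*=\s*(["\'])(.*?)\1/i';

preg_match_all($pattern, $head, $matches);
$hrefs = $matches[2];
print_r($hrefs);
```

The back-reference \1 is what lets one pattern handle both quote styles without stopping early at the other kind of quote.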
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow