Question

I have the following HTML on my webpage:

<p>This is a <a href="http://www.google.com/">hyperlink</a> and this is another <a href="http://www.bing.com/">hyperlink</a>. There are many like it, but <a href="http://en.wikipedia.org/wiki/Full_Metal_Jacket">this one is mine</a>.</p>

Now, I was wondering...

Is there any way I can use a PHP function to split this block of text up into an array?

$html[0] = "<p>This is a & this is another . There are many like it, but .</p>";
$html[1] = "http://www.google.com/";
$html[2] = "http://www.bing.com/";
$html[3] = "http://en.wikipedia.org/wiki/Full_Metal_Jacket";

So, basically, strip all hyperlinks from the initial block of text and store each URL in its own array element.

Many thanks for any help with this.

Solution

Use this regex to extract the URLs from the HTML:

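  <?php
  $url = "http://www.example.net/somepage.html";
  // Fetch the page; @ suppresses the warning so die() reports the failure instead
  $input = @file_get_contents($url) or die("Could not access file: $url");
  // \\1 is a backreference to the optional opening quote captured by (\"??)
  $regexp = "<a\s[^>]*href=(\"??)([^\" >]*?)\\1[^>]*>(.*)<\/a>";
  // s: dot also matches newlines, i: case-insensitive, U: quantifier greediness is inverted
  if(preg_match_all("/$regexp/siU", $input, $matches)) {
    // $matches[2] = array of link addresses
    // $matches[3] = array of link text - including HTML code
  }
  ?>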
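
To get the exact array layout asked for in the question, the same pattern can be reused with preg_replace to strip the links from the text. This is only a rough sketch: the function name splitLinks is made up for illustration, and it runs on the paragraph from the question rather than on a fetched page.

```php
<?php
// Hypothetical helper (not part of the original answer): returns the text
// with every <a>...</a> element removed at index 0, followed by each href.
function splitLinks(string $input): array
{
    // Same pattern as in the answer, single-quoted so no extra escaping is needed.
    // With the U flag, (.*) is lazy and stops at the first closing </a>.
    $pattern = '/<a\s[^>]*href=("??)([^" >]*?)\1[^>]*>(.*)<\/a>/siU';

    $links = [];
    if (preg_match_all($pattern, $input, $matches)) {
        $links = $matches[2]; // link addresses, in document order
    }

    // Drop each whole anchor element, link text included.
    $stripped = preg_replace($pattern, '', $input);

    // array_merge renumbers the keys: 0 = stripped text, 1..n = URLs.
    return array_merge([$stripped], $links);
}

$input = '<p>This is a <a href="http://www.google.com/">hyperlink</a> and this is another <a href="http://www.bing.com/">hyperlink</a>. There are many like it, but <a href="http://en.wikipedia.org/wiki/Full_Metal_Jacket">this one is mine</a>.</p>';

$html = splitLinks($input);
print_r($html);
```

$html[1] through $html[3] hold the three URLs in the order they appear. $html[0] keeps whatever whitespace surrounded the removed links, so you may want to tidy that up afterwards. For arbitrary real-world markup a DOM parser is likely to be more robust than a regex.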
Licensed under: CC-BY-SA with attribution