19-09-2019
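Question

Hello, I want to extract links such as <a href="/portal/clients/show/entityId/2121" > and I would like a regex that gives me /portal/clients/show/entityId/2121. The number at the end (2121) is different in the other links. Any ideas?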

Solution

A regex for parsing links looks something like this:

'/<a\s+(?:[^"'>]+|"[^"]*"|'[^']*')*href=("[^"]+"|'[^']+'|[^<>\s]+)/i'

Given how horrible that is, I would suggest using Simple HTML DOM to at least get the links. You can then check each link's href with a very basic regex.
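As a minimal sketch of that "very basic regex" step, assuming the hrefs have already been pulled out by a parser: the helper name `extractEntityId` is hypothetical, and the pattern simply mirrors the path shown in the question.

```php
<?php
// Hypothetical helper (not part of any library): returns the numeric ID from
// an href shaped like /portal/clients/show/entityId/<digits>, or null otherwise.
function extractEntityId(string $href): ?int
{
    // '#' is used as the delimiter so the slashes in the path need no escaping.
    if (preg_match('#^/portal/clients/show/entityId/(\d+)$#', $href, $m) === 1) {
        return (int) $m[1];
    }
    return null;
}

var_dump(extractEntityId('/portal/clients/show/entityId/2121')); // int(2121)
var_dump(extractEntityId('/portal/clients/list'));               // NULL
```

Anchoring the pattern with `^` and `$` means hrefs that point elsewhere are rejected rather than partially matched.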

Other tips

Simple PHP HTML DOM Parser example:

// Create DOM from string
$html = str_get_html($links);

//or
$html = file_get_html('http://www.example.com/'); // include the scheme, otherwise the argument is treated as a local file path

foreach($html->find('a') as $link) {
    echo $link->href . '<br />';
}

Don't use regular expressions for handling XML/HTML. This can be done very easily with the built-in DOM parser (http://php.net/manual/en/class.domdocument.php):

$doc = new DOMDocument();
$doc->loadHTML($htmlAsString);
$xpath = new DOMXPath($doc);
$nodeList = $xpath->query('//a/@href');
for ($i = 0; $i < $nodeList->length; $i++) {
    # Xpath query for attributes gives a NodeList containing DOMAttr objects.
    # http://php.net/manual/en/class.domattr.php
    echo $nodeList->item($i)->value . "<br/>\n";
}
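To try the same DOM approach on the link from the question, here is a small sketch. The function name `extractHrefs` and the sample markup are assumptions made for illustration; `libxml_use_internal_errors(true)` is used so that sloppy markup does not print parser warnings.

```php
<?php
// Hypothetical helper: collects the href of every <a> element in an HTML string.
function extractHrefs(string $html): array
{
    // Collect libxml parse warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous); // restore the earlier setting

    $hrefs = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        if ($a->hasAttribute('href')) {
            $hrefs[] = $a->getAttribute('href');
        }
    }
    return $hrefs;
}

// Sample input (made up): an unclosed <p> and an unquoted attribute value.
$sample = '<p><a href="/portal/clients/show/entityId/2121" >Client</a>'
        . '<a href=/portal/clients/show/entityId/77>Other</a>';
print_r(extractHrefs($sample));
```

Anchors without an href attribute are skipped, so the result contains only usable link targets.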

When "parsing" HTML I mostly rely on PHPQuery (http://code.google.com/p/phpquery/) rather than regex.

Here is my solution:

<?php
// get links
$website = file_get_contents("http://www.example.com"); // download contents of www.example.com
// capture the href value of every <a href="..."> tag
preg_match_all('/<a\s+href="(.+?)"/i', $website, $matches);

// $matches[0] holds the full matches, $matches[1] the captured href values,
// so no str_replace clean-up is needed afterwards

// output all captured links
print_r($matches[1]);
?>

I would suggest avoiding XML-based parsers, because you won't always know whether the document/website is well formed.

Regards

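Extracting links from HTML can be done with an HTML parser.

Once you have all the links, simply find the index of the last slash and you have your number. No regex needed.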
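The last-slash idea can be sketched with `strrpos` and `substr`. The helper name `lastPathSegment` is hypothetical, and the sketch assumes the href has no query string or trailing slash.

```php
<?php
// Hypothetical helper: returns everything after the last '/' in an href.
function lastPathSegment(string $href): string
{
    $pos = strrpos($href, '/'); // index of the last slash, or false if none
    return $pos === false ? $href : substr($href, $pos + 1);
}

echo lastPathSegment('/portal/clients/show/entityId/2121'), "\n"; // prints 2121
```

If the value must be numeric, a follow-up check such as `ctype_digit()` on the result would catch links that do not end in an ID.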

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow

Preg_match_all https://stackoverflow.com/questions/1519696