Extract data from a website with PHP

https://stackoverflow.com/questions/2019892

19-09-2019

Question

I'm trying to build a simple alert app for some friends.

Basically, I want to be able to extract the "price" and "stock availability" data from web pages like these two:

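I've got the email and SMS parts of the alerting done. Now I want to pull the quantity and price from the web pages (those two or any others). That way I can compare the available price and quantity and get an alert telling us to place an order when a product falls between some thresholds.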

I've tried a few regular expressions (found in some tutorials, but I'm too much of a n00b for this) and couldn't get it to work. Any good tips or examples?

Solution

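<?php
// Fetch the product page (URL from the original answer)
$content = file_get_contents('http://www.sparkfun.com/commerce/product_info.php?products_id=9279');

// Price: text of the <th> in the row labelled "price"; (.*?) is non-greedy so it stops at the first </th>
$price = preg_match('#<tr><th>(.*?)</th> <td><b>price</b></td></tr>#', $content, $match) ? $match[1] : 'unknown';

// Stock: value of the hidden "quantity_on_hand" input
$in_stock = preg_match('#<input type="hidden" name="quantity_on_hand" value="(.*?)">#', $content, $match) ? $match[1] : 'unknown';

echo "Price: $price - Availability: $in_stock\n";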
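This minimal offline sketch combines the regex answer with the threshold check from the question. It runs the same two patterns against an inline HTML fragment instead of the live page. The fragment, the helper names extractProductInfo() and isWithinThresholds(), and the alert rule (price at or below a maximum, stock at or above a minimum) are all hypothetical.

```php
<?php
// Hypothetical page fragment shaped to match the answer's two patterns.
$content = '<table><tr><th>$24.95</th> <td><b>price</b></td></tr></table>'
    . '<form><input type="hidden" name="quantity_on_hand" value="37"></form>';

// Returns ['price' => ?string, 'stock' => ?int]; a field is null when its pattern does not match.
function extractProductInfo(string $content): array
{
    $price = null;
    $stock = null;
    // Non-greedy (.*?) stops at the first </th> instead of the last one on the line.
    if (preg_match('#<tr><th>(.*?)</th> <td><b>price</b></td></tr>#', $content, $m)) {
        $price = $m[1];
    }
    if (preg_match('#<input type="hidden" name="quantity_on_hand" value="(\d+)">#', $content, $m)) {
        $stock = (int) $m[1];
    }
    return ['price' => $price, 'stock' => $stock];
}

// Hypothetical alert rule: price at or below $maxPrice and at least $minStock units available.
function isWithinThresholds(?string $price, ?int $stock, float $maxPrice, int $minStock): bool
{
    if ($price === null || $stock === null) {
        return false; // markup changed or item missing: do not alert
    }
    $amount = (float) preg_replace('/[^0-9.]/', '', $price); // strip the currency sign
    return $amount <= $maxPrice && $stock >= $minStock;
}

$info = extractProductInfo($content);
echo "Price: {$info['price']} - Availability: {$info['stock']}\n";
var_dump(isWithinThresholds($info['price'], $info['stock'], 30.0, 10));
```

The shop might change something as small as the spacing in that row. If it does, extractProductInfo() returns null for that field, and the sketch treats a missing value as "don't alert" rather than as zero.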

Other tips

It's called screen scraping, in case you need to Google it.

I'd recommend using a DOM parser and XPath expressions instead. Run the HTML through HTML Tidy first to make sure it's valid markup.

For example:

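$html = file_get_contents("http://www.example.com");
// Repair the markup first (needs PHP's tidy extension)
$html = tidy_repair_string($html);
$doc = new DOMDocument();
libxml_use_internal_errors(true); // suppress warnings about imperfect HTML
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
// Now query the document: every header cell in the table with class "pricing"
foreach ($xpath->query('//table[@class="pricing"]//th') as $node) {
  echo $node->textContent, "\n";
}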
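The snippet above needs network access and the tidy extension. To try the XPath part on its own, here is a sketch that parses an inline string instead. The pricing table markup and the pricingHeaders() helper are hypothetical.

```php
<?php
// Hypothetical markup standing in for the fetched page.
$html = <<<'HTML'
<html><body>
<table class="pricing">
  <tr><th>1-9</th><th>10-99</th><th>100+</th></tr>
  <tr><td>$24.95</td><td>$22.46</td><td>$19.96</td></tr>
</table>
</body></html>
HTML;

// Returns the trimmed text of every <th> inside tables whose class attribute is exactly "pricing".
function pricingHeaders(string $html): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // collect parse warnings instead of printing them
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $result = [];
    foreach ($xpath->query('//table[@class="pricing"]//th') as $node) {
        $result[] = trim($node->textContent);
    }
    return $result;
}

foreach (pricingHeaders($html) as $header) {
    echo $header, "\n";
}
```

Note that @class="pricing" only matches when the attribute is exactly "pricing". If the table carries several classes, the usual XPath 1.0 workaround is //table[contains(concat(' ', normalize-space(@class), ' '), ' pricing ')]//th.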

Whatever you do, don't parse HTML with regular expressions, or bad things will happen. Use a parser instead.
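Here is a concrete example of the problem. The regex from the accepted answer only matches one exact spelling of the tag. An XPath query finds the element however its attributes are ordered or quoted. The three input variants and the helper names are hypothetical.

```php
<?php
// The accepted answer's pattern for the stock field.
const STOCK_REGEX = '#<input type="hidden" name="quantity_on_hand" value="(.*?)">#';

function stockByRegex(string $html): ?string
{
    return preg_match(STOCK_REGEX, $html, $m) ? $m[1] : null;
}

// Same lookup via DOM + XPath: select the value attribute of the named input.
function stockByDom(string $html): ?string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $nodes = (new DOMXPath($doc))->query('//input[@name="quantity_on_hand"]/@value');
    return $nodes->length > 0 ? $nodes->item(0)->nodeValue : null;
}

// Same data, three equally valid ways of writing the tag (hypothetical samples).
$variants = [
    '<input type="hidden" name="quantity_on_hand" value="37">',
    '<input name="quantity_on_hand" type="hidden" value="37">',  // attributes reordered
    "<input type='hidden' name='quantity_on_hand' value='37' />", // single quotes, self-closing
];

foreach ($variants as $html) {
    printf("regex: %-4s dom: %s\n", stockByRegex($html) ?? 'null', stockByDom($html) ?? 'null');
}
```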

First, this question covers it in detail. Second, extracting data from a website may not be legal. But here are some tips:

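Use Firebug or the Chrome/Safari Inspector to examine the HTML and find the patterns around the information you want
Test your regular expressions to check that they match. You may need to do this several times (multi-pass parsing/extraction)
Write the client with cURL or, even simpler, file_get_contents (note that some hosts disable opening URLs with file_get_contents via the allow_url_fopen setting)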
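This is a minimal sketch of such a client. It uses file_get_contents() where the host allows it (the allow_url_fopen setting) and falls back to cURL otherwise. The helper name fetchPage() and the user-agent string are hypothetical.

```php
<?php
// Hypothetical helper: fetch a URL as a string, via file_get_contents() when
// allow_url_fopen is enabled, otherwise via the cURL extension.
function fetchPage(string $url, int $timeoutSeconds = 10): string
{
    if (filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN)) {
        $context = stream_context_create([
            'http' => [
                'timeout' => $timeoutSeconds,
                'user_agent' => 'price-alert-bot/0.1', // hypothetical identifier
            ],
        ]);
        $body = @file_get_contents($url, false, $context); // @: report failure via exception instead
        if ($body === false) {
            throw new RuntimeException("file_get_contents failed for $url");
        }
        return $body;
    }

    if (!function_exists('curl_init')) {
        throw new RuntimeException('Neither allow_url_fopen nor the cURL extension is available');
    }
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_TIMEOUT => $timeoutSeconds,
        CURLOPT_USERAGENT => 'price-alert-bot/0.1',
    ]);
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException("cURL failed for $url: " . curl_error($ch));
    }
    return $body;
}
```

For example, $content = fetchPage('http://www.sparkfun.com/commerce/product_info.php?products_id=9279'); could replace the plain file_get_contents() call in the accepted answer.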

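Personally, I'd rather use Tidy to convert the page to valid XHTML and then extract the data with XPath instead of regular expressions. Why? Because XHTML is not a regular language, and XPath is very flexible. You can also learn XSLT to transform it.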

Good luck!

You are probably best off loading the HTML code into a DOM parser like this one and searching for the "pricing" table. However, any kind of scraping you do can break whenever they change their page layout, and is probably illegal without their consent.

The best way, though, would be to talk to the people who run the site, and see whether they have alternative, more reliable forms of data delivery (Web services, RSS, or database exports come to mind).
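If the site does publish a feed, reading it is far less fragile than scraping, because the structure is defined by the format. Here is a minimal SimpleXML sketch, assuming an RSS 2.0 feed. The feed content and the feedItems() helper are hypothetical.

```php
<?php
// Hypothetical RSS 2.0 feed; a real shop's feed (if any) will have its own items.
$rss = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Example shop: new products</title>
    <item>
      <title>Widget A</title>
      <link>http://www.example.com/products/a</link>
    </item>
    <item>
      <title>Widget B</title>
      <link>http://www.example.com/products/b</link>
    </item>
  </channel>
</rss>
XML;

// Returns [title => link] for every <item> in an RSS 2.0 document.
function feedItems(string $xml): array
{
    $feed = simplexml_load_string($xml);
    if ($feed === false) {
        throw new RuntimeException('Feed is not well-formed XML');
    }
    $items = [];
    foreach ($feed->channel->item as $item) {
        $items[(string) $item->title] = (string) $item->link;
    }
    return $items;
}

print_r(feedItems($rss));
```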

Here is the simplest way to extract data from a website. I found that all the data I needed was inside <h3> tags, so I put this together:

<?php
include('simple_html_dom.php');

// Create a DOM from a URL: put the web page you want to scrape in $page
$page = 'http://facebook4free.com/category/facebookstatus/amazing-facebook-status/';
$html = new simple_html_dom();

// Load the page into $html for further processing
$html->load_file($page);

// Collect the matching elements. find('h3') returns only the <h3> elements;
// change the selector to suit your page.
$links = array();
foreach ($html->find('h3') as $element) {
    $links[] = $element;
}

// Each $out is one matched element; echoing it prints its HTML
foreach ($links as $out) {
    echo $out;
}

?>
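simple_html_dom.php is a third-party library that has to be downloaded separately. If you'd rather stick to what ships with PHP, here is the same "grab every <h3>" idea using the built-in DOM extension. The sample HTML and the headingTexts() helper are hypothetical.

```php
<?php
// Hypothetical page content; in practice this would come from file_get_contents($page).
$html = '<html><body><h3>First status</h3><p>text</p><h3>Second <em>status</em></h3></body></html>';

// Returns the trimmed text of every element with the given tag name (default <h3>).
function headingTexts(string $html, string $tag = 'h3'): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // real-world HTML rarely parses cleanly
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $texts = [];
    foreach ($doc->getElementsByTagName($tag) as $element) {
        $texts[] = trim($element->textContent);
    }
    return $texts;
}

foreach (headingTexts($html) as $text) {
    echo $text, "\n";
}
```

This version prints each heading's text, whereas echo $out in the simple_html_dom version prints the element's HTML. Use $doc->saveHTML($element) if you need the markup.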

License: CC-BY-SA with attribution

Not affiliated with StackOverflow