I'm trying to scrape some product details from a website using the following code:

$list_url = "http://www.topshop.com/en/tsuk/category/sale-offers-436/sale-799";
$html = file_get_contents($list_url);
echo $html;

However, I'm getting this error:

Warning: file_get_contents(http://www.topshop.com/en/tsuk/category/sale-offers-436/sale-799) [function.file-get-contents]: failed to open stream: HTTP request failed! HTTP/1.0 403 Forbidden in /homepages/19/d361310357/htdocs/shopaholic/rss/topshop_f_uk.php on line 123

I gather that this is some sort of block by the website to prevent scraping. Is there a way around this - perhaps using cURL and setting a user agent?

If not, is there another way of getting basic product data like item name and price?

EDIT

The context of my code is that I'd eventually still want to be able to achieve the following:

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
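For illustration, here is a minimal sketch of where that XPath step could lead once $html is available. The helper name extract_products and the class names "product", "name" and "price" are hypothetical placeholders, not the site's real markup, so the queries would need adjusting after inspecting the actual page.

```php
<?php
// Extract name/price pairs from an HTML string.
// ASSUMPTION: the class names "product", "name" and "price" are made up
// for this sketch; the real page will almost certainly use different ones.
function extract_products(string $html): array
{
    if ($html === '') {
        return [];                      // loadHTML() rejects an empty string
    }

    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // real-world HTML is rarely valid
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $items = [];
    foreach ($xpath->query('//li[@class="product"]') as $node) {
        $items[] = [
            // string(...) returns the text of the first matching node
            'name'  => trim($xpath->evaluate('string(.//*[@class="name"])', $node)),
            'price' => trim($xpath->evaluate('string(.//*[@class="price"])', $node)),
        ];
    }
    return $items;
}
```

Calling extract_products($html) returns an array of ['name' => ..., 'price' => ...] entries, or an empty array when nothing matches.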

Solution

I've managed to fix it by adding the following code...

ini_set('user_agent','Mozilla/4.0 (compatible; MSIE 6.0)');

...as per this answer.
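If changing the ini setting for the whole script is undesirable, the same header can probably be sent for a single request through a stream context. The following is only a sketch: the function names build_context and fetch_with_user_agent and the 10-second timeout are assumptions made for illustration, and there is no guarantee the site will accept any particular user-agent string.

```php
<?php
// Build a stream context that sends a browser-like User-Agent header.
// ASSUMPTION: function names and the timeout value are illustrative only.
function build_context(string $userAgent)
{
    return stream_context_create([
        'http' => [
            'method'     => 'GET',
            'user_agent' => $userAgent,  // sent as the User-Agent header
            'timeout'    => 10,          // seconds; an arbitrary choice
        ],
    ]);
}

// Fetch a URL with that context. Returns the body as a string, or false
// on failure (file_get_contents() also emits a warning in that case).
function fetch_with_user_agent(string $url, string $userAgent)
{
    return file_get_contents($url, false, build_context($userAgent));
}
```

With this in place, $html = fetch_with_user_agent($list_url, 'Mozilla/4.0 (compatible; MSIE 6.0)'); replaces the original file_get_contents($list_url) call and leaves the global user_agent setting untouched.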

Other tips

You should use cURL rather than plain file_get_contents().
With cURL you can set the HTTP headers (User-Agent, Accept and so on) so that the request looks like one sent by a real browser.

P.S.: set up cURL to follow redirects.
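As a rough sketch of that approach, the snippet below sets a user agent, a couple of browser-like headers and redirect following. The function names curl_browser_options and fetch_html, the header values, the redirect limit and the timeout are all assumptions chosen for illustration; the site may require different headers or may still refuse the request.

```php
<?php
// Options for a browser-like GET request.
// ASSUMPTION: header values, redirect limit and timeout are illustrative.
function curl_browser_options(string $userAgent): array
{
    return [
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow HTTP redirects
        CURLOPT_MAXREDIRS      => 5,     // stop after five redirects
        CURLOPT_USERAGENT      => $userAgent,
        CURLOPT_HTTPHEADER     => [
            'Accept: text/html,application/xhtml+xml',
            'Accept-Language: en-GB,en;q=0.8',
        ],
        CURLOPT_TIMEOUT        => 15,    // seconds
    ];
}

// Fetch a page and return its HTML, or false if the transfer failed
// or the final response status was not 200.
function fetch_html(string $url, string $userAgent)
{
    $ch = curl_init($url);
    curl_setopt_array($ch, curl_browser_options($userAgent));
    $body   = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    // The handle is released when $ch goes out of scope.
    return ($body !== false && $status === 200) ? $body : false;
}
```

The returned string can then be passed to DOMDocument::loadHTML() exactly as in the question, for example $html = fetch_html($list_url, 'Mozilla/4.0 (compatible; MSIE 6.0)');.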

Licensed under: CC-BY-SA with attribution