Question

I'm trying to scrape some product details from a website using the following code:

$list_url = "http://www.topshop.com/en/tsuk/category/sale-offers-436/sale-799";
$html = file_get_contents($list_url);
echo $html;

However, I'm getting this error:

Warning: file_get_contents(http://www.topshop.com/en/tsuk/category/sale-offers-436/sale-799) [function.file-get-contents]: failed to open stream: HTTP request failed! HTTP/1.0 403 Forbidden in /homepages/19/d361310357/htdocs/shopaholic/rss/topshop_f_uk.php on line 123

I gather that this is some sort of block by the website to prevent scraping. Is there a way around this, perhaps using cURL and setting a user agent?

If not, is there another way of getting basic product data like item name and price?

EDIT

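For context, I still eventually want to be able to do the following with the fetched HTML:

libxml_use_internal_errors(true); // real-world HTML is rarely valid; collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);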
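For the second part of the question (getting item name and price), the DOMXPath step could look roughly like the sketch below. It assumes each product sits in an element with class "product" containing child elements with classes "product_name" and "product_price"; those class names and the extractProducts() helper are hypothetical, so inspect the real page markup and adjust the XPath expressions.

```php
<?php
// Sketch: extract product names and prices from a listing page's HTML.
// ASSUMPTION: the class names "product", "product_name" and "product_price"
// are hypothetical placeholders, not the real site's markup.
function extractProducts(string $html): array
{
    // DOMDocument::loadHTML() rejects an empty string, so bail out early.
    if (trim($html) === '') {
        return [];
    }

    // Collect libxml errors instead of emitting a warning per malformed tag.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $products = [];

    // Match elements whose class list contains the exact token "product".
    $containers = $xpath->query('//*[contains(concat(" ", normalize-space(@class), " "), " product ")]');
    foreach ($containers as $node) {
        // Relative queries: search only inside the current product container.
        $name  = $xpath->query('.//*[contains(@class, "product_name")]', $node)->item(0);
        $price = $xpath->query('.//*[contains(@class, "product_price")]', $node)->item(0);
        if ($name !== null && $price !== null) {
            $products[] = [
                'name'  => trim($name->textContent),
                'price' => trim($price->textContent),
            ];
        }
    }
    return $products;
}
```

Usage would be something like `$products = extractProducts($html);` once the page has been fetched successfully.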

Solution

I've managed to fix it by adding the following code...

ini_set('user_agent', 'Mozilla/4.0 (compatible; MSIE 6.0)'); // must run before the file_get_contents() call

...as per this answer.
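The ini_set() fix changes the User-Agent for every HTTP stream request in the script. If you would rather scope it to a single request, file_get_contents() also accepts a stream context whose "http" options include user_agent and extra headers. A minimal sketch follows; the helper names and the header values are illustrative assumptions, and a site may still refuse the request for other reasons.

```php
<?php
// Sketch: send a browser-like User-Agent for one request only, instead of
// changing the global user_agent ini setting. Helper names are hypothetical.
function browserContextOptions(string $userAgent): array
{
    return [
        'http' => [
            'method'     => 'GET',
            'user_agent' => $userAgent,              // per-request equivalent of ini_set('user_agent', ...)
            'header'     => "Accept: text/html\r\n", // extra request headers, each ending in \r\n
            'timeout'    => 10,                      // seconds
        ],
    ];
}

function fetchWithUserAgent(string $url, string $userAgent): string
{
    $context = stream_context_create(browserContextOptions($userAgent));
    // @ suppresses the "failed to open stream" warning; we throw instead.
    $html = @file_get_contents($url, false, $context);
    if ($html === false) {
        throw new RuntimeException("Request to $url failed");
    }
    return $html;
}
```

For example, `$html = fetchWithUserAgent($list_url, 'Mozilla/4.0 (compatible; MSIE 6.0)');` would replace the original file_get_contents() call.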

Other tips

You should use cURL rather than the simple file_get_contents() approach.
Use cURL and set the proper HTTP headers so the request mimics one sent by a real browser.

P.S.: Also set up cURL to follow redirects. Here is the link to cURL
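A rough sketch of that cURL approach, with browser-like headers and redirect following, is below. It assumes the PHP curl extension is installed; the specific User-Agent and Accept headers are illustrative choices rather than values the site is known to require, and the helper names are hypothetical.

```php
<?php
// Sketch: fetch a page with cURL using browser-like headers and redirects.
// ASSUMPTION: header values are examples only; helper names are hypothetical.
function curlOptionsForBrowserLikeRequest(string $url): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,   // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,   // follow Location: redirects
        CURLOPT_MAXREDIRS      => 5,      // but give up after a few hops
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
        CURLOPT_HTTPHEADER     => [
            'Accept: text/html,application/xhtml+xml',
            'Accept-Language: en-GB,en;q=0.9',
        ],
        CURLOPT_ENCODING       => '',     // accept any compression cURL supports
        CURLOPT_TIMEOUT        => 10,     // seconds
    ];
}

function fetchWithCurl(string $url): string
{
    $ch = curl_init();
    curl_setopt_array($ch, curlOptionsForBrowserLikeRequest($url));
    $body   = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    $error  = curl_error($ch);
    // The handle is released automatically when $ch goes out of scope.

    if ($body === false) {
        throw new RuntimeException("cURL error: $error");
    }
    if ($status >= 400) {
        // e.g. the 403 Forbidden from the question
        throw new RuntimeException("HTTP $status for $url");
    }
    return $body;
}
```

The returned string can then be passed to DOMDocument::loadHTML() exactly as in the question's EDIT.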

Licensed under: CC-BY-SA with attribution