Question

I'm trying to scrape some product details from a website using the following code:

$list_url = "http://www.topshop.com/en/tsuk/category/sale-offers-436/sale-799";
$html = file_get_contents($list_url);
echo $html;

However, I'm getting this error:

Warning: file_get_contents(http://www.topshop.com/en/tsuk/category/sale-offers-436/sale-799) [function.file-get-contents]: failed to open stream: HTTP request failed! HTTP/1.0 403 Forbidden in /homepages/19/d361310357/htdocs/shopaholic/rss/topshop_f_uk.php on line 123

I gather that this is some sort of block by the website to prevent scraping. Is there a way around this - perhaps using cURL and setting a user agent?

If not, is there another way of getting basic product data like item name and price?

EDIT

The context of my code is that I'd eventually still want to be able to achieve the following:

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

Solution

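I've managed to fix it by adding the following line before the file_get_contents() call. By default PHP sends no User-Agent header, which this site appears to reject: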

ini_set('user_agent','Mozilla/4.0 (compatible; MSIE 6.0)');

...as per this answer.
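
A variation on the same idea, as a minimal sketch: instead of changing the user agent globally with ini_set(), a stream context can set it for a single request. The helper names make_context() and fetch_with_user_agent() are hypothetical, and the timeout value is an arbitrary assumption. Whether a given site accepts the request is not guaranteed.

```php
<?php
// Hypothetical helper: builds a stream context that sends a User-Agent header.
function make_context(string $userAgent)
{
    return stream_context_create([
        'http' => [
            'method'     => 'GET',
            'user_agent' => $userAgent,
            'timeout'    => 10, // seconds; arbitrary value for this sketch
        ],
    ]);
}

// Hypothetical helper: returns the page body as a string, or false on failure.
function fetch_with_user_agent(string $url, string $userAgent)
{
    $context = make_context($userAgent);
    // @ suppresses the PHP warning; failure is signalled by the false return value.
    return @file_get_contents($url, false, $context);
}
```

Usage would be along the lines of $html = fetch_with_user_agent($list_url, 'Mozilla/4.0 (compatible; MSIE 6.0)'); and, if the result is not false, $html can be passed to $doc->loadHTML($html) as in the question.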

OTHER TIPS

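You should use cURL rather than a bare file_get_contents() call.
With cURL you can set the HTTP headers (User-Agent, Accept and so on) so that the request looks like one sent by a real browser.

P.S.: also set cURL to follow redirects (the CURLOPT_FOLLOWLOCATION option). See the cURL section of the PHP manual for the full list of options.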
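
A rough sketch of what this answer describes, assuming the PHP cURL extension is installed. The helper names build_curl_options() and fetch_html(), the extra Accept headers, and the timeout and redirect limits are illustrative assumptions, not values the site is known to require.

```php
<?php
// Hypothetical helper: cURL options for a browser-like GET request.
function build_curl_options(string $userAgent): array
{
    return [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow HTTP redirects
        CURLOPT_MAXREDIRS      => 5,    // stop after a few hops
        CURLOPT_USERAGENT      => $userAgent,
        CURLOPT_HTTPHEADER     => [
            'Accept: text/html,application/xhtml+xml',
            'Accept-Language: en-GB,en;q=0.8',
        ],
        CURLOPT_TIMEOUT        => 15,   // seconds
    ];
}

// Hypothetical helper: returns the page body, or false if the request
// fails or the final HTTP status is 400 or above (e.g. 403 Forbidden).
function fetch_html(string $url, string $userAgent)
{
    $ch = curl_init($url);
    if ($ch === false) {
        return false;
    }
    curl_setopt_array($ch, build_curl_options($userAgent));
    $body   = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    if ($body === false || $status >= 400) {
        return false;
    }
    return $body;
}
```

Called as $html = fetch_html($list_url, 'Mozilla/4.0 (compatible; MSIE 6.0)');, the result can then be fed to DOMDocument::loadHTML() exactly as in the question. Checking the status code explicitly is a design choice here, since cURL by default treats a 403 response as a successful transfer.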

Licensed under: CC-BY-SA with attribution