Question

Using PHP, I'm trying to download and save the following image:

http://www.bobshop.nl/catalog/product_image.php?size=detail&id=42428

When I load this image in a browser, I can see it, but when I try to download it using several different methods, I get a 1 KB file that says the product could not be found on the server.

I tried this with both file_put_contents and cURL. I even used the get_web_page function that I found somewhere on StackOverflow, to catch a possible redirect.

What else could explain being able to see the image in a browser, but not being able to download it?

UPDATE: Thanks to an error that was thrown while trying out the different answers, I just found the real cause of the problem. Somewhere in the process of scraping the HTML, the URL ended up containing &amp; instead of &. I replace these now, and every other method works too. Thanks, all!
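The fix from the update can be sketched like this, assuming the URL was taken from an HTML attribute (such as an img src), where a literal & is normally written as the entity &amp;. PHP's html_entity_decode() turns the entity back into & before the URL is requested. The helper name clean_scraped_url is hypothetical.

```php
<?php
// Hypothetical helper: undo HTML entity encoding in a URL scraped from markup.
// In HTML source, "&" inside an attribute value is usually written as "&amp;",
// so the scraped string must be decoded before it is used as a request URL.
function clean_scraped_url(string $rawUrl): string
{
    // ENT_QUOTES | ENT_HTML5 also decodes quotes and HTML5 named entities
    return html_entity_decode($rawUrl, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}
```

Decoding once is enough. The result is then passed unchanged to file_get_contents() or curl_init().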


Solution

I just implemented a simple way to download and store the image, and it worked:

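<?php

// Download the image (requires allow_url_fopen) and save it to /tmp/image
$url = "http://www.bobshop.nl/catalog/product_image.php?size=detail&id=42428";
$fileContent = file_get_contents($url);

if ($fileContent === false) {
    die("Could not download $url\n");
}

// "wb": write in binary mode so the image bytes are stored unchanged
$fp = fopen("/tmp/image", "wb");
fwrite($fp, $fileContent);
fclose($fp);

?>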

Are you behind a proxy? That could be the problem: your browser is configured to use the proxy, but PHP is not. ;)
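If a proxy is the cause, PHP's HTTP stream wrapper can be pointed at it through a stream context. This is a minimal sketch: the proxy address tcp://proxy.example.com:8080 is a placeholder, to be replaced with whatever your browser is configured to use, and make_proxy_context is a hypothetical helper. The request_fulluri option makes PHP send the absolute URL in the request line, which most HTTP proxies expect.

```php
<?php
// Build a stream context that routes http:// requests through a proxy.
function make_proxy_context(string $proxy)
{
    return stream_context_create([
        'http' => [
            'proxy'           => $proxy, // e.g. 'tcp://proxy.example.com:8080' (placeholder)
            'request_fulluri' => true,   // send the absolute URL, as proxies expect
        ],
    ]);
}

// Usage (performs a network request, so it is left commented out):
// $ctx  = make_proxy_context('tcp://proxy.example.com:8080');
// $data = file_get_contents($url, false, $ctx);
```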

Other suggestions

This PHP script probably does some kind of header check to make sure the image is being requested by a browser, and not by someone trying to scrape their content. Those headers can be forged with cURL (although after doing something like this I feel like I need to take a shower), specifically with curl_setopt():

curl_setopt($ch, CURLOPT_HTTPHEADER, array(
    'User-agent: Some legitimate string'
));

To find out which headers need to be sent, you'll need to do some experimentation. If you have Google Chrome, you've probably used the Inspector (if you don't, Firefox has similar add-ons such as Firebug). Request the image in Chrome, right-click it to inspect it, go to the Network tab, and refresh the page. The request to product_image.php should show up. Click on it and open the Headers tab to see the list of headers that were sent. My browser sends: User-Agent, Accept, Accept-Encoding, Accept-Language, and Accept-Charset.


Try combinations of these headers with valid values to see which ones must be sent for the image to be returned. I'd bet this site only checks User-Agent, so start with that one.
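A sketch of that experiment with cURL follows. The header values are illustrative assumptions, to be replaced with what your own browser's Network tab shows. build_header_lines turns an associative array into the "Name: value" strings that CURLOPT_HTTPHEADER expects, so headers can be added or removed one at a time. fetch_with_headers is a hypothetical helper that also follows redirects and treats any non-200 response as a failure.

```php
<?php
// Turn ['User-Agent' => '...'] into ['User-Agent: ...'] for CURLOPT_HTTPHEADER.
function build_header_lines(array $headers): array
{
    $lines = [];
    foreach ($headers as $name => $value) {
        $lines[] = $name . ': ' . $value;
    }
    return $lines;
}

// Fetch $url with the given header lines; returns the body, or false on failure.
function fetch_with_headers(string $url, array $headerLines)
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_HTTPHEADER     => $headerLines,
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects, like a browser does
    ]);
    $body   = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    // The handle is freed automatically once $ch goes out of scope.
    return ($body !== false && $status === 200) ? $body : false;
}

// Illustrative header set: start with User-Agent alone, then add others if needed.
$browserHeaders = [
    'User-Agent'      => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
    'Accept'          => 'image/webp,image/*,*/*;q=0.8',
    'Accept-Language' => 'nl-NL,nl;q=0.9,en;q=0.8',
];
```

Accept-Encoding is left out on purpose. If it is set by hand, cURL hands back the compressed bytes as-is. CURLOPT_ENCODING is the option that asks cURL to negotiate and decode compression itself.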


An important note: you should cache the result of this call, because it will look very suspicious if your server requests the image many times in rapid succession (for example, if many users on your site hit the script that grabs this image). Also, as an extra layer of anonymity, you might want to pick your User-Agent from an array of valid ones, so that bobshop.nl thinks all of the requests are coming from users behind a large network (like a college campus). You can find valid user agent strings on UserAgentString.com.
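Both suggestions can be sketched together: a file-based cache that only downloads again once a time-to-live has expired, and a User-Agent picked at random from a list. The cache directory, the TTL, the helper names, and the idea of passing the downloader in as $fetch are all assumptions made for illustration. $fetch is any callable that takes a URL and a User-Agent and returns the image bytes or false, such as a small wrapper around cURL.

```php
<?php
// Hypothetical helper: pick one User-Agent string at random from a non-empty list.
function pick_user_agent(array $agents): string
{
    return $agents[array_rand($agents)];
}

// Return cached bytes for $url if the cache file is younger than $ttl seconds;
// otherwise call $fetch($url, $userAgent), store the result, and return it.
function cached_fetch(string $url, string $cacheDir, int $ttl, callable $fetch, array $agents)
{
    $path = $cacheDir . '/' . md5($url); // one cache file per URL
    if (is_file($path) && time() - filemtime($path) < $ttl) {
        return file_get_contents($path); // still fresh: no request is sent
    }
    $data = $fetch($url, pick_user_agent($agents));
    if ($data !== false) {
        file_put_contents($path, $data); // only successful downloads are cached
    }
    return $data;
}
```

Passing the downloader in as a callable keeps the caching logic independent of how the request is made, so the same function works with file_get_contents() or cURL.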

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow