Question

I am writing PHP code that fetches content in a particular format from about 20 websites.

It works for every website except one. Here is the issue:
I am using file_get_contents() to fetch images from the website and save them on my server. The image exists on the remote server and loads fine in a browser, but I get a 404 response when fetching it from code.

I can't figure out the cause, since the same method works for the other websites.

Could it have something to do with the headers being sent? Any help will be greatly appreciated.


Solution

The answer is probably yes.

I suspect they're checking the user agent.

The user agent is sent in your request headers, so you can fake it. Don't use file_get_contents() for this, though, since it has no direct option for changing the user agent. Look into cURL.

Edit 1

Barmar's link shows a way to use file_get_contents() with a different user agent after all. It's worth looking into.
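A minimal sketch of that approach, assuming PHP's standard HTTP stream wrapper: the `user_agent` option of an HTTP stream context is sent as the User-Agent header. The helper name `browser_context` and the UA string are illustrative, not taken from the linked answer.

```php
<?php
// Sketch: build an HTTP stream context so that file_get_contents()
// sends a custom User-Agent. The helper name is hypothetical.
function browser_context(string $userAgent)
{
    return stream_context_create([
        'http' => [
            'user_agent' => $userAgent, // sent as the User-Agent request header
        ],
    ]);
}

// Usage ($imageUrl and the UA string are placeholders):
// $ua   = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)';
// $data = file_get_contents($imageUrl, false, browser_context($ua));
```

Passing a context per call keeps the change local; alternatively, `ini_set('user_agent', ...)` changes the default user agent for every HTTP request the script makes.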

Edit 2

But they could also be checking the referrer. If that is the case, you need to send a Referer header as well; cURL makes this easy with its CURLOPT_REFERER option.

Edit 3

Now that I've seen the URL, and given that you get a 404 (not a 50x), I'd check whether the URL is being escaped and parsed correctly. The URL contains spaces, and two slashes after the domain name. Check that the spaces are escaped as %20, and whether the double slash should be reduced to a single slash.

So

http://celebslam.celebuzz.com//bfm_gallery/2014/03/Lindsay Lohan 2 Broke Girls/gallery_enlarged/gallery_enlarged-lindsay-lohan-2-broke-girls-01.jpg

Should become

http://celebslam.celebuzz.com/bfm_gallery/2014/03/Lindsay%20Lohan%202%20Broke%20Girls/gallery_enlarged/gallery_enlarged-lindsay-lohan-2-broke-girls-01.jpg

And note that the server is CaSe-SeNsItIvE!
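The fix above can be automated before each fetch. This is a minimal sketch, not the answerer's code: the function `normalize_image_url` is hypothetical. It collapses repeated slashes in the path and percent-encodes each path segment with `rawurlencode()`, which turns a space into `%20` (unlike `urlencode()`, which produces `+`). It leaves letter case alone because the server is case-sensitive. It also escapes other reserved characters in the path, which is usually, but not always, what a server expects.

```php
<?php
// Hypothetical helper: clean up a scraped image URL before fetching it.
// Letter case is never changed, since the server is case-sensitive.
function normalize_image_url(string $url): string
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        throw new InvalidArgumentException("Not an absolute URL: $url");
    }

    $path = $parts['path'] ?? '/';

    // Empty segments come from "//" (or a leading/trailing slash); drop them.
    $segments = array_filter(explode('/', $path), fn ($s) => $s !== '');

    // Decode first so an already-escaped "%20" is not double-encoded,
    // then re-encode; rawurlencode() turns a space into %20.
    $segments = array_map(fn ($s) => rawurlencode(rawurldecode($s)), $segments);

    $newPath = '/' . implode('/', $segments);
    if ($segments !== [] && substr($path, -1) === '/') {
        $newPath .= '/'; // keep a trailing slash if there was one
    }

    // Rebuild the URL. User info and #fragments are ignored in this sketch.
    $result = $parts['scheme'] . '://' . $parts['host'];
    if (isset($parts['port'])) {
        $result .= ':' . $parts['port'];
    }
    $result .= $newPath;
    if (isset($parts['query'])) {
        $result .= '?' . $parts['query'];
    }
    return $result;
}
```

Because it decodes before encoding, running it on an already-clean URL returns that URL unchanged.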

OTHER TIPS

Yes. First of all, check whether that site checks the referrer when images are accessed. For example, try opening the image directly in the browser by typing its URL into the address bar, so that no referrer is sent.

It may also check the User-Agent header or other request fields.
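To find out from code which header matters, one option is to request the image with different headers and compare the status codes. A minimal sketch, assuming the standard HTTP stream wrapper: the `ignore_errors` context option makes file_get_contents() return the response on a 404 instead of failing, and `$http_response_header` holds the raw status and header lines. The function names are hypothetical, and the header values in the usage comments are guesses.

```php
<?php
// Hypothetical diagnostic: fetch $url with extra request headers and
// return the final HTTP status code, or null if no response was received.
function fetch_status(string $url, array $headers = []): ?int
{
    $http = ['ignore_errors' => true]; // don't fail on 4xx/5xx responses
    if ($headers !== []) {
        $http['header'] = implode("\r\n", $headers);
    }
    $context = stream_context_create(['http' => $http]);

    @file_get_contents($url, false, $context);

    // The HTTP wrapper fills $http_response_header in the local scope.
    return isset($http_response_header) ? last_status_code($http_response_header) : null;
}

// Return the status code of the last response, i.e. after any redirects.
function last_status_code(array $responseHeaders): ?int
{
    $code = null;
    foreach ($responseHeaders as $line) {
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $code = (int) $m[1];
        }
    }
    return $code;
}

// Usage ($imageUrl and the header values are placeholders):
// var_dump(fetch_status($imageUrl));
// var_dump(fetch_status($imageUrl, ['Referer: http://celebslam.celebuzz.com/']));
// var_dump(fetch_status($imageUrl, ['User-Agent: Mozilla/5.0']));
```

If only the request with a Referer (or a browser-like User-Agent) returns 200, that header is what the server is checking.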

Fetching the file with cURL will probably help (code examples are easy to find, or I can give you a simple class).

P.S. Just out of interest, can you post some example image URLs to try?

It's probably the referrer or the user agent. This function sets both:

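function file_get_contents_custom($url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    // Send the image's own URL as the referrer, so a hotlink check
    // sees the request as coming from the same site.
    curl_setopt($ch, CURLOPT_REFERER, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    // Pretend to be a regular browser.
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (X11; U; Linux; i686; en-US; rv:1.6) Gecko Debian/1.6-7');
    $data = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    // Treat transport errors and non-2xx responses (e.g. a 404 page) as failure,
    // so an error page is never saved as an image.
    if ($data === false || $status < 200 || $status >= 300) {
        return false;
    }
    return $data;
}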

Update:

The image you linked downloads fine for me with file_get_contents(). The server might have some sort of DDoS protection. On average, how many requests per second are you making?
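If rate limiting turns out to be the cause, spacing out requests may help. A minimal sketch, assuming a fixed minimum interval between requests is enough; the class name `RequestThrottle` is hypothetical, and the 0.5 s interval in the usage comment is an arbitrary example, not a limit the server is known to enforce.

```php
<?php
// Hypothetical helper: enforce a minimum delay between consecutive requests.
final class RequestThrottle
{
    private float $minInterval;        // minimum seconds between requests
    private ?float $lastRequest = null; // microtime(true) of the previous request

    public function __construct(float $minInterval)
    {
        $this->minInterval = $minInterval;
    }

    // Block until at least $minInterval seconds have passed since the last call.
    public function wait(): void
    {
        if ($this->lastRequest !== null) {
            $elapsed = microtime(true) - $this->lastRequest;
            if ($elapsed < $this->minInterval) {
                usleep((int) ceil(($this->minInterval - $elapsed) * 1e6));
            }
        }
        $this->lastRequest = microtime(true);
    }
}

// Usage ($imageUrls and the interval are placeholders):
// $throttle = new RequestThrottle(0.5);
// foreach ($imageUrls as $url) {
//     $throttle->wait();
//     $data = file_get_contents($url);
// }
```

The first call returns immediately; each later call sleeps only for whatever part of the interval has not already passed, so time spent downloading counts toward the delay.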

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow