Question

I am trying to fetch URLs using Google App Engine's urlfetch service to implement a proxy site. Sites like Twitter and Facebook appear disfigured, as if they are missing their stylesheets, and even Google is missing its logo. Yahoo, however, opens fine. I can't understand why.

Solution

When you use urlfetch, it fetches only the HTML of the page, not the images, CSS, JavaScript, or other resources that the page references.

Yahoo looks fine presumably because they specify their images and CSS using absolute URLs (e.g., http://www.yahoo.com/image.png), so when your urlfetch'd page displays, the browser loads those resources directly from yahoo.com. Keep in mind that if a visitor can't reach yahoo.com, those images won't appear on your proxied page either.

edit: It looks like Yahoo inlines their CSS into the HTML page itself, which would explain why it works in your fetched copy.

Google appears without CSS/images because their CSS/images are specified as relative URLs (e.g., /image.png). The browser resolves those against your proxy's host, and your proxy doesn't have an image at /image.png.
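To make the difference concrete, here is a small sketch of how a browser resolves the same relative path depending on which host served the page. It uses Python's standard `urllib.parse.urljoin`; the helper name `resolve` and the host `my-proxy.example.com` are made up for illustration.

```python
from urllib.parse import urljoin


def resolve(page_url, resource_url):
    """Return the absolute URL a browser would request for resource_url
    when it appears on the page at page_url."""
    return urljoin(page_url, resource_url)


# Served from the original site, the relative path resolves against google.com.
print(resolve("http://www.google.com/", "/image.png"))

# Served from the proxy (hypothetical host), the same path resolves against
# the proxy, which has no such file.
print(resolve("http://my-proxy.example.com/fetch", "/image.png"))

# An absolute URL is unaffected by where the page was served from.
print(resolve("http://my-proxy.example.com/fetch", "http://www.yahoo.com/image.png"))
```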

You'll have to parse the urlfetch'd page content to find the images and CSS that need to be fetched and proxied as well. Just be sure to handle relative URLs like /resource.png as well as absolute URLs like http://www.foo.com/resource.png.
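One possible way to do that rewriting is sketched below. It is a rough illustration, not a complete solution: it assumes your proxy exposes a handler at `/fetch?url=...` (both the path and the parameter name are hypothetical), it uses a regular expression rather than a real HTML parser, and it only looks at quoted `src` and `href` attributes. URLs inside CSS (`url(...)`), unquoted attributes, and URLs built by JavaScript are not handled. Note that it rewrites every `href`, so ordinary links are routed through the proxy too.

```python
import re
from urllib.parse import urljoin, quote

# Hypothetical proxy endpoint; the path and the "url" parameter are assumptions.
PROXY_PREFIX = "/fetch?url="

# Matches src="..." or href="..." with either single or double quotes.
ATTR_RE = re.compile(r'''\b(src|href)\s*=\s*(["'])(.*?)\2''', re.IGNORECASE)

# URL forms that should be left untouched.
SKIP_PREFIXES = ("#", "data:", "javascript:", "mailto:")


def rewrite_resource_urls(html, page_url):
    """Point every src/href in html at the proxy.

    page_url is the URL the HTML was originally fetched from; relative
    URLs are resolved against it before being wrapped.
    """
    def replace(match):
        attr, quote_char, url = match.group(1), match.group(2), match.group(3)
        if url.lower().startswith(SKIP_PREFIXES):
            return match.group(0)
        # Turn /resource.png into http://original-host/resource.png;
        # already-absolute URLs pass through urljoin unchanged.
        absolute = urljoin(page_url, url)
        proxied = PROXY_PREFIX + quote(absolute, safe="")
        return "%s=%s%s%s" % (attr, quote_char, proxied, quote_char)

    return ATTR_RE.sub(replace, html)


page = '<img src="/image.png"><link href="http://www.foo.com/style.css">'
print(rewrite_resource_urls(page, "http://www.google.com/"))
```

In an App Engine handler, `html` would be the content of the urlfetch result and `page_url` the URL you fetched. The `/fetch` handler would then decode the `url` parameter and fetch that resource the same way.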

Licensed under: CC-BY-SA with attribution