Question

import webapp2
from google.appengine.api import urlfetch

class Crawl(webapp2.RequestHandler):
    def get(self):
        url = "http://www.example.com/path/to a/page"  # URL with a space
        result = urlfetch.fetch(url)
        self.response.write('url: %s' % result.status_code)  # Outputs 400
        self.response.write(result.content)  # Gives me the 400 error page

There are thousands of URLs out there that contain spaces, and there is no way we can correct them one by one.

Why does urlfetch get a 400 Bad Request error for this kind of URL, which is perfectly accessible through a browser? How can I overcome this?


Solution

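This happens because the URL is not properly encoded. A browser percent-encodes the space for you before sending the request, whereas urlfetch sends the URL as given, and the server rejects it with a 400. Make sure any URLs with spaces are properly encoded, with %20 in place of each space.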
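Since fixing URLs by hand is not an option, the encoding can be done in code before the fetch. The following is a minimal sketch, not a definitive implementation: `encode_url` is a hypothetical helper name, and the set of characters treated as safe is an assumption that may need adjusting for your URLs. It uses the standard library's `quote`, with a fallback import for the Python 2 runtime.

```python
try:
    from urllib.parse import quote, urlsplit, urlunsplit  # Python 3
except ImportError:
    from urllib import quote  # Python 2
    from urlparse import urlsplit, urlunsplit


def encode_url(url):
    """Percent-encode unsafe characters (such as spaces) in a URL.

    Hypothetical helper: only the path and query are encoded, so the
    scheme, host, and fragment are passed through unchanged.
    """
    parts = urlsplit(url)
    # '/' stays as the path separator; '%' stays so that escapes which
    # are already present (e.g. %20) are not encoded a second time.
    path = quote(parts.path, safe="/%")
    # Keep the usual query delimiters; the exact safe set is an assumption.
    query = quote(parts.query, safe="=&%/?:+,;@")
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))
```

In the handler from the question you would then call `urlfetch.fetch(encode_url(url))`. Because `%` is treated as safe, a URL that is already encoded passes through unchanged, but a literal percent sign that is not part of an escape will not be encoded either.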

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow