Question

import webapp2
from google.appengine.api import urlfetch


class Crawl(webapp2.RequestHandler):
    def get(self):
        url = "http://www.example.com/path/to a/page"  # URL with a space
        result = urlfetch.fetch(url)
        self.response.write('status: %s' % result.status_code)  # Outputs 400
        self.response.write(result.content)  # Body of the 400 error page

We can't deny the fact that there are thousands of URLs that contain spaces. There is no way we can correct them one by one.

Why does urlfetch get a 400 Bad Request error for this kind of URL, which is perfectly accessible through the browser? How can I work around this?


Solution

This happens because the URL needs to be properly percent-encoded before it is fetched. A browser encodes the space for you before sending the request, which is why the same URL works there. Make sure any URLs with spaces are encoded with %20 in place of each space.
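Since fixing thousands of URLs by hand is not practical, the encoding can be done in code before each fetch. The following is a minimal sketch, not a definitive implementation: `encode_url` is a hypothetical helper name, and the choice of characters left unescaped is an assumption that suits ordinary http URLs. It only uses the standard library's `quote`, `urlsplit`, and `urlunsplit`.

```python
try:
    from urllib.parse import quote, urlsplit, urlunsplit  # Python 3
except ImportError:
    from urllib import quote  # Python 2
    from urlparse import urlsplit, urlunsplit


def encode_url(url):
    """Percent-encode unsafe characters (such as spaces) in a URL.

    Hypothetical helper: only the path and query are touched; the scheme,
    host, and fragment are passed through unchanged.
    """
    parts = urlsplit(url)
    # "%" is kept as safe so URLs that are already encoded are not
    # encoded a second time (assumption: "%" only appears in escapes).
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%+")
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))
```

In the handler from the question, this would be used as `result = urlfetch.fetch(encode_url(url))`.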

Licensed under: CC-BY-SA with attribution