Question

According to the GAE URL Fetch documentation, cookies are not handled across redirects:

Cookies are not handled upon redirection. If cookie handling is needed, set follow_redirects to False and handle both cookies and redirects manually.

So I am trying to implement a manual solution:

page = urlfetch.Fetch(
    url = url,
    payload = form_data,
    method = urlfetch.POST,
    headers = headers,
    follow_redirects = False,
    deadline = 60)
cookies = ''
while page.status_code == 302:
    url = page.headers.get('location')
    if page.headers.get('set-cookie'):
        cookies = page.headers.get('set-cookie')
        headers['cookie'] = cookies
    page = urlfetch.Fetch(
        url = url,
        method = urlfetch.GET,
        headers = headers,
        follow_redirects = False,
        deadline = 60)
if page.status_code == 200 and page.content:
    self.response.out.write(page.content)

But it doesn't work as expected. It looks like I am missing some cookies. The documentation mentions header_msg:

header_msg An instance of httplib.HTTPMessage containing the response headers. If there may be multiple headers with the same name (for example, Set-Cookie headers), call header_msg.get_headers(header_name) to retrieve the values as a list.

But how should I use that header_msg?


Solution

If I'm understanding the problem, you want to collect (and cumulatively pass on) the cookies from each response, but URLFetch with follow_redirects=True only returns the cookies from the last response. Furthermore, the default behavior doesn't implement a cookie jar, so later requests are not sent with Cookie headers matching the Set-Cookie headers of earlier responses. Presumably the initial POST is a login form that redirects to a page expecting the cookie, a scheme which can't work with these limitations.

To that end, your code is close, but cookies = page.headers.get('set-cookie') overwrites the previously collected cookies on every request. This should work better:

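from google.appengine.api import urlfetch

# Initial POST; redirects are followed manually so cookies can be collected.
page = urlfetch.Fetch(
  url = url,
  payload = form_data,
  method = urlfetch.POST,
  headers = headers,
  follow_redirects = False)
cookies = []
while page.status_code == 302:
  url = page.headers.get('location')
  # getheaders() returns every Set-Cookie value as a list (empty if none),
  # so cookies from earlier responses accumulate instead of being replaced.
  cookies.extend(page.header_msg.getheaders('set-cookie'))
  if cookies:
    headers['cookie'] = '; '.join(cookies)
  page = urlfetch.Fetch(
    url = url,
    method = urlfetch.GET,
    headers = headers,
    follow_redirects = False)
if page.status_code == 200 and page.content:
  self.response.out.write(page.content)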

Some caveats:

  • If Location is a relative path, you'll need to fix up url.
  • If any Set-Cookie header is not just key=value (e.g. it has an expiration), you'll need to parse the header value so you can send just the key/value pair. The standard library's Cookie module can help with the parsing.
  • This code will happily send duplicate cookies if more than one Set-Cookie was seen for a particular key.
  • If the redirect ends up on a different domain, this will incorrectly send it cookies from the original domain, which may be a security problem. A proper cookie jar implementation reasons about domain and path restrictions to decide when to accept and emit cookies; consider using cookielib.CookieJar. If you expect the whole request sequence to stay on one domain, it may be enough to abort when you detect a switch.
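
The caveats above could be handled with a few small helpers. This is only a sketch, not part of the original answer: the names merge_set_cookies, cookie_header, resolve_location and same_host are made up for illustration. It assumes that keeping the latest value per cookie name is acceptable and that comparing host names is a good enough domain check. The imports fall back to the Python 2 module names used on the original GAE runtime.

```python
try:  # Python 3 module names
    from http.cookies import SimpleCookie
    from urllib.parse import urljoin, urlparse
except ImportError:  # Python 2 module names (original GAE runtime)
    from Cookie import SimpleCookie
    from urlparse import urljoin, urlparse


def merge_set_cookies(jar, set_cookie_values):
    """Parse raw Set-Cookie values and store name -> value in jar (a dict).

    Attributes such as Path or Expires are dropped, and a later value for
    the same name replaces the earlier one, so no duplicates are sent.
    """
    for raw in set_cookie_values:
        parsed = SimpleCookie()
        parsed.load(raw)
        for name, morsel in parsed.items():
            # coded_value keeps the value as it appeared in the header.
            jar[name] = morsel.coded_value
    return jar


def cookie_header(jar):
    """Build the value of a Cookie request header from the jar."""
    return '; '.join('%s=%s' % (name, value)
                     for name, value in sorted(jar.items()))


def resolve_location(current_url, location):
    """Turn a possibly relative Location header into an absolute URL."""
    return urljoin(current_url, location)


def same_host(url_a, url_b):
    """Return True if both URLs point at the same host (case-insensitive)."""
    return urlparse(url_a).netloc.lower() == urlparse(url_b).netloc.lower()
```

Inside the redirect loop, cookies would then be a dict rather than a list: call merge_set_cookies(cookies, page.header_msg.getheaders('set-cookie')), set headers['cookie'] = cookie_header(cookies), compute the next URL with resolve_location(url, page.headers.get('location')), and stop if same_host(url, next_url) is false. This still ignores Path, Domain and expiry rules, so cookielib.CookieJar remains the more complete option.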
Licensed under: CC-BY-SA with attribution