Question

I want to get the URL redirection log using Mechanize in Python, for example www.google.com --> www.google.co.in. The exact question has been asked before on SO, but for Ruby:

How to get redirect log in Mechanize?

The answer there suggests doing the following in Ruby:

for m.redirection_limit in 0..99
  begin
    m.get(url)
    break
  rescue WWW::Mechanize::RedirectLimitReachedError
    # code here could get control at
    # intermediate redirection levels
  end
end

I want to do the same in Python. Any help? What is the equivalent of get(url) in Mechanize for Python?


Solution

You could override the HTTPRedirectHandler.redirect_request() method to save a redirection history:

import urllib2

class HTTPRedirectHandler(urllib2.HTTPRedirectHandler):
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Let the base class decide whether the redirect is followed.
        newreq = urllib2.HTTPRedirectHandler.redirect_request(self,
            req, fp, code, msg, headers, newurl)
        if newreq is not None:
            # Record the target of every redirect that is followed.
            self.redirections.append(newreq.get_full_url())
        return newreq

url = 'http://google.com'

h = HTTPRedirectHandler()
h.max_redirections = 100
h.redirections = [url]  # the history starts with the requested url
opener = urllib2.build_opener(h)
response = opener.open(url)
print(h.redirections)
# -> ['http://google.com', 'http://www.google.com/', 'http://google.com.ua/']

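This should be much faster than the WWW::Mechanize snippet from the question, because urllib2 visits each URL only once, whereas the Ruby loop restarts from the first URL every time the redirection limit is raised.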

mechanize provides a superset of urllib2 functionality, so if you use mechanize, just replace every occurrence of urllib2 above with mechanize and it will work.
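For readers on Python 3, where urllib2 became urllib.request, the same idea can be sketched roughly as follows. This is only an adaptation of the handler above, not part of the original answer: the names RecordingRedirectHandler and trace_redirects are made up for illustration, and the redirection list is set up in the constructor instead of being attached afterwards.

```python
import urllib.request


class RecordingRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Redirect handler that remembers every URL it is redirected to.

    Hypothetical name; a Python 3 adaptation of the handler shown above.
    """

    max_redirections = 100  # same limit as in the answer above

    def __init__(self, start_url):
        super().__init__()
        # The history starts with the URL that was originally requested.
        self.redirections = [start_url]

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Let the base class decide whether the redirect is followed.
        newreq = super().redirect_request(req, fp, code, msg, headers, newurl)
        if newreq is not None:
            self.redirections.append(newreq.full_url)
        return newreq


def trace_redirects(url):
    """Open url and return the list of URLs visited along the way."""
    handler = RecordingRedirectHandler(url)
    # Passing an instance replaces the default redirect handler.
    opener = urllib.request.build_opener(handler)
    with opener.open(url) as response:
        response.read()
    return handler.redirections
```

Calling trace_redirects('http://google.com') should then return the requested URL followed by each redirect target, in the order they were visited.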

OTHER TIPS

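J.F. Sebastian's answer works great for HTTP redirections, but it misses redirections that are not plain HTTP 3xx responses, such as meta refresh redirections (urllib2 doesn't follow those, but mechanize does). Note that neither library executes JavaScript, so redirections performed purely by a script are not followed.

This should work for both types of redirections, though: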

import mechanize
import logging
import sys
logger = logging.getLogger("mechanize")
logger.addHandler(logging.StreamHandler(sys.stdout))
logger.setLevel(logging.INFO)

browser = mechanize.Browser()
browser.set_debug_redirects(True)

r = browser.open("http://google.com")
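If the redirection log is needed as data rather than as text on stdout, one option is to attach a handler that collects the messages in a list. The sketch below uses only the standard logging module; ListHandler and capture_logger are hypothetical names, and it assumes that mechanize emits its redirect messages on the "mechanize" logger or one of its children (such as "mechanize.http_redirects"), as the snippet above implies. The exact wording of the messages is left to mechanize and is not assumed here.

```python
import logging


class ListHandler(logging.Handler):
    """Logging handler that stores formatted messages in a list."""

    def __init__(self):
        logging.Handler.__init__(self)
        self.messages = []

    def emit(self, record):
        self.messages.append(self.format(record))


def capture_logger(name="mechanize"):
    """Attach a ListHandler to the named logger and return the handler."""
    handler = ListHandler()
    logger = logging.getLogger(name)
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return handler
```

Call capture_logger() before browser.open(...) and keep browser.set_debug_redirects(True) as above; afterwards the messages attribute of the returned handler holds whatever was logged during the request.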

I was going to give you an 'IGIFY', but you are right, the mechanize documentation sucks. Poking around a bit, it looks like urllib2 is the place to look, since mechanize exposes that entire interface.

Licensed under: CC-BY-SA with attribution