Question

I am trying to automate login to this page: http://portal.globaltransit.net/. The page returns a 401 status when you first reach it, but instead of triggering the standard HTTP Basic auth prompt it serves an HTML login form. Here is the output of curl -vvv http://portal.globaltransit.net/:

* About to connect() to portal.globaltransit.net port 80 (#0)
* Trying 124.158.236.65... connected
* Connected to portal.globaltransit.net (124.158.236.65) port 80 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.19.7 (i486-pc-linux-gnu) libcurl/7.19.7 OpenSSL/0.9.8k zlib/1.2.3.3 libidn/1.15
> Host: portal.globaltransit.net
> Accept: */*
>
< HTTP/1.1 401 Unauthorized
< Date: Thu, 14 Nov 2013 07:18:06 GMT
< Server: Apache
< X-Powered-By: PHP/5.2.11
< Set-Cookie: symfony=1960d9b76a5f9fc3b00786e126cc69af; path=/
< Content-Length: 1211
< Content-Type: text/html; charset=utf-8
<
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
        <title></title>
    <link rel="shortcut icon" href="/favicon.ico" />
    <link rel="stylesheet" type="text/css" media="screen" href="/css/main.css" />
      </head>
  <body>


<form action="/login" method="post">
  <table>
    <tr> 
  <th><label for="signin_username">Username</label></th>
  <td><input type="text" name="signin[username]" id="signin_username" /></td>
</tr>
<tr>
  <th><label for="signin_password">Password</label></th>
  <td><input type="password" name="signin[password]" id="signin_password" /></td>
</tr>
<tr>
  <th><label for="signin_remember">Remember</label></th>
  <td><input type="checkbox" name="signin[remember]" id="signin_remember" /><input type="hidden" name="signin[_csrf_token]" value="6bdf80ca900038ada394467752593135" id="signin__csrf_token" /></td>
</tr>
  </table>

  <input type="submit" value="sign in" />
  <a href="/request_password">Forgot your password?</a>
</form>
  </body>
</html>

When I try to use mechanize to load the page with the following script:

import mechanize
import mimetypes
import logging
import urllib2
from urlparse import urlparse
import cookielib
from base64 import b64encode
class Browser:
    def __init__(self, url):
        br = mechanize.Browser()
        br.set_handle_robots(False)   # no robots
        br.set_handle_refresh(False)
        br.set_handle_redirect(True)
        br.set_debug_http(True)
        cj = cookielib.LWPCookieJar()
        br.set_cookiejar(cj)  # can sometimes hang without this
        br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]
        self.page = br.open(url).read()
        print self.page
if __name__ == '__main__':
    browser = Browser("http://portal.globaltransit.net/")

I get the following error: mechanize._response.httperror_seek_wrapper: HTTP Error 401: Unauthorized. Is there any way to get mechanize to ignore the 401 returned by the server so I can process the form?

Solution

I think you might be able to do something like this:

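import mechanize

try:
    response = mechanize.urlopen("http://portal.globaltransit.net/")
except mechanize.HTTPError as error:
    # The HTTPError raised for the 401 is itself a readable response object.
    response = error

body = response.read()
# Do stuff with the form in the response body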
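
The same pattern can be sketched in Python 3 with only the standard library. This is an illustration, not the mechanize code itself: urllib stands in for mechanize so the sketch has no third-party dependency, and the function name fetch_allowing_http_errors and its opener parameter are hypothetical names introduced here. It relies on urllib.error.HTTPError carrying both the status code and a readable body.

```python
import urllib.error
import urllib.request


def fetch_allowing_http_errors(url, opener=urllib.request.urlopen):
    """Return (status_code, body_bytes), even for 4xx/5xx answers.

    `opener` is a hypothetical hook (any callable taking a URL) so the
    function can be exercised without touching the network.
    """
    try:
        response = opener(url)
        status = response.status
    except urllib.error.HTTPError as err:
        # HTTPError doubles as a response: it exposes the status code
        # and the body the server sent along with the error.
        response = err
        status = err.code
    return status, response.read()
```

Called as fetch_allowing_http_errors("http://portal.globaltransit.net/"), this would be expected to hand back the 401 status together with the HTML of the login form instead of raising.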

Other tips

The error is raised in mechanize.Browser._mech_open, and looking at it there seems to be no way to disable the error. However, one can monkey patch the function: copy the function from the mechanize source into your own code and replace if not success: with if not success and response.getcode() != 401: so that a 401 response is still returned. Then apply the patch with mechanize.Browser._mech_open = _mech_open. Do this right after the modules are imported.
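
A variation on this monkey patch avoids copying mechanize's source: wrap the original method and return the error object when its status is one you want to tolerate. The sketch below is Python 3 and generic; tolerate_status, allowed_codes, and error_type are hypothetical names introduced here, and the mechanize usage shown in the trailing comment is an untested assumption.

```python
import functools
import urllib.error


def tolerate_status(open_method, allowed_codes=(401,),
                    error_type=urllib.error.HTTPError):
    """Wrap open_method so errors with an allowed status are returned."""

    @functools.wraps(open_method)
    def wrapper(*args, **kwargs):
        try:
            return open_method(*args, **kwargs)
        except error_type as err:
            if getattr(err, "code", None) in allowed_codes:
                # The error object doubles as the response, so hand it back.
                return err
            raise  # any other status still propagates as an exception

    return wrapper


# Hypothetical use with mechanize (untested assumption), right after imports:
#   import mechanize
#   mechanize.Browser._mech_open = tolerate_status(
#       mechanize.Browser._mech_open, error_type=mechanize.HTTPError)
```

Wrapping rather than copying means the patch does not need to be kept in sync with the body of _mech_open, though it still depends on a private method name.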

There is a way around this: actually get a non-401 response from the server.

First, try this:

for form in br.forms():
   print "Form name:", form.name
   print form

The output looks like:

Form name: None
<POST http://portal.globaltransit.net/login application/x-www-form-urlencoded
<TextControl(signin[username]=)>
<PasswordControl(signin[password]=)>
<CheckboxControl(signin[remember]=[on])>
<HiddenControl(signin[_csrf_token]=ec9a290dcc8d71e458d31a0fd509376b) (readonly)>
<SubmitControl(<None>=sign in) (readonly)>>

Here you can see that the URL shown for the form is a bit different from the URL used in your code.

Now do:

response = br.open("http://portal.globaltransit.net/login application/x-www-form-urlencoded")

The response is:

send: 'GET /login application/x-www-form-urlencoded HTTP/1.1\r\nAccept-Encoding:  
identity\r\nHost: portal.globaltransit.net\r\nCookie:  
symfony=f6fa25cf26e310e7e8bb3170637fdd73\r\nConnection: close\r\nUser-Agent:  
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 
Firefox/3.0.1\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: Date: Tue, 26 Nov 2013 17:45:01 GMT
header: Server: Apache
header: X-Powered-By: PHP/5.2.11
header: Content-Length: 1211
header: Connection: close
header: Content-Type: text/html; charset=utf-8

The key here is to use the URL shown in the output of br.forms(). Then you can go ahead and use the forms as usual.
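
Using the form "as usual" comes down to posting its fields, including the hidden CSRF token, to the form's action. In the printout above, the first line lists the form's method, action URL, and encoding type, so the action itself is http://portal.globaltransit.net/login, matching action="/login" in the HTML from the question. The following Python 3 sketch uses only the standard library rather than mechanize; LoginFormParser and build_login_request are hypothetical names introduced here, and it assumes the field names shown in the question's HTML.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin


class LoginFormParser(HTMLParser):
    """Collect the first form's action and its submittable input fields."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = {}
        self._in_form = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self._done:
            self._in_form = True
            self.action = attrs.get("action") or ""
        elif tag == "input" and self._in_form and "name" in attrs:
            # An unchecked checkbox is not submitted by a browser.
            if attrs.get("type") == "checkbox" and "checked" not in attrs:
                return
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            self._in_form = False
            self._done = True  # ignore any later forms on the page


def build_login_request(page_url, html, username, password):
    """Return (post_url, encoded_body) for the login form found in html."""
    parser = LoginFormParser()
    parser.feed(html)
    if parser.action is None:
        raise ValueError("no form found in the page")
    fields = dict(parser.fields)  # keeps hidden fields such as the CSRF token
    fields["signin[username]"] = username
    fields["signin[password]"] = password
    return urljoin(page_url, parser.action), urlencode(fields).encode("ascii")
```

The CSRF token is probably tied to the symfony session cookie set on the first response, so the POST would likely need to send that same cookie back; a cookie-aware client such as the mechanize browser in the question does this automatically.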

A good guide to doing this can be found here: http://www.pythonforbeginners.com/cheatsheet/python-mechanize-cheat-sheet/

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow