Is this webpage-logging-in Python script correct?

https://stackoverflow.com/questions/3642569

30-09-2019

Question

Is this Python script correct?

import urllib, urllib2, cookielib 

username = 'myuser' 
password = 'mypassword' 

cj = cookielib.CookieJar() 
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj)) 
login_data = urllib.urlencode({'username' : username, 'j_password' : password}) 
opener.open('http://www.example.com/login.php', login_data) 
resp = opener.open('http://www.example.com/hiddenpage.php') 
resp.read()

I found this script here. It is meant to log in to a webpage first, retrieve the cookies, store them, and use them to open another page on the same website. I want to log in this way to my eBay account (the URL is https://signin.ebay.com/ws/eBayISAPI.dll?SignIn) and then go to the inbox of my eBay account (the URL is http://my.ebay.com/ws/eBayISAPI.dll?MyEbay&gbh=1).

So, here are the values that I need to use in this script:

First (sign-in) URL: https://signin.ebay.com/ws/eBayISAPI.dll?SignIn

Second URL: http://my.ebay.com/ws/eBayISAPI.dll?MyEbay&gbh=1

My login name on eBay: tryinghard

My password on eBay: gettingsomewhere

With these new values, the script above should look like this:

import urllib, urllib2, cookielib 

username = 'tryinghard' 
password = 'gettingsomewhere' 

cj = cookielib.CookieJar() 
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj)) 
login_data = urllib.urlencode({'username' : username, 'j_password' : password}) 
opener.open('https://signin.ebay.com/ws/eBayISAPI.dll?SignIn', login_data) 
resp = opener.open('http://my.ebay.com/ws/eBayISAPI.dll?MyEbay&gbh=1') 
resp.read()

Is it correct? I am especially suspicious of the login_data = line (the fourth from the bottom): why does it use j_password instead of just password?

I tried this script with all these values and it didn't work. Does anybody know why it doesn't work in my case?

I've already learned how to log in to my eBay account and then check other pages there by running a Python script that uses twill as an external module, but that only worked when I ran the script from the command prompt or from the Python shell. It didn't work when I ran the same script with the "Google App Engine Software Development Kit" that I had downloaded from "Google App Engine".

Later I was told here that it wasn't successful because "Google App Engine" doesn't like external modules. That's why I found this script - those modules that it is importing in the very beginning (urllib, urllib2, cookielib) are all built-in modules.

Solution

A simple "view source" on the login page whose URL you give reveals very easily the following detail about it... (just formatting the HTML minimally for readability):

<span style="display:-moz-inline-stack" class="unl">
  <label for="userid">User ID  </label></span>
<span><input size="27" maxlength="64" class="txtBxF"
       value="" name="userid" id="userid"></span></div>
<div><span style="display:-moz-inline-stack" class="unl">
  <label for="pass">Password  </label></span>
<span><input size="27" maxlength="64" class="txtBxF"
       value="" name="pass" id="pass" type="password"></span>

As you can see at a glance, the names of the crucial input fields are not username and j_password as you're using, but rather userid and pass. It's therefore obviously impossible for your code to work as it currently stands.
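As a rough illustration of "reading the page's source" programmatically, the sketch below lists the named input fields in an HTML fragment. It assumes Python 3's standard library (in Python 2 the same class lives in the HTMLParser module), and the names InputNameCollector and list_input_fields are made up for this example. The fragment is an abridged copy of the markup quoted above, not a live download.

```python
from html.parser import HTMLParser

# Abridged copy of the sign-in markup quoted above (not fetched from the site).
LOGIN_HTML = """
<span><input size="27" maxlength="64" class="txtBxF"
       value="" name="userid" id="userid"></span></div>
<span><input size="27" maxlength="64" class="txtBxF"
       value="" name="pass" id="pass" type="password"></span>
"""


class InputNameCollector(HTMLParser):
    """Collect (name, type) for every <input> element that has a name."""

    def __init__(self):
        super().__init__()
        self.fields = []

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        if "name" in attributes:
            # An <input> without an explicit type behaves as a text field.
            self.fields.append((attributes["name"], attributes.get("type", "text")))


def list_input_fields(html):
    """Return the (name, type) pairs of all named inputs found in html."""
    collector = InputNameCollector()
    collector.feed(html)
    return collector.fields


print(list_input_fields(LOGIN_HTML))
```

The names it reports (userid and pass here) are the keys the POST data has to use; a real login form may also contain hidden inputs that need to be sent along.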

Read a bit more of the page and you'll also see soon after:

<input type="checkbox" name="keepMeSignInOption" value="1" id="signed_in"></b>
<span class="pcsm"><label for="signed_in"><b>Keep me signed in for today.</b>

Most likely you'll have to simulate that checkbox being selected to get cookies that are usable (at least for anything but a fleeting time;-).
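A minimal sketch of what the corrected POST data might look like, using the field names found in the page source. It assumes Python 3 (the Python 2 equivalents are urllib.urlencode, urllib2.build_opener and cookielib.CookieJar), and build_login_data is a hypothetical helper. Whether these three fields are enough for the real site is not something the source confirms, so the request itself is left commented out.

```python
from http.cookiejar import CookieJar
from urllib.parse import urlencode
from urllib.request import HTTPCookieProcessor, build_opener


def build_login_data(user_id, password, keep_signed_in=True):
    """Encode the sign-in form fields as a POST body (bytes)."""
    fields = {"userid": user_id, "pass": password}
    if keep_signed_in:
        # A browser only submits a checkbox's name/value pair when it is ticked.
        fields["keepMeSignInOption"] = "1"
    # urlencode percent-escapes non-ASCII characters, so ASCII encoding is safe.
    return urlencode(fields).encode("ascii")


cookie_jar = CookieJar()
opener = build_opener(HTTPCookieProcessor(cookie_jar))
login_data = build_login_data("tryinghard", "gettingsomewhere")

# Uncomment to actually send the request (assumption: no other fields are required).
# opener.open("https://signin.ebay.com/ws/eBayISAPI.dll?SignIn", login_data)
# resp = opener.open("http://my.ebay.com/ws/eBayISAPI.dll?MyEbay&gbh=1")
```

Leaving keep_signed_in as False simply omits the checkbox field, which is what a browser does when the box is not ticked.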

And so on, and so forth, really -- the attempt to automate interaction with a page without bothering to read that page's source to get the actual IDs and names to use strikes me as definitely displaying a very optimistic attitude towards life, the universe, and everything...;-). Incidentally, to simplify such interaction (after perusing the source;-), I've found mechanize quite handy (and more robust than trying to hack it just with the standard library, as you are doing).

Also, before automating interaction with a site, always check its robots.txt and its terms of use to make sure you're not violating them -- sites can easily distinguish "robots" (automated interaction) from "humans", and retaliate against violations by banning, blacklisting, and worse; you don't really want to run into that;-).
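The robots.txt check can be scripted too. The sketch below uses Python 3's urllib.robotparser (the robotparser module in Python 2) on a made-up robots.txt; the rules, the "my-script" user agent and the example.com URLs are placeholders, not eBay's actual policy. For a real site you would call set_url() and read() on the parser to download the file instead.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed(SAMPLE_ROBOTS_TXT, "my-script", "http://www.example.com/private/inbox"))
print(is_allowed(SAMPLE_ROBOTS_TXT, "my-script", "http://www.example.com/index.html"))
```

With the sample rules, the first URL is disallowed and the second is allowed. Passing this check says nothing about the site's terms of use, which still have to be read separately.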

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow