Question

I have read the "HOWTO Fetch Internet Resources Using urllib2" documentation, but I can't understand how to use the data parameter. The example:

import urllib
import urllib2

url = 'http://www.someserver.com/cgi-bin/register.cgi'
values = {'name' : 'Michael Foord',
          'location' : 'Northampton',
          'language' : 'Python' }

data = urllib.urlencode(values)  # encode the dict as name=value pairs joined by '&'
req = urllib2.Request(url, data)  # passing data makes this a POST request
response = urllib2.urlopen(req)
the_page = response.read()

does not work; it fails with socket.error: [Errno 104] Connection reset by peer
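The URL in the HOWTO (www.someserver.com) is only a placeholder, so a connection error against it is most likely expected rather than a bug in the code. What the data parameter actually does is easier to see without sending anything. The sketch below is a Python 3 translation (there, urllib2 and urllib were merged into urllib.request and urllib.parse); it builds the same request and inspects it instead of opening it.

```python
import urllib.parse
import urllib.request

url = 'http://www.someserver.com/cgi-bin/register.cgi'  # placeholder URL from the HOWTO
values = {'name': 'Michael Foord',
          'location': 'Northampton',
          'language': 'Python'}

# urlencode joins the pairs with '&' and escapes spaces as '+';
# Python 3 requires the request body to be bytes, hence .encode()
data = urllib.parse.urlencode(values).encode('ascii')
req = urllib.request.Request(url, data)

print(data)              # the body the server would receive
print(req.get_method())  # supplying data turns the request into a POST
# response = urllib.request.urlopen(req)  # only meaningful against a real server
```

The keys in values only mean something if the server-side script expects exactly those names; the server decides, not the client.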

From it, I understand that I can name some options/parameters and give each one a value. My question is: how do I know which parameters the website accepts? How can I find them out?
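The parameter names are written in the page itself: they are the name attributes of the input, select and textarea elements inside its form, the form's action attribute is the URL the values are sent to, and method/enctype say how. A minimal Python 3 sketch using the standard html.parser module to list them; the sample HTML, the field names upload and output, and the action URL are hypothetical, and in practice you would feed it the html string read from the real page.

```python
from html.parser import HTMLParser


class FormFieldLister(HTMLParser):
    """Collect the action, method, enctype and named fields of every <form>."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.forms = []  # one dict per <form> encountered

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.forms.append({
                "action": attrs.get("action"),
                "method": (attrs.get("method") or "get").lower(),
                "enctype": attrs.get("enctype") or "application/x-www-form-urlencoded",
                "fields": {},
            })
        elif tag in ("input", "select", "textarea") and self.forms:
            name = attrs.get("name")
            if name is None:
                return  # controls without a name are never submitted
            field_type = attrs.get("type", "text") if tag == "input" else tag
            entry = self.forms[-1]["fields"].setdefault(
                name, {"type": field_type, "values": []})
            # radio buttons share one name; each carries the value that gets sent
            if attrs.get("value") is not None:
                entry["values"].append(attrs["value"])


# Hypothetical page content standing in for response.read()
sample_html = """
<form action="/cgi-bin/analyse.cgi" method="POST" enctype="multipart/form-data">
  <input type="file" name="upload">
  <input type="radio" name="output" value="text" checked>
  <input type="radio" name="output" value="html">
  <input type="submit" value="Run">
</form>
"""

parser = FormFieldLister()
parser.feed(sample_html)
for form in parser.forms:
    print(form["method"], form["action"], form["enctype"])
    for name, info in form["fields"].items():
        print(" ", name, info["type"], info["values"])
```

For a radio group, only the value of the chosen button is submitted under the shared name.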

I have "played" with:

response = urllib2.urlopen(url)
html = response.read()
print html

to read the website, but after trying some values that I thought would work, I didn't manage to fetch my data. The website has a button to choose a file and some radio buttons to select the output. How can I do this?
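The file chooser changes things: a form with a file input is normally submitted as multipart/form-data (its form tag then carries enctype="multipart/form-data"), and urlencode cannot carry a file at all. A minimal hand-rolled Python 3 sketch of such a body follows; the field names output and upload, the file contents and the URL are hypothetical, and the real names must be taken from the page's HTML.

```python
import urllib.request
import uuid


def encode_multipart(fields, files):
    """Build a multipart/form-data body (sketch: names are not escaped).

    fields: dict of name -> str value, e.g. the chosen radio button's value
    files:  dict of name -> (filename, bytes content)
    Returns (body_bytes, content_type_header).
    """
    boundary = uuid.uuid4().hex  # random, so it will not occur in the content
    marker = b"--" + boundary.encode("ascii")
    lines = []
    for name, value in fields.items():
        lines.append(marker)
        lines.append(('Content-Disposition: form-data; name="%s"' % name).encode("utf-8"))
        lines.append(b"")
        lines.append(value.encode("utf-8"))
    for name, (filename, content) in files.items():
        lines.append(marker)
        lines.append(('Content-Disposition: form-data; name="%s"; filename="%s"'
                      % (name, filename)).encode("utf-8"))
        lines.append(b"Content-Type: application/octet-stream")
        lines.append(b"")
        lines.append(content)
    lines.append(marker + b"--")  # closing boundary
    lines.append(b"")
    return b"\r\n".join(lines), "multipart/form-data; boundary=" + boundary


body, content_type = encode_multipart(
    {"output": "text"},                    # hypothetical radio-button name/value
    {"upload": ("input.txt", b"ACGT\n")},  # hypothetical file field and contents
)
req = urllib.request.Request(
    "http://www.example.com/cgi-bin/analyse.cgi",  # placeholder URL
    data=body,
    headers={"Content-Type": content_type},
)
# response = urllib.request.urlopen(req)  # uncomment to actually send it
```

Building this by hand is exactly the kind of tedium that form-handling libraries take care of for you.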

The webpage I am trying to fetch is this one.


Solution

Using urllib2 to drive forms and the like will lead to frustration.

The mechanize library (https://pypi.python.org/pypi/mechanize) is a good place to start.

This blog post lays out a lot of useful information: http://www.sciprogblog.com/2012/01/scraping-data-with-python.html. It won't answer your questions directly, but it should lead you down the right path.

Good luck.

Licensed under: CC-BY-SA with attribution