Question

I found out how to retrieve the HTML page for a topic from a Google search by following a tutorial. The tutorial gave this code:

import mechanize
br = mechanize.Browser()
br.open('http://www.google.co.in')
br.select_form(nr = 0)

I understand up to this point: it selects the first form on the page. Then the tutorial continued with:

br.form['q'] = 'search topic'
br.submit()
br.response().read()

This does output the HTML of the results page for the search topic. But what should the parameter in br.form[parameter] be? I also tried it for Google News and it worked there too. Can someone help me out?


Solution

It's the name of the form control, i.e. the value of its name attribute in the page source (not its id).

You can list the available control names like so:

import mechanize

br = mechanize.Browser()
br.open("http://www.google.com/")

# print every form on the page together with its controls
for f in br.forms():
    print(f)

which gives me:

<f GET http://www.google.ca/search application/x-www-form-urlencoded
  <HiddenControl(ie=ISO-8859-1) (readonly)>
  <HiddenControl(hl=en) (readonly)>
  <HiddenControl(source=hp) (readonly)>
  <TextControl(q=)>
  <SubmitControl(btnG=Google Search) (readonly)>
  <SubmitControl(btnI=I'm Feeling Lucky) (readonly)>
  <HiddenControl(gbv=1) (readonly)>>

which says that:

  1. There is only one form on the page

  2. The hidden fields are ie (page encoding), hl (language code), source (value hp; purpose unknown), and gbv (also unknown).

  3. The only visible text field is q, a text input that holds the search text.
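Because the form uses GET, submitting it sends each control as a name=value pair in the query string, which is why the key in br.form[...] must be the control's name. The sketch below approximates that with the standard library's urllib.parse.urlencode (not mechanize itself). The field list is a hypothetical reconstruction of the printout above; submit buttons are left out, and mechanize's real request may differ in ordering or in which button value it includes.

```python
from urllib.parse import urlencode

# Hypothetical reconstruction of the form's controls from the printout:
# (name, value) pairs in the order mechanize listed them, submit buttons omitted.
FIELDS = [
    ("ie", "ISO-8859-1"),
    ("hl", "en"),
    ("source", "hp"),
    ("q", ""),
    ("gbv", "1"),
]


def fill(fields, name, value):
    """Return a copy of fields with the control called `name` set to value.

    Roughly mirrors br.form[name] = value: the key must match an existing
    control name. This sketch raises KeyError for unknown names.
    """
    if name not in {n for n, _ in fields}:
        raise KeyError(name)
    return [(n, value if n == name else v) for n, v in fields]


query = urlencode(fill(FIELDS, "q", "search topic"))
print("http://www.google.ca/search?" + query)
# prints: http://www.google.ca/search?ie=ISO-8859-1&hl=en&source=hp&q=search+topic&gbv=1
```

Filling a control by its id (for example "lst-ib") instead of its name would fail in this sketch, just as it does with mechanize.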

OTHER TIPS

The parameter should be the name of the form element you are filling with the string. The easiest way to find that name is to inspect the web page with a tool such as Firebug (that is for Firefox; use whatever developer tools your browser provides). You can also read the page source directly, but that is tedious when the page is complex.

For example, the name of the form element for the box I am typing this answer in is "post-text".

Look at the source of http://www.google.co.in; it contains this code:

<input class="lst lst-tbb" value="" title="Google 搜索" size="41" type="text" 
                       autocomplete="off" id="lst-ib" name="q" maxlength="2048"/> 

The name="q" attribute gives the parameter to use in br.form[parameter].
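If you prefer not to read the source by eye, a minimal sketch using the standard library's html.parser (not mechanize) can list each input's name next to its id, making it clear which one br.form[...] needs. The InputNameCollector class and input_fields function are hypothetical helpers, and the snippet is a trimmed version of the tag quoted above.

```python
from html.parser import HTMLParser


class InputNameCollector(HTMLParser):
    """Collect (name, id) pairs for every <input> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.fields = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (attribute, value) tuples; self-closing tags
        # such as <input ... /> also reach this method via handle_startendtag
        if tag == "input":
            attributes = dict(attrs)
            self.fields.append((attributes.get("name"), attributes.get("id")))


def input_fields(html):
    """Return (name, id) pairs; the name is the key br.form[...] expects."""
    parser = InputNameCollector()
    parser.feed(html)
    parser.close()
    return parser.fields


snippet = '<input type="text" autocomplete="off" id="lst-ib" name="q" maxlength="2048"/>'

for name, field_id in input_fields(snippet):
    print("name:", name, "id:", field_id)
# prints: name: q id: lst-ib
```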

Licensed under: CC-BY-SA with attribution