Question

I'm new to HTTP and need some help. I'm trying to fill out a search form on Craigslist programmatically so that I get the link to the same page I would get by filling out the form manually. By viewing the page source, I found this form:

<form id="search" action="/search/" method="GET">
            <div>search craigslist</div>
            <input type="hidden" name="areaID" value="372">
            <input type="hidden" name="subAreaID" value="">
            <input id="query" name="query" autocorrect="off" autocapitalize="off"><br>
            <select id="catAbb" name="catAbb">
                <option value="ccc">community</option>
                <option value="eee">events</option>
                <option value="ggg">gigs</option>
                <option value="hhh">housing</option>
                <option value="jjj">jobs</option>
                <option value="ppp">personals</option>
                <option value="res">resumes</option>
                <option value="sss" selected="selected">for sale</option>
                <option value="bbb">services</option>
            </select>
            <input id="go" type="submit" value="&gt;">
</form>

So I wrote this code to fill out the form:

import urllib,httplib
conn = httplib.HTTPConnection("auburn.craigslist.org")
params = urllib.urlencode({'query': 'english tutor', 'catAbb': 'bbb'})
conn.request("GET","/search",params)
response = conn.getresponse()
print response.read()

I'm not sure about several things. For example, how do I specify which form I want to fill? I assumed it's done by passing "/search", the form's action, but should that really go in the url argument of conn.request()? Anyway, instead of getting the URL of my desired results page, I get this HTML page:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
    <title>auburn craigslist search</title>
    <blockquote>
        <b>You did not select a category to search.</b>
    </blockquote>

But I'm pretty sure I did select a category. What should I do? Thanks!


Solution

GET parameters are sent in the URL's query string, not as an encoded request body the way POST data is. Change your Python to look like this and you should get what you are after:

import urllib,httplib

conn = httplib.HTTPConnection("auburn.craigslist.org")
params = urllib.urlencode({'query': 'english tutor', 'catAbb': 'bbb'})
conn.request("GET","/search?%s" % params)  # GET params go in the URL's query string, not the body
response = conn.getresponse()

print response.read()
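For readers on Python 3, where httplib became http.client and urlencode moved to urllib.parse, a minimal sketch of the same fix might look like this. The helper names build_search_path and search are hypothetical, and the sketch uses the form's action "/search/" (with the trailing slash) rather than "/search"; the answer above does not say whether the server treats the two the same.

```python
# Python 3 sketch: http.client replaces httplib, urllib.parse.urlencode replaces urllib.urlencode.
import http.client
from urllib.parse import urlencode


def build_search_path(action, params):
    """Return the request target for a GET form submission.

    For GET, the encoded fields go in the URL's query string after a "?".
    (For POST, the same encoded string would instead be sent as the request
    body with Content-Type: application/x-www-form-urlencoded.)
    """
    return f"{action}?{urlencode(params)}"


def search(host, action, params):
    """Send the GET request and return the response body as bytes."""
    conn = http.client.HTTPConnection(host)
    try:
        # No body argument: everything the server needs is in the path.
        conn.request("GET", build_search_path(action, params))
        return conn.getresponse().read()
    finally:
        conn.close()
```

Calling search("auburn.craigslist.org", "/search/", {"query": "english tutor", "catAbb": "bbb"}) would send the request. The version in the question failed because passing params as the third argument of request() puts them in the request body, which servers typically ignore for GET.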

It will also make your life a lot easier if you pass the response HTML to Beautiful Soup for parsing and extracting information.
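Beautiful Soup is a third-party package; as a dependency-free illustration of the same idea, here is a rough sketch using the standard library's html.parser.HTMLParser (Python 3) that pulls every link out of a results page. This swaps in HTMLParser for Beautiful Soup; the names LinkCollector and extract_links are hypothetical, and the sample HTML in the test is made up, not real Craigslist markup.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs from every <a href=...> tag."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the <a> we are currently inside, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # <a> tags without an href (e.g. named anchors) leave _href as None.
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Return a list of (href, text) pairs found in an HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links
```

With Beautiful Soup installed, soup.find_all("a", href=True) gives you the same tags more directly and copes better with malformed HTML.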

Other suggestions

Why don't you use requests (http://docs.python-requests.org/en/latest/)?

import requests

# requests builds the query string from params and appends it to the URL
response = requests.get("http://auburn.craigslist.org/search/", params={"query": "english tutor", "catAbb": "bbb"})
print(response.text)

In general, I recommend using a browser plugin such as HttpFox to see exactly what happens when you use the browser normally, and then reproducing that programmatically. With HttpFox, you can see the exact structure of the HTTP GET request your browser sends.
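Once HttpFox (or a similar tool) shows you the URL the browser actually requested, you can split it back into the pieces httplib needs. A minimal Python 3 sketch; the function name split_captured_url is hypothetical, and the URL in the test is an assumed example built from the form's fields, not one captured from the live site.

```python
from urllib.parse import parse_qsl, urlsplit


def split_captured_url(url):
    """Split a GET URL copied from the browser into (host, path, params)."""
    parts = urlsplit(url)
    # keep_blank_values=True keeps empty fields such as "subAreaID=",
    # which parse_qsl would otherwise silently drop.
    params = parse_qsl(parts.query, keep_blank_values=True)
    return parts.netloc, parts.path, params
```

The returned params list keeps the browser's field order and can be passed straight back to urlencode to rebuild the same query string.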

It looks like you need to provide all of these query parameters: areaID, subAreaID, query, and catAbb (you left out two of them).

The web application's error message may also simply be imprecise or buggy.
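To send everything the browser would, including the hidden areaID and subAreaID fields, you can start from the form's defaults and override only the fields you fill in. A Python 3 sketch; the defaults are read off the form HTML in the question and may differ on the live site, the names FORM_DEFAULTS and form_query_string are hypothetical, and whether the hidden fields are actually required is this answer's guess rather than something the sketch confirms.

```python
from urllib.parse import urlencode

# Defaults taken from the form in the question: the two hidden inputs, the
# empty query box, and the option marked selected="selected" in <select name="catAbb">.
# The submit button has no name attribute, so a browser would not send it.
FORM_DEFAULTS = {"areaID": "372", "subAreaID": "", "query": "", "catAbb": "sss"}


def form_query_string(**overrides):
    """Encode every field the form would submit, in document order.

    Fields you don't override keep their default values; unknown field
    names raise ValueError so a typo doesn't silently disappear.
    """
    unknown = set(overrides) - set(FORM_DEFAULTS)
    if unknown:
        raise ValueError("not a field of this form: " + ", ".join(sorted(unknown)))
    fields = dict(FORM_DEFAULTS)
    fields.update(overrides)  # dict order stays as in FORM_DEFAULTS
    return urlencode(fields)
```

The result goes after "/search/?" in the request path, exactly as in the accepted solution.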

Please try one of the following:

    conn.request("GET", "http://auburn.craigslist.org/search/", params)
    conn.request("GET", "/search/", params)

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow