Question

I am learning web programming with Python, and one of the exercises I am working on is the following: I am writing a Python program to query the website "orbitz.com" and return the lowest airfare. The departure and arrival cities and dates are used to construct the URL.

I am doing this using the urlopen function, as follows:

(search_str contains the URL)

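from lxml.html import parse
from urllib2 import urlopen

# download the results page and parse it into an element tree
parsed = parse(urlopen(search_str))
doc = parsed.getroot()

# collect every anchor element on the page
links = doc.findall('.//a')

# read the visible text of each link
for link in links:
    the_link = link.text_content().strip()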

The idea is to retrieve all the links from the query results and search for strings such as "Delta", "United" etc, and read off the dollar amount next to the links.
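To make the matching step concrete, here is a minimal sketch of it. It assumes each string already holds an airline name together with its price (for example the text of a link plus the text next to it); the names AIRLINES, PRICE_RE and lowest_fare are made up for illustration and are not part of the original script.

```python
import re

# Hypothetical list of airline names to look for; extend as needed.
AIRLINES = ("Delta", "United")

# Matches amounts such as "$289", "$1,204" or "$1,204.50".
PRICE_RE = re.compile(r"\$(\d[\d,]*(?:\.\d+)?)")


def lowest_fare(texts, airlines=AIRLINES):
    """Return (price, text) for the cheapest matching entry, or None."""
    best = None
    for text in texts:
        # Skip entries that mention none of the airlines of interest.
        if not any(name in text for name in airlines):
            continue
        match = PRICE_RE.search(text)
        if match is None:
            continue
        # Drop thousands separators before converting to a number.
        price = float(match.group(1).replace(",", ""))
        if best is None or price < best[0]:
            best = (price, text)
    return best
```

Calling lowest_fare on the stripped link texts would give the cheapest entry that mentions one of the listed airlines, or None when nothing matches.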

It worked until today, but it looks like orbitz.com has changed its output page. Now, when you enter the travel details on the orbitz.com website, a page appears showing a wheel and a message saying "looking up itineraries" or something to that effect. This is just a filler page and contains no real information. After a few seconds, the real results page is displayed. Unfortunately, the Python code returns the links from the filler page each time, and I never obtain the real results.

How can I get around this? I am a relative beginner to web programming, so any help is greatly appreciated.

No correct solution

OTHER TIPS

This kind of thing is normal in the world of crawlers.

What you need to do is figure out which URL the site redirects to after the "itinerary page", and then request that URL directly from your script.

Then check whether they have also changed the final search results page; if so, modify your script to accommodate those changes.
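A rough sketch of that approach, assuming you have already found the real results URL (for example by watching the requests in your browser's developer tools) and some text that only appears on the filler page. It uses Python 3's urllib.request in place of urllib2, and fetch_page, fetch_results and filler_marker are hypothetical names chosen for this example. If the site fills in the results with JavaScript instead of redirecting, a plain HTTP request like this will not be enough.

```python
import time
from urllib.request import urlopen


def fetch_page(url):
    """Download a page and return its body as text."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def fetch_results(url, filler_marker, fetcher=fetch_page, attempts=5, delay=2.0):
    """Fetch url, retrying while the body still contains filler_marker.

    Returns the page text, or None if every attempt returned the filler page.
    """
    for attempt in range(attempts):
        html = fetcher(url)
        if filler_marker not in html:
            return html
        # Wait before the next attempt, but not after the last one.
        if attempt < attempts - 1:
            time.sleep(delay)
    return None
```

The returned text can then be parsed with lxml.html.fromstring and searched for links as before. The fetcher argument is only there so the retry logic can be tried out without touching the network.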

Licensed under: CC-BY-SA with attribution