Question

I have an application that looks for certain links on a page (using CSS) and retrieves the pages they reference by calling agent#get on the href value. This worked for months until today, when the website started returning a different page instead (one further up the site hierarchy, if that makes any difference). Other websites still work, so presumably this site changed something and is deciding to return that page rather than the one requested. page.uri reflects the URI actually returned, which differs from the URI requested, but the response code is 200, so presumably no redirection took place.

While trying to figure out what's going on, I located the link and called page.links[38].click. That returns the correct page. Finding the correct link programmatically is somewhat problematic (since you can't use CSS to find a link, only an element), so I'd like to keep using my current method. I'm trying to understand what is different about retrieving a page with agent#get vs. link#click. Before you ask, I've verified that the URI passed to agent#get IS the same as that of the link I #click. What does #click do differently from #get that could cause one to retrieve the correct page while the other retrieves a different one?

Solution 2

The problem turned out not to be a difference between Link#click and Agent#get; rather, the server had changed its response in certain situations. In other words, my assumptions were wrong.

OTHER TIPS

You can see for yourself what click does in the Mechanize source. It calls get, but first it sets the referer and does some robots checking.
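If the missing referer is what the server is reacting to, one way to keep the CSS-based approach is to pass the page that contained the link as the third argument to get. The following is only a sketch under that assumption: the helper name get_with_referer, the "a.detail" selector, and the command-line URL are hypothetical and not taken from the question. It does not reproduce the robots checking that click performs.

```ruby
# Fetch `href` roughly the way Link#click would: same URL, but with the
# page that contained the link passed along as the referer.
# `agent` is expected to be a Mechanize instance, `page` the Mechanize::Page
# the href was found on.
def get_with_referer(agent, page, href)
  # Mechanize#get(uri, parameters = [], referer = nil, headers = {})
  agent.get(href, [], page)
end

# Live usage: runs only when this file is executed directly with a URL argument.
if $PROGRAM_NAME == __FILE__ && ARGV[0]
  require "mechanize"

  agent = Mechanize.new
  page  = agent.get(ARGV[0])            # listing page URL from the command line

  # "a.detail" is a hypothetical selector; substitute the one you already use.
  page.search("a.detail").each do |node|
    href = node["href"]
    next if href.nil?                   # skip anchors without an href
    target = get_with_referer(agent, page, href)
    puts target.uri
  end
end
```

If the page returned this way matches what click returns, the referer was the difference; if not, the cause lies elsewhere (cookies, other headers, or server-side state).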

Licensed under: CC-BY-SA with attribution