Question

I want to analyse some data from a webpage, but here's the problem: the site has further pages that are loaded via a __doPostBack function.

How can I "simulate" to go a page further and analyse this site, and so on..

At the moment I analyse the data with JSoup in Java, but I'm open to using another language if necessary.

Solution 2

I ended up using the browser automation library Selenium, which is available for many languages (C#, Java, Perl, ...).

For more information on how to get started, this link is very helpful: this.
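As a minimal sketch of what this looks like in Java: drive a real browser with Selenium, let it execute the __doPostBack JavaScript by clicking the pager link, and hand the rendered HTML back to JSoup for the actual analysis. The URL, the CSS selector, and the pager link text below are hypothetical placeholders for whatever the real site uses.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.firefox.FirefoxDriver;

import java.util.List;

public class PostBackScraper {

    public static void main(String[] args) throws InterruptedException {
        WebDriver driver = new FirefoxDriver();
        try {
            driver.get("https://example.com/list.aspx"); // hypothetical URL

            for (int page = 1; page <= 5; page++) {
                // Hand the fully rendered HTML to JSoup for the actual analysis.
                Document doc = Jsoup.parse(driver.getPageSource());
                doc.select("table#results tr")          // hypothetical selector
                   .forEach(row -> System.out.println(row.text()));

                // The pager links call __doPostBack(...) from their onclick/href;
                // clicking them through Selenium lets the browser run that JavaScript.
                List<WebElement> next = driver.findElements(
                        By.linkText(String.valueOf(page + 1))); // hypothetical pager markup
                if (next.isEmpty()) {
                    break; // no further pages
                }
                next.get(0).click();
                Thread.sleep(2000); // crude wait for the postback; an explicit wait is more robust
            }
        } finally {
            driver.quit();
        }
    }
}
```

This keeps the existing JSoup parsing code; Selenium is only used to get past the JavaScript-driven pagination.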

OTHER TIPS

A postback-based system (.NET, Prado/PHP, etc.) works by keeping a complete snapshot of the browser contents on the server side. This is called the page state. Any attempt to manipulate it with a client that is not JavaScript-capable is almost certain to fail.

What you need is a JavaScript-capable browser. The easiest solution I found is to use XUL, the framework Firefox is written in, to create such a desktop application. You basically create a desktop application containing a single browser element, which you can then script from the application itself without the restrictions of the security container. Alternatively, you could use the Greasemonkey plugin to do your bidding. The latter is a bit easier to get started with, but it's fairly limited since it runs on a per-page basis.

With both solutions you then have access to the page's DOM to gather data and you can also fire events (like clicking on a button). Unfortunately you have to learn JavaScript for this to work.
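If you would rather stay in Java than script XUL or Greasemonkey, the same idea of firing the postback yourself can be expressed through Selenium's JavascriptExecutor interface. A sketch, assuming a hypothetical event target ("GridView1") and argument ("Page$2"); the real values can be read from the pager link's onclick or href attribute:

```java
import org.jsoup.Jsoup;
import org.openqa.selenium.JavascriptExecutor;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;

public class DoPostBackDirectly {

    public static void main(String[] args) throws InterruptedException {
        WebDriver driver = new FirefoxDriver();
        try {
            driver.get("https://example.com/list.aspx"); // hypothetical URL

            // Fire the same __doPostBack call that the pager link would trigger.
            // "GridView1" and "Page$2" are hypothetical placeholders.
            ((JavascriptExecutor) driver)
                    .executeScript("__doPostBack(arguments[0], arguments[1]);",
                                   "GridView1", "Page$2");

            Thread.sleep(2000); // crude wait for the postback round trip

            // The browser has executed the JavaScript and submitted the page state,
            // so the rendered HTML now contains the next page and can be parsed as usual.
            System.out.println(Jsoup.parse(driver.getPageSource()).title());
        } finally {
            driver.quit();
        }
    }
}
```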

As well as Selenium, you can use WatiN: http://watin.org/

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow