Question

I recently started using Web-Harvest as a web scraping tool. I am at the start of a project in which I need to authenticate (log in) to a web site. Note that [URL] in the code below stands in for the actual URL of the site.

So, I am trying to post login information by executing the following config:

<config>
    <var-def name="result">
        <http method="post" url="[URL]/webreservations/WebObjects/WebReservations.woa/wa/Login?language=1&amp;server=1" multipart="true">
            <http-param name="login">[myusername]</http-param>
            <http-param name="password">[mypassword]</http-param>
        </http>
    </var-def>
</config>

How do I retrieve the resulting information and follow the redirect? When I log in manually, the path below is appended to the URL. It appears to contain a randomised component and a session ID. I suppose that is something I need to incorporate in my solution?

[URL]/nP8oIdbhk8MTXkrQ7Y2Z1g/0.3.0;jsessionid=2EF81CDA9A2EFF0B14E45889BC279BFA

Below is part of the page source, which might be key to the problem. Is it a WebObjects problem? Is it a JavaScript problem? Am I the problem? :)

<body onload="document.form.login.focus();">
   <form name="form" onsubmit="showDiv();return true;" method="post" action="/webreservations/WebObjects/WebReservations.woa/wa/Login">
...
</form>
</body>

Any help is greatly appreciated.

Solution

Make sure you are sending all the parameters the login form expects. It may require more than just the username and password, for example hidden fields inside the form.
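
As a rough sketch of that idea (not a tested solution for this particular site), the config below first fetches the login page, reads one hidden field from the form, and then posts it together with the credentials. The field name "token" and the output file name login-result.html are hypothetical placeholders, because the question elides the form body; substitute whatever input names the real form contains. The sketch also leaves out multipart="true", on the assumption that the form is submitted URL-encoded, since the form tag shown in the question declares no enctype.

```xml
<config>
    <!-- Fetch the login page first, so that any session cookie is set
         and the form's hidden fields can be inspected. -->
    <var-def name="loginPage">
        <html-to-xml>
            <http url="[URL]/webreservations/WebObjects/WebReservations.woa/wa/Login?language=1&amp;server=1"/>
        </html-to-xml>
    </var-def>

    <!-- "token" is a hypothetical hidden field name; use the names
         that actually appear inside the form. -->
    <var-def name="token">
        <xpath expression="//form[@name='form']//input[@name='token']/@value">
            <var name="loginPage"/>
        </xpath>
    </var-def>

    <!-- Post the credentials together with the extra field. -->
    <var-def name="result">
        <http method="post" url="[URL]/webreservations/WebObjects/WebReservations.woa/wa/Login?language=1&amp;server=1">
            <http-param name="login">[myusername]</http-param>
            <http-param name="password">[mypassword]</http-param>
            <http-param name="token"><var name="token"/></http-param>
        </http>
    </var-def>

    <!-- Save the response so it can be checked by hand
         (the file name is an arbitrary choice). -->
    <file action="write" path="login-result.html">
        <var name="result"/>
    </file>
</config>
```

Comparing the saved response with what the browser shows after a manual login should indicate whether any parameters are still missing.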

Licensed under: CC-BY-SA with attribution