Question

I am developing a Java project that uses the following website: http://feedenlarger.com/

The program essentially just inputs a link [ for example http://feeds.bbci.co.uk/news/rss.xml ] into the "Enter partial feed URL" box and submits the form. Once the form is submitted I want to download the page that a user would access if they manually filled out the form and clicked the button on the page.

How can I do this in Java?

I have managed to successfully download the page with the form using:

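    // Needs: java.io.*, java.net.URL, java.net.URLConnection,
    //        java.nio.charset.StandardCharsets
    private String readWebPage() throws IOException {
        try {
            URLConnection urlC = new URL(this.url).openConnection();
            // Keep the response headers in memory (headers is a field of this class).
            headers = urlC.getHeaderFields();

            // Decode the stream as text instead of casting raw bytes to char,
            // which corrupts non-ASCII characters. UTF-8 is an assumption here;
            // use the charset from the Content-Type header if it differs.
            try (Reader reader = new BufferedReader(new InputStreamReader(
                    urlC.getInputStream(), StandardCharsets.UTF_8))) {
                StringBuilder builder = new StringBuilder();
                int c;
                while ((c = reader.read()) != -1) {
                    builder.append((char) c);
                }
                return builder.toString();
            } // the reader (and the underlying stream) is closed automatically
        } catch (IOException e) {
            System.out.println("Webpage: IO Error");
            throw e;
        }
    }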

Note: I'm interested in keeping the headers in memory so using URLConnection or similar is preferable.

How can I now fill in the form, submit using a POST/GET request and download the next webpage? I have tried downloading the page from the link generated by my browser once the form is submitted but this gives me a "forbidden" error.

The compilable project I have been using is available here


Solution

(Posted as an answer because it is too long to fit in a comment. Also posted as community wiki to avoid earning any rep [or losing it in case somebody doesn't read this message].)

This is the result of the network monitor when I sent my last comment on the page:

Remote Address:198.252.206.140:80
Request URL:http://stackoverflow.com/posts/23431154/comments
Request Method:POST
Status Code:200 OK
Request Headers
Accept:text/html, */*; q=0.01
Accept-Encoding:gzip,deflate,sdch
Accept-Language:es,en-US;q=0.8,en;q=0.6,pt;q=0.4,fr;q=0.2
Connection:keep-alive
Content-Length:322
Content-Type:application/x-www-form-urlencoded
Cookie:__qca=P0-1914216052-1380726140973; __utma=140029553.1039400677.1380726141.1389622782.1389628108.351; __utmz=140029553.1389047375.333.15.utmcsr=google|utmccn=(organic)|utmcmd=organic|utmctr=(not%20provided); sgt=id=dfc633fa-3459-4f24-be6f-ca2ee08908cd; usr=t=f94uBq5WkGsH&s=BDL1eqRYkOQ5&p=[2|2][10|15]; _ga=GA1.2.1039400677.1380726141
Host:stackoverflow.com
Origin:http://stackoverflow.com
Referer:http://stackoverflow.com/questions/23431154/html-form-handling-in-java?noredirect=1
User-Agent:Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.131 Safari/537.36
X-Requested-With:XMLHttpRequest
Form Data
comment:Check out all the fields you need to sent and which of them are auto generated by the server (usually hidden fields). Basically, you need to check all the elements about how the request is being sent. This can be easily done in Chrome or Firefox + Firebug by pressing F12.
fkey:bc6f108950fe59611b3f1ebf4caedb31
Response Headers
Cache-Control:private
Content-Encoding:gzip
Content-Length:2158
Content-Type:text/html; charset=utf-8
Date:Fri, 02 May 2014 15:21:55 GMT
Pragma:no-cache
Vary:Accept-Encoding
X-Frame-Options:SAMEORIGIN

Here's an example of how the Network Monitor looks:

[Image: example of Network Monitor]
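Once the network monitor has shown which URL, method, and fields the browser sends, the same request can be rebuilt with HttpURLConnection. The following is only a minimal sketch under assumptions: the class name FormPoster, the field name "url", and the User-Agent value are made up for illustration, and the real action URL and field names (including any hidden fields) have to be read from the form or from the captured request. A missing browser-like header is one possible cause of a "forbidden" response, but that is a guess, not something confirmed for this site.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FormPoster {

    /** Response headers and body, kept together so the headers stay in memory. */
    static final class Response {
        final Map<String, List<String>> headers;
        final String body;

        Response(Map<String, List<String>> headers, String body) {
            this.headers = headers;
            this.body = body;
        }
    }

    /** Encodes the fields as an application/x-www-form-urlencoded body. */
    static String encodeForm(Map<String, String> fields) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            if (body.length() > 0) {
                body.append('&');
            }
            body.append(encode(field.getKey())).append('=').append(encode(field.getValue()));
        }
        return body.toString();
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    /** Sends the fields as a POST request and returns the server's response. */
    static Response post(String actionUrl, Map<String, String> fields) throws IOException {
        byte[] payload = encodeForm(fields).getBytes(StandardCharsets.UTF_8);
        HttpURLConnection conn = (HttpURLConnection) new URL(actionUrl).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        // Assumption: copy whichever headers from the network monitor the server checks.
        conn.setRequestProperty("User-Agent", "Mozilla/5.0");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(payload);
        }
        try (InputStream in = conn.getInputStream()) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            // Assumes a UTF-8 response; check the Content-Type header otherwise.
            return new Response(conn.getHeaderFields(),
                    new String(buffer.toByteArray(), StandardCharsets.UTF_8));
        }
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        // "url" is a made-up field name; use the name attribute of the real input.
        fields.put("url", "http://feeds.bbci.co.uk/news/rss.xml");
        System.out.println(encodeForm(fields)); // the body that post(...) would send
    }
}
```

Calling FormPoster.post(actionUrl, fields) with the form's real action URL would return both the page and its headers. If the form uses GET instead, the encoded string can be appended to the action URL after a "?" and fetched with the existing readWebPage() method.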

Other tips

For this scenario, the task can be implemented with tools such as HtmlUnit and Selenium.

With HtmlUnit, you can enter the text through an HtmlTextInput and then submit the form, or follow an anchor tag, to navigate to the second page.

HtmlUnit has full support for HTML tags. On the first page, set the input's value with the setValueAttribute("some text") method of HtmlInput, then proceed to the next page by clicking the button or anchor tag you are looking for:

    HtmlAnchor anchor = (HtmlAnchor) page.getHtmlElementById("second_page_link");
    page = (HtmlPage) anchor.click();

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow