Question

I am using jsoup to read a web page with the following function:

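    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;

    public Document getDocuement(String url) {
        Document doc = null;
        try {
            // Fetch the page with a 20-second timeout and parse the HTML the server returns.
            doc = Jsoup.connect(url).timeout(20 * 1000).userAgent("Mozilla").get();
        } catch (Exception e) {
            // Any failure (network error, bad URL, ...) is reported as null.
            return null;
        }
        return doc;
    }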

But whenever I try to read a web page that contains JavaScript-generated content, jsoup does not read that content. That is, the actual content of the page is loaded by JavaScript calls, so it is not present in the page source of that link. For example, this blog: http://blog.rapporter.net/search/label/r. Is there a way to also get JavaScript-generated content when parsing a page with Jsoup? If not, please suggest a Java HTML parser that can solve this problem.


Solution

You cannot do this with Jsoup. Jsoup only parses HTML; it does not execute JavaScript. To wait for AJAX requests, or to get JavaScript-generated content in general, you would need a browser that can execute the JavaScript and produce its output. JavaScript logic can be complex, so executing it and loading the resulting content is not a trivial thing (just take a look at how complicated browsers, JS and the DOM are).
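To make the point concrete, here is a minimal, self-contained sketch using only the Java standard library (no Jsoup). The page in `SERVED_HTML` is a made-up sample, not the blog from the question, and `StaticSourceDemo` and `staticPostsContent` are hypothetical names chosen for illustration. It shows that the markup a server sends can contain an empty container plus a script, so anything that only reads the source never sees the generated heading.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StaticSourceDemo {

    // Hypothetical page as a server might send it: the container is empty,
    // and the script would fill it in only if a browser executed it.
    static final String SERVED_HTML =
            "<html><body>"
          + "<div id=\"posts\"></div>"
          + "<script>"
          + "document.getElementById('posts').innerHTML = '<h2>' + 'Hello ' + 'R' + '</h2>';"
          + "</script>"
          + "</body></html>";

    // Returns the markup found inside <div id="posts">...</div> in the raw source.
    // A regex is enough for this fixed sample; it is not a general HTML parser.
    static String staticPostsContent(String html) {
        Matcher m = Pattern.compile("<div id=\"posts\">(.*?)</div>", Pattern.DOTALL).matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // The container is empty in the source because nothing has run the script.
        System.out.println("Static content of #posts: \"" + staticPostsContent(SERVED_HTML) + "\"");
        // The heading the script would build does not appear anywhere in the source.
        System.out.println("Source contains rendered heading: "
                + SERVED_HTML.contains("<h2>Hello R</h2>"));
    }
}
```

Any tool that works on the served source, Jsoup included, is in the same position as this sketch: the heading only exists after something actually runs the script and updates the DOM.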

Licensed under: CC-BY-SA with attribution