Question

I want to get the entire source code of a web page, but some of the site's content is not loaded at first (this seems to be related to Ajax). How can I get the content that is loaded later, using Java?

I tried Java's URL.openStream(), but this didn't work: I only got the placeholder text "loading...", not the real content that appears after the page has loaded.

Thank you very much.

Solution

You need to either remote-control an existing browser (which is not exactly easy from Java, since most browsers expose other languages, component systems, or interfaces) or use a headless browser that can execute JavaScript. HtmlUnit falls into the latter category.
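
As a rough illustration of the headless-browser route, here is a minimal sketch using HtmlUnit. It assumes HtmlUnit 3.x or later is on the classpath (older 2.x releases use the package com.gargoylesoftware.htmlunit instead of org.htmlunit). The class name RenderedPageFetcher, the method fetchRenderedHtml, the example URL, and the 10-second wait are all made up for this example; how long you need to wait depends on the page.

```java
import org.htmlunit.BrowserVersion;
import org.htmlunit.NicelyResynchronizingAjaxController;
import org.htmlunit.WebClient;
import org.htmlunit.html.HtmlPage;

import java.io.IOException;

// Hypothetical helper class; the names are illustrative, not part of HtmlUnit.
public class RenderedPageFetcher {

    // Loads the page, lets its scripts run for up to waitMillis,
    // and returns the resulting DOM serialized as XML/HTML text.
    public static String fetchRenderedHtml(String url, long waitMillis) throws IOException {
        try (WebClient webClient = new WebClient(BrowserVersion.CHROME)) {
            webClient.getOptions().setJavaScriptEnabled(true);
            // CSS is not needed just to read the markup.
            webClient.getOptions().setCssEnabled(false);
            // Real-world pages often contain scripts HtmlUnit cannot run; keep going anyway.
            webClient.getOptions().setThrowExceptionOnScriptError(false);
            // Runs Ajax requests issued from the main script thread synchronously.
            webClient.setAjaxController(new NicelyResynchronizingAjaxController());

            HtmlPage page = webClient.getPage(url);
            // Give timers and background requests a chance to finish.
            webClient.waitForBackgroundJavaScript(waitMillis);
            return page.asXml();
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder URL; pass the real one as the first argument.
        String url = args.length > 0 ? args[0] : "http://example.com/";
        System.out.println(fetchRenderedHtml(url, 10_000));
    }
}
```

Unlike URL.openStream(), which only returns the bytes the server sent, this serializes the DOM after the page's scripts have had time to run. HtmlUnit's JavaScript support is not identical to a real browser's, so heavily scripted sites may still need the remote-controlled-browser approach.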

Other tips

Try using an HTML parser for this kind of thing. Jericho HTML Parser would be helpful here.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow