Question

I have been researching the headless browsers available to date and found that HtmlUnit is used quite extensively. Are there any alternatives to HtmlUnit, and what advantages do they have over it?

Thanks Nayn

Solution 4

I am going to use Selenium for my use case, since it lets me drive a real browser, so what it renders does not deviate from the real world the way HtmlUnit's output can. I am planning to use Selenium 2, which has WebDriver integration and offers a great API and useful fixes. Thanks Nayn

OTHER TIPS

As far as I know, HtmlUnit is the most powerful headless browser.

What are your issues with it?

There are many other libraries that you can use for this.

  • If you need to scrape XML-based data, use JTidy.
  • If you need to scrape specific data from HTML, you can use Jsoup.
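
To illustrate what "scraping specific data from HTML" means without a browser, here is a minimal sketch. It does not use Jsoup (which is a Java library); it uses Python's standard html.parser module as a stand-in for the same idea of parsing static markup. The class name LinkExtractor, the helper extract_links and the sample page are made up for illustration. Like any static parser, it does not run JavaScript, which is the gap a headless browser such as HtmlUnit fills.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects (href, link text) pairs from <a> tags in static HTML."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) tuples
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an <a> that has an href
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Hypothetical helper: return all (href, text) pairs found in html."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


# Made-up sample page, for illustration only.
SAMPLE_HTML = """
<html><body>
  <h1>Headless browsers</h1>
  <a href="/htmlunit">HtmlUnit</a>
  <a href="/jsoup">Jsoup <b>parser</b></a>
  <a>anchor without href</a>
</body></html>
"""

links = extract_links(SAMPLE_HTML)
for href, text in links:
    print(href, text)
```

The anchor without an href is skipped, and text inside nested tags is joined into the link text. If the data you need only appears after scripts run, a parser like this (or Jsoup) will not see it.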

Well, I use jsoup. It's a good deal faster than any other API I've used.

WebDriver with a virtual framebuffer is the only real alternative. The advantage is that it uses a real browser; the disadvantage is that it's more of a pain to set up, and the API is much poorer.

I use WebKit as a headless browser, through Qt's Python bindings: http://www.riverbankcomputing.co.uk/static/Docs/PyQt4/html/qtwebkit.html

WebKit is the rendering engine used by Safari (Chrome also used it before moving to its Blink fork), and it is very flexible.

One of my reasons for choosing it over HtmlUnit was ease of setting up:

sudo apt-get install python-qt4

I would also recommend Selenium. A great feature is that the client can open a visible browser window, so you can see what is happening at each step. Being able to create macros for automated tests is another good feature. However, if you need to scrape information from a web page, HtmlUnit is better than Selenium.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow