I need to convert a set of static HTML documents into a single PDF file programmatically, on the server side, on a Java/J2EE platform, preferably as a batch process. The PDF files would be distributed to site users for offline browsing of the web pages.

The major points of the requirements are:

  1. The banner at the top of each page should not appear in the final PDF document.
  2. The HTML hyperlinks in the left-hand navigation bar should be turned into PDF bookmarks.
  3. All hyperlinked content (html/pdf/doc/docx etc.) referenced from the web pages should be included in the final PDF document, with PDF bookmarks pointing to it.
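To make requirements 1 and 2 concrete, here is a minimal sketch of the pre-processing step using only the JDK's DOM and XPath APIs. It assumes the pages are well-formed XHTML without a DOCTYPE or namespace declaration, and that the banner and navigation bar carry the ids `banner` and `nav`. Those ids and the class name `PagePrep` are hypothetical, not taken from the actual site. It does not produce any PDF; it only strips the banner and collects the link titles and targets that a PDF library could later turn into bookmarks.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class PagePrep {

    // Parses a well-formed XHTML string into a DOM tree.
    public static Document parse(String xhtml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        return factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
    }

    // Removes the first element with the given id, if any (requirement 1).
    // The id is assumed to be a trusted constant, not user input.
    public static void removeById(Document doc, String id) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        Node node = (Node) xpath.evaluate(
                "//*[@id='" + id + "']", doc, XPathConstants.NODE);
        if (node != null && node.getParentNode() != null) {
            node.getParentNode().removeChild(node);
        }
    }

    // Collects "link text -> href" for every anchor inside the navigation
    // element, in document order (input for requirement 2).
    public static Map<String, String> navLinks(Document doc, String navId)
            throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList anchors = (NodeList) xpath.evaluate(
                "//*[@id='" + navId + "']//a[@href]", doc, XPathConstants.NODESET);
        Map<String, String> links = new LinkedHashMap<>();
        for (int i = 0; i < anchors.getLength(); i++) {
            Element a = (Element) anchors.item(i);
            links.put(a.getTextContent().trim(), a.getAttribute("href"));
        }
        return links;
    }
}
```

Real-world HTML is often not well-formed, so a tidy-up pass to XHTML would probably be needed before this step. Links with identical text would overwrite each other in the map; a list of pairs would avoid that.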

Is there a standard open-source way of doing this?

No correct solution


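Try Apache FOP. I just used it to convert XML to PDF, and I think you can do the same with HTML/DOM. Note that FOP renders XSL-FO, so the HTML would first have to be well-formed (XHTML) and transformed to XSL-FO, typically with an XSLT stylesheet. The website has a whole section on running FOP in a Java application, and there is example code for DOM to PDF.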
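As a rough illustration of the step that comes before FOP, the sketch below turns a small XHTML page into XSL-FO with the JDK's built-in XSLT support. The class name `HtmlToFo`, the stylesheet, and the `banner` id are assumptions made up for this example. Only `h1` and `p` are mapped, and the input is assumed to be well-formed XHTML without a namespace declaration. The FOP rendering step itself is not shown.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class HtmlToFo {

    // Minimal, hypothetical stylesheet: one A4 page master, headings and
    // paragraphs become fo:block, the element with id "banner" is dropped.
    private static final String XSLT = String.join("\n",
        "<xsl:stylesheet version='1.0'",
        "    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'",
        "    xmlns:fo='http://www.w3.org/1999/XSL/Format'>",
        "  <xsl:strip-space elements='*'/>",
        "  <xsl:template match='/'>",
        "    <fo:root>",
        "      <fo:layout-master-set>",
        "        <fo:simple-page-master master-name='page'",
        "            page-height='29.7cm' page-width='21cm' margin='2cm'>",
        "          <fo:region-body/>",
        "        </fo:simple-page-master>",
        "      </fo:layout-master-set>",
        "      <fo:page-sequence master-reference='page'>",
        "        <fo:flow flow-name='xsl-region-body'>",
        "          <xsl:apply-templates select='//body'/>",
        "        </fo:flow>",
        "      </fo:page-sequence>",
        "    </fo:root>",
        "  </xsl:template>",
        "  <xsl:template match=\"*[@id='banner']\"/>",
        "  <xsl:template match='h1'>",
        "    <fo:block font-size='18pt' font-weight='bold'>",
        "      <xsl:apply-templates/>",
        "    </fo:block>",
        "  </xsl:template>",
        "  <xsl:template match='p'>",
        "    <fo:block space-after='6pt'><xsl:apply-templates/></fo:block>",
        "  </xsl:template>",
        "</xsl:stylesheet>");

    // Applies the stylesheet to an XHTML string and returns the XSL-FO text.
    public static String toFo(String xhtml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter out = new StringWriter();
        transformer.transform(
                new StreamSource(new StringReader(xhtml)), new StreamResult(out));
        return out.toString();
    }
}
```

In a real batch job the transformer output would likely go straight into FOP rather than into a string; the FOP embedding documentation describes wrapping `fop.getDefaultHandler()` in a `SAXResult` for that purpose. XSL-FO 1.1 also defines `fo:bookmark-tree`, which may be a route to the PDF bookmarks asked for in the question.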

You can try iText, but I am not sure whether it handles everything you require.

Moreover, it is always better to explore several options and then decide what you can and cannot do. In many cases no library/API will support everything you ask for out of the box.

You can try Xml2PDF for this.

Licensed under: CC-BY-SA with attribution