Question

I want to convert an HTML page into an MS Word document. I would like to know which APIs would be helpful, and whether there are other ways to do this. The entire page has to be converted into a .doc file (e.g. if there is a table in the HTML page, a similar table must be created in the Word document). Apache POI does not provide an option to format the Word document the way the HTML page is formatted. I need something that can give me a completely formatted Word document.

Some of the options I have looked at are jsoup, docx4j, JasperReports, and JODConverter.

I tried parsing the HTML page with jsoup, and I can get the contents of the page in my Java program. Now I need to write these contents to a doc/docx file. Can docx4j help me produce a formatted docx file?

Please help. Thank you.

Solution 2

I found a workaround. First I get the parsed objects using jsoup and pass them to a document template. I am now looking for options that make it easy to create templates and generate the document dynamically. I have asked another question about that.

Other tips

I would go with Ashwini Raman's suggestion. It won't work in every scenario: for a complex HTML document with many images and the like, Word will not do a good job. But for most cases it should be fine. Otherwise, there is a complex task ahead of you. You will have to parse your HTML document, for example with the jsoup library, and then use the docx4j library to create your Word document. Links to both are here:

http://www.docx4java.org/trac/docx4j

http://jsoup.org/

Even if you go this route, the formatting might be iffy.
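
To give a rough idea of what the "parse, then rebuild" step involves, here is a minimal sketch in plain Java. It deliberately does not use jsoup or docx4j: the hard-coded rows stand in for whatever you would collect from the `tr`/`td` elements of the parsed page, and the WordprocessingML (the XML inside a .docx) is written by hand with `java.util.zip`. The class and method names (`HtmlTableToDocxSketch`, `tableXml`, `toDocx`) are made up for illustration, and I have not checked the result against every version of Word.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper for illustration; not part of jsoup or docx4j.
public class HtmlTableToDocxSketch {

    static final String XML_DECL = "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>";

    // Declares the content type of each part in the package.
    static final String CONTENT_TYPES = XML_DECL
        + "<Types xmlns=\"http://schemas.openxmlformats.org/package/2006/content-types\">"
        + "<Default Extension=\"rels\" ContentType=\"application/vnd.openxmlformats-package.relationships+xml\"/>"
        + "<Default Extension=\"xml\" ContentType=\"application/xml\"/>"
        + "<Override PartName=\"/word/document.xml\" ContentType=\"application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml\"/>"
        + "</Types>";

    // Points the package at its main document part.
    static final String RELS = XML_DECL
        + "<Relationships xmlns=\"http://schemas.openxmlformats.org/package/2006/relationships\">"
        + "<Relationship Id=\"rId1\" Type=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument\" Target=\"word/document.xml\"/>"
        + "</Relationships>";

    // Escape the characters that are special in XML text content.
    public static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Build a <w:tbl> with one <w:tr> per row and one <w:tc> per cell.
    public static String tableXml(List<List<String>> rows) {
        int cols = rows.stream().mapToInt(List::size).max().orElse(0);
        StringBuilder sb = new StringBuilder(
            "<w:tbl><w:tblPr><w:tblW w:w=\"0\" w:type=\"auto\"/></w:tblPr><w:tblGrid>");
        for (int i = 0; i < cols; i++) {
            sb.append("<w:gridCol/>");
        }
        sb.append("</w:tblGrid>");
        for (List<String> row : rows) {
            sb.append("<w:tr>");
            for (String cell : row) {
                sb.append("<w:tc><w:p><w:r><w:t>").append(escape(cell))
                  .append("</w:t></w:r></w:p></w:tc>");
            }
            sb.append("</w:tr>");
        }
        return sb.append("</w:tbl>").toString();
    }

    // Zip the three parts of a minimal .docx package into a byte array.
    public static byte[] toDocx(List<List<String>> rows) {
        String document = XML_DECL
            + "<w:document xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\">"
            + "<w:body>" + tableXml(rows) + "<w:p/></w:body></w:document>";
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            addEntry(zip, "[Content_Types].xml", CONTENT_TYPES);
            addEntry(zip, "_rels/.rels", RELS);
            addEntry(zip, "word/document.xml", document);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    static void addEntry(ZipOutputStream zip, String name, String content) throws IOException {
        zip.putNextEntry(new ZipEntry(name));
        zip.write(content.getBytes(StandardCharsets.UTF_8));
        zip.closeEntry();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for rows extracted from the HTML table.
        List<List<String>> rows = Arrays.asList(
            Arrays.asList("Name", "Qty"),
            Arrays.asList("Widget", "3"));
        Files.write(Paths.get("table.docx"), toDocx(rows));
    }
}
```

Note that this only carries over the cell text. No borders, column widths, fonts, or styles are emitted, so everything that makes the table look like the HTML original would still have to be mapped by hand. That is the part where a library like docx4j saves you work.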

To answer your original question: no, there is no ready-made library that does what you are expecting. At least I haven't come across any.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow