Docx4j v3 Docx to HTML with Images

https://stackoverflow.com/questions/22517890

docx4j

17-06-2023

Question

I'm trying to convert a docx file to HTML using docx4j version 3.

The document contains whitespace consisting of tabs, spaces, and newlines. The resulting HTML either contains unrecognized characters or does not preserve the whitespace at all.

The Java code I'm using is:

WordprocessingMLPackage wordMLPackage = Docx4J.load(is);

HTMLSettings htmlSettings = Docx4J.createHTMLSettings();
htmlSettings.setImageDirPath( System.getProperty("user.dir") + uploadedImagesDirectory );
htmlSettings.setWmlPackage(wordMLPackage);

Docx4J.toHTML(htmlSettings, out, Docx4J.FLAG_EXPORT_PREFER_XSL);

String result = ((ByteArrayOutputStream)out).toString();

How can I preserve the whitespace in the document? Also, is there a way to apply CSS to a particular node? Specifically, I have 3 images that should be evenly spaced horizontally on the page.

I've looked through the documentation and searched online with no success.

Thank you.

Solution

I resolved the issue, and it was not related to docx4j. docx4j converted the document perfectly! The problem was in how the output was being sent in an email.

Setting the character encoding to UTF-8 on Spring's JavaMail helper, MimeMessageHelper, resolved it:

MimeMessageHelper message = new MimeMessageHelper(mimeMessage, true, "utf-8");
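
"Unrecognized characters" of this kind typically appear when UTF-8 bytes are decoded with a different charset somewhere along the way. The sketch below uses only the JDK and does not reproduce the Spring mail setup. It assumes, purely for illustration, that the generated HTML contains a non-breaking space (U+00A0); the class name `CharsetDemo` and the helper `decode` are hypothetical and do not come from docx4j or Spring.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetDemo {

    // Hypothetical helper: decode the buffered bytes with an explicit charset.
    static String decode(ByteArrayOutputStream out, Charset charset) {
        return new String(out.toByteArray(), charset);
    }

    public static void main(String[] args) {
        // Assumed sample HTML containing a non-breaking space (U+00A0).
        String html = "<p>a\u00A0b</p>";

        // Simulate a converter writing UTF-8 encoded HTML into the stream.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] bytes = html.getBytes(StandardCharsets.UTF_8);
        out.write(bytes, 0, bytes.length);

        // Decoding with the charset the bytes were written in round-trips cleanly.
        String correct = decode(out, StandardCharsets.UTF_8);
        // Decoding with a mismatched charset turns U+00A0 into two characters.
        String garbled = decode(out, StandardCharsets.ISO_8859_1);

        System.out.println(correct.equals(html));             // true
        System.out.println(garbled.equals(html));             // false
        System.out.println(garbled.length() - html.length()); // 1
    }
}
```

The same reasoning applies to the question's `((ByteArrayOutputStream)out).toString()`, which decodes with the platform default charset. Naming the charset explicitly at every boundary, as the `MimeMessageHelper` constructor above does for the mail, avoids depending on that default.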

Licensed under: CC-BY-SA with attribution

Not affiliated with Stack Overflow