Question

I have a Java web application that generates an MS Word document in the WordML format (a single XML file in Word 2003 XML format with a .xml file extension). I would like to automatically convert this into the newer Office Open XML format so that the document could be saved as a .docx file (which in essence is a zip file containing multiple XML files).

This has to be fully automated, and cannot require the user to download the file and convert it manually. Furthermore, the user cannot be assumed to have MS Word installed (they could be using LibreOffice instead).

I have been looking for a Java library I could use to do this, but couldn't find any that converts .xml to .docx. The only converter I could find was JODConverter, but it doesn't support conversion from .xml to .docx.

Is there a Java library that could do this sort of conversion? Or should I be looking for a non-Java solution? Maybe a Python module could do this? (For example, a Python script could take the files generated by the Java app and convert them to .docx.)


Solution

If you can't modify your app to emit Flat OPC XML, you could write an XSLT to convert from Word 2003 XML format to Flat OPC XML. They are quite similar.

Then docx4j (disclosure: I maintain this) can convert the Flat OPC XML to docx.
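As a rough sketch of the XSLT step, the transformation can be run from Java with the standard javax.xml.transform API that ships with the JDK. The class name WordMlToFlatOpc and the method transform are made up for illustration, and the stylesheet itself (the part that actually maps Word 2003 XML elements to Flat OPC XML) is assumed to be something you write yourself; it is not shown here.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper: applies a caller-supplied XSLT to a Word 2003 XML document.
final class WordMlToFlatOpc {

    // wordMl: the Word 2003 XML produced by the web application.
    // xslt:   your own stylesheet mapping Word 2003 XML to Flat OPC XML (an assumption).
    static String transform(String wordMl, String xslt) throws TransformerException {
        TransformerFactory factory = TransformerFactory.newInstance();
        // Compile the stylesheet from its source text.
        Transformer transformer =
                factory.newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        // Run the stylesheet over the input document and collect the result as a string.
        transformer.transform(
                new StreamSource(new StringReader(wordMl)), new StreamResult(out));
        return out.toString();
    }
}
```

The string returned by transform would be the Flat OPC XML, which could then be handed to docx4j to save as a .docx file. For large documents, file or stream based StreamSource and StreamResult objects would probably be a better fit than strings.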

Licensed under: CC-BY-SA with attribution