docx4j HTML conversion breaks one line into multiple span elements

https://stackoverflow.com/questions/18785808

28-06-2022

Question

I have a small problem with the HTML generated by docx4j. In about 95% of cases everything works fine, but some lines in the document are converted into multiple spans with identical styling. For example:

I have a line like: S44_XYZ_ABC_AABBCCDD. There are many lines like this in the document, and only 1-2% of them are converted badly. Such a line becomes:

<span style=...>S44</span>
<span style=...>_</span>
<span style=...>XYZ</span>
           ...etc.

Since I replace these tokens with other text in the generated HTML string, this is a problem for me. I tried reformatting the line, and also deleting it and replacing it with a line copied from another document where the conversion was fine, but with no luck.

I suspect this is some kind of docx formatting problem, but I cannot figure out the cause.

Thank you very much in advance!

Solution

Word has probably split your text S44_XYZ_ABC_AABBCCDD into multiple runs.

It can do this for various reasons, including grammar/spell checking, differences in formatting, and so on. To see why it is happening in your case, unzip your docx and look at word/document.xml.
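
As a rough aid for that inspection, the sketch below lists the text of every run (w:r) in each paragraph (w:p) of word/document.xml, so a token that was split into several runs is easy to spot. It uses only the JDK. The class and method names (RunInspector, readDocumentXml, runsPerParagraph) are made up for this illustration and are not part of docx4j.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper for this illustration; not a docx4j class.
public class RunInspector {
    // Namespace of the WordprocessingML elements (w:p, w:r, w:t).
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Reads word/document.xml out of a .docx file (a .docx is a zip archive). */
    public static String readDocumentXml(Path docx) throws IOException {
        try (ZipFile zip = new ZipFile(docx.toFile())) {
            ZipEntry entry = zip.getEntry("word/document.xml");
            if (entry == null) {
                throw new IOException("word/document.xml not found in " + docx);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        }
    }

    /** For each paragraph, returns the text of each run it contains. */
    public static List<List<String>> runsPerParagraph(String documentXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().parse(
                new ByteArrayInputStream(documentXml.getBytes(StandardCharsets.UTF_8)));

        List<List<String>> result = new ArrayList<>();
        NodeList paragraphs = doc.getElementsByTagNameNS(W_NS, "p");
        for (int i = 0; i < paragraphs.getLength(); i++) {
            Element paragraph = (Element) paragraphs.item(i);
            List<String> runTexts = new ArrayList<>();
            NodeList runs = paragraph.getElementsByTagNameNS(W_NS, "r");
            for (int j = 0; j < runs.getLength(); j++) {
                // Concatenate all w:t text elements inside this run.
                NodeList texts = ((Element) runs.item(j)).getElementsByTagNameNS(W_NS, "t");
                StringBuilder sb = new StringBuilder();
                for (int k = 0; k < texts.getLength(); k++) {
                    sb.append(texts.item(k).getTextContent());
                }
                runTexts.add(sb.toString());
            }
            result.add(runTexts);
        }
        return result;
    }
}
```

A paragraph that prints as several short entries (S44, _, XYZ, ...) instead of one is a line that Word stored as multiple runs. Elements sitting between those runs in document.xml, such as w:proofErr, or differing w:rPr blocks, usually hint at the reason.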

You can run VariablePrepare.java to address some causes of this.
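
To illustrate the general idea of joining split runs, here is a minimal JDK-only sketch that merges adjacent runs in a paragraph when their run properties (w:rPr) are identical, after dropping the w:proofErr spell/grammar markers between them. This is not docx4j's VariablePrepare implementation, and the class and method names (RunMerger, parse, mergeRuns) are hypothetical. It also assumes that removing w:proofErr markers is acceptable for your document.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration; not part of docx4j.
public class RunMerger {
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    public static Document parse(String documentXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(documentXml)));
    }

    /** Merges adjacent plain-text runs with identical w:rPr, in place. */
    public static void mergeRuns(Document doc) {
        NodeList paragraphs = doc.getElementsByTagNameNS(W_NS, "p");
        for (int i = 0; i < paragraphs.getLength(); i++) {
            Node paragraph = paragraphs.item(i);
            Element previousRun = null;
            Node child = paragraph.getFirstChild();
            while (child != null) {
                Node next = child.getNextSibling(); // fetch before any removal
                if (child.getNodeType() != Node.ELEMENT_NODE) {
                    // Ignore whitespace between elements.
                } else if (isW(child, "proofErr")) {
                    paragraph.removeChild(child); // proofing marker only
                } else if (isW(child, "r") && isPlainTextRun((Element) child)) {
                    Element run = (Element) child;
                    if (previousRun != null && sameProperties(previousRun, run)) {
                        Element target = childElement(previousRun, "t");
                        target.setTextContent(target.getTextContent()
                                + childElement(run, "t").getTextContent());
                        // Keep leading/trailing spaces of the joined text.
                        target.setAttributeNS(XMLConstants.XML_NS_URI, "xml:space", "preserve");
                        paragraph.removeChild(run);
                    } else {
                        previousRun = run;
                    }
                } else {
                    previousRun = null; // anything else breaks the sequence
                }
                child = next;
            }
        }
    }

    private static boolean isW(Node node, String localName) {
        return node.getNodeType() == Node.ELEMENT_NODE
                && W_NS.equals(node.getNamespaceURI())
                && localName.equals(node.getLocalName());
    }

    private static Element childElement(Element parent, String localName) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (isW(n, localName)) return (Element) n;
        }
        return null;
    }

    /** True if the run holds exactly one w:t and nothing else besides w:rPr. */
    private static boolean isPlainTextRun(Element run) {
        int textCount = 0;
        for (Node n = run.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() != Node.ELEMENT_NODE) continue;
            if (isW(n, "t")) textCount++;
            else if (!isW(n, "rPr")) return false;
        }
        return textCount == 1;
    }

    private static boolean sameProperties(Element a, Element b) {
        Element pa = childElement(a, "rPr");
        Element pb = childElement(b, "rPr");
        return pa == null ? pb == null : pb != null && pa.isEqualNode(pb);
    }
}
```

The comparison is deliberately conservative: runs whose w:rPr blocks differ in any way stay separate, so formatting is never changed. With docx4j itself the usual route is to call VariablePrepare on the loaded package before converting to HTML (commonly written as VariablePrepare.prepare(wordMLPackage); check the class in your docx4j version for the exact signature and behavior).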

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow