Question

I'm trying to find and replace text passages in docx files with POI 3.8, as described here.

That works fine if I insert my tags on the first attempt. But as soon as I reopen the docx file and make some modifications, Word fragments the text across several runs. For example, "Hello world" might be stored as:

<w:r><w:t>Hello wo</w:t></w:r><w:r w:rsidR="00FB0672"><w:t>rld</w:t></w:r>

I think this fragmentation is caused by things like change tracking, formatting, and spell checking.
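
A minimal illustration of why this breaks a run-by-run replace. It uses only the JDK's DOM parser, not the POI API, and the class and method names are made up for this example; the sample is the fragment above wrapped in a w:p element so it parses on its own. The search string never appears inside a single w:t, but it does appear once the paragraph's w:t contents are concatenated.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

final class FragmentedRunDemo {
    // WordprocessingML main namespace (the "w:" prefix in document.xml)
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // The fragmented runs from the question, wrapped in a hypothetical <w:p>
    static final String PARAGRAPH = "<w:p xmlns:w=\"" + W_NS + "\">"
            + "<w:r><w:t>Hello wo</w:t></w:r>"
            + "<w:r w:rsidR=\"00FB0672\"><w:t>rld</w:t></w:r>"
            + "</w:p>";

    // Returns every <w:t> element in document order
    static NodeList textElements(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for getElementsByTagNameNS
            return factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getElementsByTagNameNS(W_NS, "t");
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    // Naive search that looks at one run's text at a time
    static boolean foundInSingleRun(String xml, String needle) {
        NodeList texts = textElements(xml);
        for (int i = 0; i < texts.getLength(); i++) {
            if (texts.item(i).getTextContent().contains(needle)) {
                return true;
            }
        }
        return false;
    }

    // Paragraph-level text: concatenate all <w:t> contents before searching
    static String paragraphText(String xml) {
        NodeList texts = textElements(xml);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < texts.getLength(); i++) {
            sb.append(texts.item(i).getTextContent());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(foundInSingleRun(PARAGRAPH, "Hello world"));         // false
        System.out.println(paragraphText(PARAGRAPH).contains("Hello world")); // true
    }
}
```

Matching on the concatenated text tells you that a tag is present, but replacing it still means editing the individual runs, which is why merging the runs first is attractive.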

Does anybody have an idea how to ...

a) ... disable this feature in MS Word?

b) ... somehow de-fragment the docx file afterwards?

c) ... get rid of this fragmentation some other way?

I already tried saving the file as .doc/.odt and re-saving it as .docx, but the fragmentation persists.

Any help is highly appreciated. Thanks in advance!

Solution

In Word, the features you can (and want to) turn off are spelling and grammar checking, and rsid (revision save ID) insertion.

This is for docx4j (a project I manage), not POI, but docx4j's VariablePrepare shows what needs to be done to de-fragment the runs if you can't prevent the fragmentation in Word. Since POI uses a similar XML marshalling/unmarshalling approach (XMLBeans rather than JAXB), you should be able to port that code to the POI API.
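
A rough sketch of that kind of de-fragmentation follows. It is written against the JDK's DOM API rather than POI or docx4j, so it is not VariablePrepare's actual code. It assumes that w:proofErr markers can simply be dropped and that two adjacent runs can be joined when their w:rPr elements are identical; it only touches runs containing a single w:t (plus an optional w:rPr) and leaves tabs, breaks, fields and bookmarks alone. RunMerger and its methods are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.SAXException;

final class RunMerger {
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    static final String XML_NS = "http://www.w3.org/XML/1998/namespace";

    // Drops <w:proofErr> markers, then joins adjacent simple runs whose <w:rPr> is identical
    static void mergeRuns(Element paragraph) {
        Element previous = null; // last simple run that a following run could merge into
        Node node = paragraph.getFirstChild();
        while (node != null) {
            Node next = node.getNextSibling(); // saved first, because node may be removed
            if (isW(node, "proofErr")) {
                paragraph.removeChild(node); // spell/grammar markers sit between runs
            } else if (isSimpleRun(node)) {
                Element run = (Element) node;
                if (previous != null && sameFormatting(previous, run)) {
                    Element target = child(previous, "t");
                    target.setTextContent(target.getTextContent() + child(run, "t").getTextContent());
                    target.setAttributeNS(XML_NS, "xml:space", "preserve"); // keep edge spaces
                    paragraph.removeChild(run); // the merged run's rsid attributes go with it
                } else {
                    previous = run;
                }
            } else if (node.getNodeType() == Node.ELEMENT_NODE) {
                previous = null; // bookmarks, fields, runs with tabs/breaks etc. end the chain
            }
            node = next;
        }
    }

    // A run containing exactly one <w:t> and otherwise at most a <w:rPr>
    static boolean isSimpleRun(Node node) {
        if (!isW(node, "r")) return false;
        int texts = 0;
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (isW(c, "t")) texts++;
            else if (c.getNodeType() == Node.ELEMENT_NODE && !isW(c, "rPr")) return false;
        }
        return texts == 1;
    }

    // Formatting matches if both runs lack <w:rPr> or their <w:rPr> subtrees are equal
    static boolean sameFormatting(Element a, Element b) {
        Element ra = child(a, "rPr");
        Element rb = child(b, "rPr");
        return (ra == null || rb == null) ? ra == rb : ra.isEqualNode(rb);
    }

    static boolean isW(Node node, String localName) {
        return node.getNodeType() == Node.ELEMENT_NODE
                && W_NS.equals(node.getNamespaceURI())
                && localName.equals(node.getLocalName());
    }

    static Element child(Node parent, String localName) {
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (isW(c, localName)) return (Element) c;
        }
        return null;
    }

    static Element parseParagraph(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    // Text of each remaining <w:r>, in document order
    static List<String> runTexts(Element paragraph) {
        List<String> texts = new ArrayList<>();
        for (Node c = paragraph.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (isW(c, "r")) texts.add(c.getTextContent());
        }
        return texts;
    }

    public static void main(String[] args) {
        Element p = parseParagraph("<w:p xmlns:w=\"" + W_NS + "\">"
                + "<w:r><w:t>Hello wo</w:t></w:r><w:proofErr w:type=\"spellStart\"/>"
                + "<w:r w:rsidR=\"00FB0672\"><w:t>rld</w:t></w:r></w:p>");
        mergeRuns(p);
        System.out.println(runTexts(p)); // [Hello world]
    }
}
```

Comparing w:rPr with isEqualNode is deliberately strict: any difference, even differently ordered child elements, prevents a merge, which errs on the side of keeping formatting intact. To use something like this with POI you would have to run it on the XML behind each paragraph (for example the DOM node from XWPFParagraph.getCTP().getDomNode()); I have not checked how POI's cached XWPFRun objects react to that, so saving and reloading the document afterwards is the safer assumption.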

Licensed under: CC-BY-SA with attribution