Question

I have two documents:

  1. Document 1 (input)
  2. Document 2 (output)

Document 2 is the result of passing Document 1 through a transformation process that leaves all content and formatting intact (verified by a side-by-side comparison in Word).

However, the process removes many ID numbers from the .docx files.

For example,

      <w:p w:rsidP="00B600D6" w:rsidR="00F55D78" w:rsidRDefault="00B600D6">

becomes

      <w:p>

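according to a dump of each document made with the following code:

    import java.io.OutputStreamWriter;
    import javax.xml.transform.OutputKeys;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.dom.DOMSource;
    import javax.xml.transform.stream.StreamResult;
    import org.docx4j.XmlUtils;
    import org.docx4j.wml.Body;
    import org.docx4j.wml.Document;
    import org.w3c.dom.Node;

    // newerPackage: the WordprocessingMLPackage loaded from the .docx being dumped
    Body body = ((Document) newerPackage.getMainDocumentPart().getJaxbElement()).getBody();
    Node node = XmlUtils.marshaltoW3CDomDocument(body).getDocumentElement();
    Transformer transformer = TransformerFactory.newInstance().newTransformer();
    transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "no");
    transformer.setOutputProperty(OutputKeys.METHOD, "xml");
    transformer.setOutputProperty(OutputKeys.INDENT, "yes");
    transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
    transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");
    OutputStreamWriter out = new OutputStreamWriter(System.out, "UTF-8");
    transformer.transform(new DOMSource(node), new StreamResult(out));
    out.flush(); // flush explicitly rather than relying on the Transformer to do it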

Using the docx4j Differencer comparison method recommended here, everything (except the first line, which has no formatting applied) is reported as a modification.

My question is: are the diffs caused by the missing IDs, by the formatting, or by something else?

In case it matters, we're using docx4j here to run automated sanity/regression tests on our round-tripping process (i.e. apply the "loss-less" process and expect no differences).

Solution

Disclosure: I work on docx4j

Even if the rsid attributes (w:rsidR, w:rsidP, w:rsidRDefault and so on) are the only difference between two paragraphs, the paragraphs will still be detected as different.

You could "clean" both documents before performing the comparison, so that neither docx has rsid attributes. See the Filter sample in docx4j.
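
As a rough illustration of the clean-then-compare idea, here is a JDK-only sketch. It is not the docx4j Filter sample and does not use docx4j at all: it assumes you have raw WordprocessingML XML (for example document.xml extracted from the .docx), removes every attribute in the `w` namespace whose local name starts with `rsid`, and then uses DOM `isEqualNode` as a simple stand-in for a real diff such as the Differencer. The class and method names (`RsidStripper`, `stripRsids`, `equalIgnoringRsids`) are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper: strips rsid attributes, then compares two XML fragments.
class RsidStripper {

    // WordprocessingML main namespace (the "w:" prefix)
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Recursively removes every w:rsid* attribute (w:rsidR, w:rsidRPr, w:rsidP, w:rsidRDefault, ...). */
    static void stripRsids(Element element) {
        NamedNodeMap attrs = element.getAttributes();
        List<Attr> toRemove = new ArrayList<>(); // collect first; the map is live
        for (int i = 0; i < attrs.getLength(); i++) {
            Attr attr = (Attr) attrs.item(i);
            if (W_NS.equals(attr.getNamespaceURI()) && attr.getLocalName().startsWith("rsid")) {
                toRemove.add(attr);
            }
        }
        for (Attr attr : toRemove) {
            element.removeAttributeNode(attr);
        }
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                stripRsids((Element) child);
            }
        }
    }

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // required for getNamespaceURI()/getLocalName()
        DocumentBuilder builder = dbf.newDocumentBuilder();
        return builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    /** True if the two XML strings are structurally equal once rsid attributes are ignored. */
    static boolean equalIgnoringRsids(String a, String b) throws Exception {
        Element ea = parse(a).getDocumentElement();
        Element eb = parse(b).getDocumentElement();
        stripRsids(ea);
        stripRsids(eb);
        return ea.isEqualNode(eb);
    }

    public static void main(String[] args) throws Exception {
        String before = "<w:p xmlns:w=\"" + W_NS + "\" w:rsidP=\"00B600D6\" w:rsidR=\"00F55D78\""
                + " w:rsidRDefault=\"00B600D6\"><w:r><w:t>Hello</w:t></w:r></w:p>";
        String after = "<w:p xmlns:w=\"" + W_NS + "\"><w:r><w:t>Hello</w:t></w:r></w:p>";
        // Raw comparison: the rsid attributes alone make the paragraphs differ
        System.out.println(parse(before).getDocumentElement().isEqualNode(parse(after).getDocumentElement())); // false
        // After stripping rsids from both sides, they compare equal
        System.out.println(equalIgnoringRsids(before, after)); // true
    }
}
```

In a round-trip regression test the same stripping would be applied to both the input and the output before diffing. Whitespace text nodes are compared as-is, so both sides should be serialized the same way.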

By the way, an easier way to see the XML for an object (e.g. a single paragraph, or the entire body) is to use XmlUtils.marshaltoString.

Licensed under: CC-BY-SA with attribution