I've got a very simple file generated by Apache POI that contains an image and a sentence in the page header, nothing else. The image, although embedded, doesn't display in Word.
I spent a lot of time comparing it with the same file generated by Word, removing the differences one by one to find the root cause.
Here's the Apache POI-generated file structure:
.
├── [Content_Types].xml
├── _rels
├── docProps
│ ├── app.xml
│ └── core.xml
└── word
├── _rels
│ └── document.xml.rels
├── document.xml
├── footer1.xml
├── header1.xml
├── media
│ └── image1.png
└── settings.xml
Here's the header1.xml file (stripped a little bit):
<?xml version="1.0" encoding="UTF-8" ?>
<w:hdr xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas" xmlns:mo="http://schemas.microsoft.com/office/mac/office/2008/main" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:mv="urn:schemas-microsoft-com:mac:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordprocessingDrawing" xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml" xmlns:wpg="http://schemas.microsoft.com/office/word/2010/wordprocessingGroup" xmlns:wpi="http://schemas.microsoft.com/office/word/2010/wordprocessingInk" xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml" xmlns:wps="http://schemas.microsoft.com/office/word/2010/wordprocessingShape" mc:Ignorable="w14 wp14">
<w:p>
<w:pPr>
<w:jc w:val="left" /></w:pPr>
<w:r>
<w:drawing>
<wp:inline distT="0" distB="0" distL="0" distR="0">
<wp:extent cx="1193800" cy="635000" />
<wp:docPr id="2" name="Picture 2" descr="Generated" />
<a:graphic xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main">
<a:graphicData uri="http://schemas.openxmlformats.org/drawingml/2006/picture">
<pic:pic xmlns:pic="http://schemas.openxmlformats.org/drawingml/2006/picture">
<pic:nvPicPr>
<pic:cNvPr id="2" name="Generated" />
<pic:cNvPicPr/></pic:nvPicPr>
<pic:blipFill><a:blip r:embed="rId2" />
<a:stretch><a:fillRect/></a:stretch>
</pic:blipFill>
<pic:spPr>
<a:xfrm><a:off x="0" y="0" /><a:ext cx="1193800" cy="635000" /></a:xfrm>
<a:prstGeom prst="rect"><a:avLst/></a:prstGeom>
</pic:spPr>
</pic:pic>
</a:graphicData>
</a:graphic>
</wp:inline>
</w:drawing>
</w:r>
</w:p>
</w:hdr>
The image XML code is generated by hand using the solution found here.
The reference ID is relative to what's inside _rels/document.xml.rels:
<?xml version="1.0" encoding="UTF-8"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/settings" Target="settings.xml"/>
<Relationship Id="rId2" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image" Target="media/image1.png"/>
<Relationship Id="rId3" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/header" Target="header1.xml"/>
<Relationship Id="rId4" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/footer" Target="footer1.xml"/>
</Relationships>
So basically: insert image rId2 => media/image1.png.
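To double-check which target an ID such as rId2 maps to in a given .rels file, the lookup can be sketched with the JDK's XML parser. This is only a minimal sketch: the class name RelsReader and the method readTargets are made up for illustration and are not part of Apache POI.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper (not a POI class): reads a .rels file into an Id -> Target map.
final class RelsReader {
    static final String RELS_NS =
            "http://schemas.openxmlformats.org/package/2006/relationships";

    static Map<String, String> readTargets(String relsXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // Relationship elements live in RELS_NS
            Document doc = factory.newDocumentBuilder().parse(
                    new ByteArrayInputStream(relsXml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagNameNS(RELS_NS, "Relationship");
            Map<String, String> targets = new LinkedHashMap<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element rel = (Element) nodes.item(i);
                targets.put(rel.getAttribute("Id"), rel.getAttribute("Target"));
            }
            return targets;
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a readable .rels document", e);
        }
    }

    public static void main(String[] args) {
        String rels = "<Relationships xmlns=\"" + RELS_NS + "\">"
                + "<Relationship Id=\"rId2\""
                + " Type=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships/image\""
                + " Target=\"media/image1.png\"/>"
                + "</Relationships>";
        System.out.println(readTargets(rels).get("rId2")); // prints media/image1.png
    }
}
```

The map only shows what one .rels file declares; which .rels file Word actually consults for a given part is a separate matter.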
Now, with the document generated by Word, there are a lot more files, but once stripped, this is what it looks like:
.
├── [Content_Types].xml
├── _rels
├── docProps
│ ├── app.xml
│ └── core.xml
└── word
├── _rels
│ ├── document.xml.rels
│ └── header1.xml.rels
├── document.xml
├── header1.xml
├── media
│ └── image1.png
└── settings.xml
Same files, except for _rels/header1.xml.rels, which contains:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image" Target="media/image1.png"/>
</Relationships>
And the only difference in header1.xml is the relationship ID, rId1, which is taken from header1.xml.rels and points to the image.
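As far as I understand the Open Packaging Conventions, the relationships of a part are stored in a _rels folder next to that part, in a file named after the part with .rels appended. That would explain why an ID used in header1.xml is looked up in header1.xml.rels rather than in document.xml.rels. A minimal sketch of that naming rule, with a hypothetical RelsPath helper that is not part of POI:

```java
// Hypothetical helper (not a POI class): derives the name of the relationships
// part that belongs to a given package part, following the _rels naming rule.
final class RelsPath {
    static String relsPartFor(String partName) {
        int slash = partName.lastIndexOf('/');
        String folder = partName.substring(0, slash + 1); // "" when there is no folder
        String file = partName.substring(slash + 1);
        return folder + "_rels/" + file + ".rels";
    }

    public static void main(String[] args) {
        // prints word/_rels/header1.xml.rels
        System.out.println(relsPartFor("word/header1.xml"));
        // prints word/_rels/document.xml.rels
        System.out.println(relsPartFor("word/document.xml"));
    }
}
```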
Now, I'm no expert in OOXML or Apache POI, but I would like POI to put the header's relationships in a separate file. Is that possible?
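To verify whether a generated file contains that separate relationships part, the .docx can be opened as a plain ZIP archive. The sketch below uses only java.util.zip; the DocxParts class and its zipOf helper are made up for the demonstration, and the in-memory archive merely imitates the part names of the POI output listed earlier, with empty entries.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper (not a POI class): lists the parts stored in a .docx.
final class DocxParts {
    /** Returns the entry names of a .docx, which is a regular ZIP archive. */
    static Set<String> partNames(byte[] docx) {
        Set<String> names = new LinkedHashSet<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(docx))) {
            for (ZipEntry e = zip.getNextEntry(); e != null; e = zip.getNextEntry()) {
                names.add(e.getName());
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return names;
    }

    /** Builds an in-memory ZIP with empty entries, for demonstration only. */
    static byte[] zipOf(String... names) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            for (String name : names) {
                zip.putNextEntry(new ZipEntry(name));
                zip.closeEntry();
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Same part names as the POI-generated structure: no header1.xml.rels.
        byte[] poiLike = zipOf("word/_rels/document.xml.rels", "word/document.xml",
                "word/header1.xml", "word/media/image1.png");
        System.out.println(partNames(poiLike).contains("word/_rels/header1.xml.rels")); // prints false
    }
}
```

For a real file, the bytes would come from something like java.nio.file.Files.readAllBytes on the generated document.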
I also noticed that these XML namespaces need to be declared for the image to be decoded:
xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing"
xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
Update: code uploaded to https://gist.github.com/BenoitDuffez/b132d45747ef8c2e9e7c
Processing:
- the code calls Exporter#export
- the Exporter class calls through to POIExporter methods such as #openDocument, #addImage, etc.
- the order of calls is: createDocument, startHeader, createParagraph, addImage, setParagraphAlignment, endParagraph
I attached a full log to the gist.