Extracting an embedded XML file from a PDF/A-3 using ABCpdf in C# - ZUGFeRD

Question 1

I don't know ABCpdf, but I would expect most PDF libraries to offer similar access to a PDF's content.

First, take a look at Das-ZUGFeRD-Format_1p0.pdf, especially page 112. The image there shows the object tree you have to traverse in order to find the XML stream.

The tree gives you the names, the types, and the direction of each reference. With that information you can traverse the PDF object tree to get to the XML content you are looking for.

The steps, based on the diagram:

Read your PDF
Get the catalog inside your PDF
Get the array named AF from the catalog
Get the first element of the AF array (it should be a file specification)
From the file specification, get the dictionary named EF
From the EF dictionary, get the embedded file stream (the F entry) and read its content

These are the steps you need to perform in order to get to the content.
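
The traversal can be sketched independently of any PDF library. The following is a minimal Python sketch, not ABCpdf code: it assumes the PDF has already been parsed into nested dictionaries and lists, and the function name extract_embedded_xml, the "data" key, and the string value used for Filter are hypothetical choices made for illustration. It also assumes that the embedded file stream is reached through the F entry of the EF dictionary and that the stream is Flate-compressed.

```python
import zlib


def extract_embedded_xml(catalog):
    """Follow Catalog -> AF -> first file spec -> EF -> F and return the bytes."""
    af_array = catalog["AF"]        # array of associated files
    file_spec = af_array[0]         # first entry should be the file specification
    ef_dict = file_spec["EF"]       # dictionary of embedded file streams
    stream = ef_dict["F"]           # the embedded file stream itself
    raw = stream["data"]            # hypothetical key holding the raw stream bytes
    if stream.get("Filter") == "FlateDecode":
        raw = zlib.decompress(raw)  # undo Flate compression
    return raw


# Toy object tree standing in for a parsed PDF (an assumption, not a real file).
sample_xml = b'<?xml version="1.0"?><Invoice/>'
sample_catalog = {
    "Type": "Catalog",
    "AF": [
        {
            "Type": "Filespec",
            "F": "ZUGFeRD-invoice.xml",
            "EF": {
                "F": {
                    "Type": "EmbeddedFile",
                    "Filter": "FlateDecode",
                    "data": zlib.compress(sample_xml),
                }
            },
        }
    ],
}

extracted = extract_embedded_xml(sample_catalog)
```

With a real library, each dictionary lookup would be replaced by that library's call for resolving a named entry or an indirect reference.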

To display the structure of a PDF and browse its object tree, I would recommend a tool like iText RUPS.

Question 2

What I did with ABCpdf:

Get the ObjectSoup array from the Doc (pretty much an array of all objects in the document)
As ZUGFeRD allows only one embedded file inside the PDF, I just searched this ObjectSoup array for the one object of type StreamObject that contains /EmbeddedFile
Decompress the stream of that object, get the stream's byte[], and write it to an XML file
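
The same idea (scan every object, pick the stream marked /EmbeddedFile, decompress it) can be sketched without ABCpdf. The following is a rough Python illustration that works on the raw bytes of a file rather than on an object collection. It is not the ABCpdf API, and the function name find_embedded_file is hypothetical. It assumes a simple, unencrypted PDF whose objects are not packed into object streams and whose embedded file uses no filter or only FlateDecode. A real PDF library handles those cases properly.

```python
import re
import zlib

# Naive patterns for "N G obj ... endobj" and "stream ... endstream".
OBJ_RE = re.compile(rb"\d+\s+\d+\s+obj(.*?)endobj", re.DOTALL)
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\n?endstream", re.DOTALL)


def find_embedded_file(pdf_bytes):
    """Return the decoded bytes of the first /EmbeddedFile stream, or None."""
    for match in OBJ_RE.finditer(pdf_bytes):
        body = match.group(1)
        # Everything before the first "stream" keyword is the stream dictionary.
        header, keyword, _ = body.partition(b"stream")
        if not keyword or b"/EmbeddedFile" not in header:
            continue  # not a stream object, or not an embedded file
        stream = STREAM_RE.search(body)
        if stream is None:
            continue
        data = stream.group(1)
        if b"/FlateDecode" in header:
            # decompressobj tolerates stray bytes after the compressed data
            data = zlib.decompressobj().decompress(data)
        return data
    return None


# Tiny hand-built fragment used only to exercise the function.
sample_xml = b"<Invoice/>"
compressed = zlib.compress(sample_xml)
sample_pdf = (
    b"%PDF-1.7\n"
    b"1 0 obj\n<< /Type /Catalog >>\nendobj\n"
    b"2 0 obj\n<< /Type /EmbeddedFile /Filter /FlateDecode /Length "
    + str(len(compressed)).encode()
    + b" >>\nstream\n"
    + compressed
    + b"\nendstream\nendobj\n"
)

result = find_embedded_file(sample_pdf)
```

To finish the last step, the returned bytes can be written to disk, for example with pathlib.Path("invoice.xml").write_bytes(result), where the file name is only a placeholder.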