XMP metadata reference inside PDF

https://stackoverflow.com/questions/17295880

01-06-2022

Question

I'm planning to develop an application that works with PDF metadata without any external library, just by reading the raw PDF myself.

I understand info dictionaries and how they are referenced by the /Info key in the trailer. However, looking inside a PDF file with a hex editor, I could not find any reference to the XMP object; it exists, but nothing seems to reference it. (By "XMP metadata" I mean the metadata of the whole file, not the metadata of individual objects.)

So, my question is: how is XMP metadata referenced inside a PDF file? How can an external application retrieve XMP metadata if it is not referenced?

I suppose that if it is not referenced it must be placed in some fixed location inside the file, but I'm not sure about it.

Thanks in advance.

Solution

You can find all information about XMP here: http://www.adobe.com/devnet/xmp.html

However, the document on how to embed XMP in PDF defers to the PDF specification as authoritative. That specification states that XMP metadata shall be embedded in a metadata stream, and that the document-level metadata packet shall be referenced by a key called "Metadata" in the document catalog. The catalog is the object that the /Root key in the trailer points to, just as /Info points to the info dictionary.
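To make the catalog lookup concrete, here is a minimal Python sketch that follows the catalog's Metadata key to the metadata stream using only regular expressions on the raw bytes. The `SAMPLE` fragment and the function name `xmp_via_catalog` are made up for illustration. The sketch assumes classic, uncompressed objects: it does not read the cross-reference table, and it ignores object streams, stream filters, encryption, and stream data that happens to contain the word `endobj`, so treat it as a starting point rather than a real parser.

```python
import re

# Hypothetical, hand-written fragment of a PDF body (not a complete file).
SAMPLE = (
    b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R /Metadata 5 0 R >>\nendobj\n"
    b"5 0 obj\n<< /Type /Metadata /Subtype /XML /Length 27 >>\nstream\n"
    b"<x:xmpmeta>demo</x:xmpmeta>\n"
    b"endstream\nendobj\n"
)

# Matches "N G obj ... endobj" and captures number, generation and body.
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj", re.DOTALL)


def xmp_via_catalog(data):
    """Return the bytes of the stream referenced by the catalog's /Metadata key, or None."""
    # Later definitions overwrite earlier ones, which roughly mimics
    # how incremental updates supersede older objects.
    objects = {(int(n), int(g)): body for n, g, body in OBJ_RE.findall(data)}
    for body in objects.values():
        if not re.search(rb"/Type\s*/Catalog\b", body):
            continue
        ref = re.search(rb"/Metadata\s+(\d+)\s+(\d+)\s+R\b", body)
        if ref is None:
            return None  # catalog found, but it has no Metadata key
        target = objects.get((int(ref.group(1)), int(ref.group(2))))
        if target is None:
            return None  # reference points to an object we could not find
        stream = re.search(rb"stream\r?\n(.*?)\r?\n?endstream", target, re.DOTALL)
        return stream.group(1) if stream else None
    return None


print(xmp_via_catalog(SAMPLE))
```

A robust reader would instead start from the trailer, resolve /Root through the cross-reference table, and use the stream's /Length entry rather than searching for `endstream`.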

That being said, XMP was specifically designed so that it can be found and read (and sometimes updated) without understanding the file format it is embedded in; each packet starts with a magic fingerprint sequence for exactly that purpose.
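As a rough illustration of that format-agnostic approach, the following Python sketch scans a byte string for XMP packets by looking for the packet header `<?xpacket begin=... id="W5M0MpCehiHzreSzNTczkc9d"?>` and the matching `<?xpacket end=...?>` trailer. The function name `scan_xmp_packets` is made up for this example. It assumes the packet is stored in an 8-bit encoding such as UTF-8 and that the containing stream is neither compressed nor encrypted. Note that a scan like this may also return packets that belong to embedded images or to older revisions of the file, not only the document-level one.

```python
import re

# Header with the fixed packet id, then the payload, then the end marker.
PACKET_RE = re.compile(
    rb"<\?xpacket\s+begin=.*?id=[\"']W5M0MpCehiHzreSzNTczkc9d[\"'].*?\?>"
    rb"(.*?)"
    rb"<\?xpacket\s+end=[\"'][rw][\"']\s*\?>",
    re.DOTALL,
)


def scan_xmp_packets(data):
    """Return the payload of every XMP packet found in data, in file order."""
    return [m.group(1).strip() for m in PACKET_RE.finditer(data)]


# Hypothetical input: arbitrary bytes surrounding one small packet.
demo = (
    b'junk<?xpacket begin="\xef\xbb\xbf" id="W5M0MpCehiHzreSzNTczkc9d"?>\n'
    b"<x:xmpmeta/>\n"
    b'<?xpacket end="w"?>junk'
)
print(scan_xmp_packets(demo))
```

For a real file, pass the result of `open(path, "rb").read()`; if the list comes back empty, the metadata stream may be filtered or encrypted, and the catalog route is the one to use.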

(You'll find the PDF specification on the Adobe developer web site as well, although the latest version is now an ISO standard: ISO 32000, to be specific.)

Licensed under: CC-BY-SA with attribution
