Correct way of parsing XMP XML metadata attached to the end of a PDF file? [closed]

https://stackoverflow.com/questions/2334020

22-09-2019

Question

I have a PDF with some metadata in XMP (XML) format attached to the end. What is the correct way of parsing and using this metadata?

At the moment I have a working solution in C99. It reads the file character by character from the beginning, looping until it reaches the tag I'm after, and then records the contents until it reaches the closing tag. I can't see this being the best way of doing things.
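Instead of walking the file one character at a time, the XMP packet can usually be located by its wrapper: the XMP specification delimits a packet with `<?xpacket begin=...?>` and `<?xpacket end=...?>` processing instructions. The following is a minimal sketch in Python (standard library only), assuming the packet is stored uncompressed in the PDF; the function name `extract_last_xmp_packet` is hypothetical.

```python
import re
from typing import Optional

# One complete XMP packet: the <?xpacket begin=...?> header, the packet body,
# and the <?xpacket end=...?> trailer. DOTALL lets '.' match newlines.
XPACKET_RE = re.compile(
    rb"<\?xpacket\s+begin=.*?\?>(.*?)<\?xpacket\s+end=.*?\?>",
    re.DOTALL,
)


def extract_last_xmp_packet(pdf_bytes: bytes) -> Optional[bytes]:
    """Return the body of the last XMP packet found in pdf_bytes, or None.

    Assumption: the packet is stored uncompressed. A packet inside a
    compressed (e.g. FlateDecode) stream will not be found by a byte search.
    """
    matches = list(XPACKET_RE.finditer(pdf_bytes))
    if not matches:
        return None
    # Assumption: the packet nearest the end of the file is the one wanted,
    # matching the "attached to the end" description above.
    return matches[-1].group(1).strip()
```

Read the PDF in binary mode (`open(path, "rb").read()`) and pass the bytes in; the result can go straight to an XML parser. A file may contain several packets (for example, for embedded images), so picking the last one is an assumption. A PDF-aware reader would instead follow the document catalog's `/Metadata` stream.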

I'm now rewriting this program in C# with Mono (not .NET), and I wonder whether there is a "magic" framework class for this task instead of just imitating the C99 version. (Also, I can only rely on third-party libraries if they don't contain any P/Invoke code, etc.)

I'm using Mono because I need this app to be cross-platform.

Solution

Adobe has published the XMP specification; start there. You need to find out which XMP schema (or schemas) the XML uses and parse it accordingly.
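One way to find out which schemas a packet uses is to collect the namespace URIs of the properties inside each `rdf:Description` and compare them with the namespaces the specification defines. Below is a minimal Python sketch using only the standard library. `KNOWN_SCHEMAS` lists just a few common namespaces, and the function names and sample packet are hypothetical.

```python
from typing import Optional, Set
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# A few well-known XMP schema namespaces (far from exhaustive).
KNOWN_SCHEMAS = {
    "http://purl.org/dc/elements/1.1/": "Dublin Core (dc)",
    "http://ns.adobe.com/xap/1.0/": "XMP Basic (xmp)",
    "http://ns.adobe.com/pdf/1.3/": "Adobe PDF (pdf)",
    "http://ns.adobe.com/xap/1.0/mm/": "XMP Media Management (xmpMM)",
}


def namespace_of(name: str) -> Optional[str]:
    """ElementTree spells qualified names as '{uri}local'; return the uri."""
    if name.startswith("{"):
        return name[1:].split("}", 1)[0]
    return None


def schemas_used(xmp: bytes) -> Set[str]:
    """Collect the namespace URIs of properties in every rdf:Description."""
    root = ET.fromstring(xmp)
    found = set()
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        # Properties may be child elements ...
        names = [child.tag for child in desc]
        # ... or, in abbreviated RDF, attributes of rdf:Description.
        names += list(desc.attrib)
        for name in names:
            ns = namespace_of(name)
            if ns is not None and ns != RDF_NS:
                found.add(ns)
    return found


SAMPLE = b"""<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about=""
      xmlns:dc="http://purl.org/dc/elements/1.1/"
      xmlns:pdf="http://ns.adobe.com/pdf/1.3/"
      pdf:Producer="Example Producer">
   <dc:title><rdf:Alt>
    <rdf:li xml:lang="x-default">Sample title</rdf:li>
   </rdf:Alt></dc:title>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>"""

for uri in sorted(schemas_used(SAMPLE)):
    print(KNOWN_SCHEMAS.get(uri, "unknown schema: " + uri))
```

Once the schemas are known, each one can be read using the property names its section of the specification defines.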

OTHER TIPS

If you can get the complete XML as a string, you can use XmlDocument.LoadXml to load it into memory for querying. (XmlDocument.Load expects a file path, URL, stream or reader, not the XML text itself.)

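You can then use XPath with the XmlDocument.SelectNodes method to get at your data. Because XMP elements are namespace-qualified (rdf:, dc:, xmp:, and so on), use the SelectNodes(string, XmlNamespaceManager) overload with an XmlNamespaceManager that maps those prefixes to their namespace URIs; otherwise the prefixes in your XPath expressions will not resolve.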
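The same load-then-query approach looks like this in Python's standard library. Here `ET.fromstring` plays the part of XmlDocument.LoadXml, and a prefix-to-URI dict plays the part of XmlNamespaceManager. Unlike SelectNodes, ElementTree supports only a subset of XPath 1.0. This is a sketch: `dc_title` and `simple_property` are hypothetical helpers, and the sample packet is made up.

```python
from typing import Optional
import xml.etree.ElementTree as ET

# Prefix -> namespace URI map; plays the role of .NET's XmlNamespaceManager.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "pdf": "http://ns.adobe.com/pdf/1.3/",
}
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def dc_title(root: ET.Element) -> Optional[str]:
    """Return the x-default entry of the dc:title language alternative."""
    for li in root.findall(".//rdf:Description/dc:title/rdf:Alt/rdf:li", NS):
        if li.get(XML_LANG) == "x-default":
            return li.text
    return None


def simple_property(root: ET.Element, prefix: str, name: str) -> Optional[str]:
    """Read a simple property stored as an attribute or as a child element."""
    qname = "{%s}%s" % (NS[prefix], name)
    for desc in root.findall(".//rdf:Description", NS):
        if qname in desc.attrib:
            return desc.attrib[qname]
        child = desc.find(qname)
        if child is not None:
            return child.text
    return None


# Equivalent of XmlDocument.LoadXml: parse the packet text into a tree.
root = ET.fromstring("""<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about=""
      xmlns:dc="http://purl.org/dc/elements/1.1/"
      xmlns:pdf="http://ns.adobe.com/pdf/1.3/"
      pdf:Producer="Example Producer">
   <dc:title><rdf:Alt>
    <rdf:li xml:lang="x-default">Sample title</rdf:li>
   </rdf:Alt></dc:title>
   <pdf:Keywords>xmp, pdf</pdf:Keywords>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>""")

print(dc_title(root), simple_property(root, "pdf", "Producer"))
```

`simple_property` checks both the attribute form and the child-element form because XMP allows a simple property to be serialized either way. A query that only looks for elements can miss values such as `pdf:Producer` above.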

Licensed under: CC-BY-SA with attribution
