Correct way of parsing XMP XML metadata attached to the end of a PDF file? [closed]
Question
I have a PDF with some metadata in XMP (XML) format attached to the end. What is the correct way to parse and use this metadata?
At the moment I have a working solution in C99: it reads the file character by character from the beginning, looping until it reaches the tag I'm after and then recording the contents until it reaches the closing tag. I don't think this is the best way of doing things.
I'm now rewriting this program in C# on Mono (not .NET), and I wonder whether there is a "magic" framework class for this task, rather than just imitating the C99 version. (Also, I can only rely on third-party libraries if they don't contain any P/Invoke code.)
I'm using Mono because I need this app to be cross-platform.
Solution
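Adobe has published the XMP specification, so start there. XMP metadata is serialized as RDF/XML, and every property belongs to a schema identified by a namespace URI, so you need to find out which XMP schemas the metadata uses and parse it accordingly.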
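A minimal sketch of how this could look in C#, under assumptions the question does not state: the packet is stored uncompressed in the file (a Flate-compressed metadata stream will not be found by a byte scan), it uses the conventional `x:xmpmeta` root element, and the last packet in the file is the one you want. `XmpPacket`, `Extract` and `SchemaNamespaces` are hypothetical names. Instead of a per-character state machine, it locates the packet boundaries with string searches and then lists the namespace URIs declared in the packet, which tells you which schemas you have to handle.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Xml;

static class XmpPacket
{
    const string XmlnsNs = "http://www.w3.org/2000/xmlns/";

    // Returns the last <x:xmpmeta>...</x:xmpmeta> element in the file, or null.
    // Assumption: the packet is stored uncompressed and uses the "x:" prefix.
    public static string Extract(byte[] fileBytes)
    {
        // Latin-1 maps each byte to exactly one char, so string indices equal byte offsets.
        string raw = Encoding.GetEncoding("ISO-8859-1").GetString(fileBytes);

        int start = raw.LastIndexOf("<x:xmpmeta", StringComparison.Ordinal);
        if (start < 0) return null;

        const string endTag = "</x:xmpmeta>";
        int end = raw.IndexOf(endTag, start, StringComparison.Ordinal);
        if (end < 0) return null;
        end += endTag.Length;

        // XMP packets are normally UTF-8, so decode just that byte range as UTF-8.
        return Encoding.UTF8.GetString(fileBytes, start, end - start);
    }

    // Lists the distinct namespace URIs declared anywhere in the packet.
    public static List<string> SchemaNamespaces(string xmp)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xmp);

        var uris = new List<string>();
        foreach (XmlElement element in doc.GetElementsByTagName("*"))
        {
            foreach (XmlAttribute attr in element.Attributes)
            {
                // XmlDocument exposes xmlns / xmlns:prefix declarations as attributes in this namespace.
                if (attr.NamespaceURI == XmlnsNs && !uris.Contains(attr.Value))
                    uris.Add(attr.Value);
            }
        }
        return uris;
    }
}
```

For example, pass `File.ReadAllBytes(path)` to `Extract`, check the result for null, and hand it to `SchemaNamespaces`. Apart from `adobe:ns:meta/` and the RDF namespace, which belong to the packet wrapper, each returned URI identifies a schema such as Dublin Core (`http://purl.org/dc/elements/1.1/`).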
OTHER TIPS
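If you can get the complete XML as a string, you can use XmlDocument.LoadXml to load it into memory for querying (XmlDocument.Load is for a file path, Stream, TextReader or XmlReader, not for a string of XML).
You can then use XPath with the XmlDocument.SelectNodes method to get to your data. Because XMP properties live in namespaces, pass an XmlNamespaceManager that maps the prefixes used in your XPath expression to the schema namespace URIs.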
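A minimal sketch of that approach, assuming the XMP packet has already been extracted into a string. It reads the Dublin Core title, which XMP stores as an `rdf:Alt` language alternative. `XmpQuery` and `Titles` are hypothetical names; the `rdf` and `dc` prefixes in the XPath only have to match the namespace manager, not whatever prefixes the file happens to use.

```csharp
using System.Collections.Generic;
using System.Xml;

static class XmpQuery
{
    const string RdfNs = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    const string DcNs = "http://purl.org/dc/elements/1.1/";

    // Returns every dc:title alternative found in an XMP packet string.
    public static List<string> Titles(string xmp)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xmp); // LoadXml parses a string; Load expects a path, stream or reader

        // XPath prefixes are resolved through this manager, by namespace URI.
        var ns = new XmlNamespaceManager(doc.NameTable);
        ns.AddNamespace("rdf", RdfNs);
        ns.AddNamespace("dc", DcNs);

        var titles = new List<string>();
        foreach (XmlNode li in doc.SelectNodes("//dc:title/rdf:Alt/rdf:li", ns))
            titles.Add(li.InnerText);
        return titles;
    }
}
```

Simple properties are often written in the attribute shorthand instead, for example `xmp:CreatorTool="..."` directly on `rdf:Description`, so a real parser should check both forms, e.g. with an XPath union such as `//rdf:Description/@xmp:CreatorTool | //rdf:Description/xmp:CreatorTool` after registering the `xmp` prefix for `http://ns.adobe.com/xap/1.0/`.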