Question

I have a PDF with some metadata in XMP XML format attached to the end. What is the correct way to parse and use this metadata?

At the moment I have a working solution in C99 that reads the file character by character from the beginning, looping until it reaches a tag I'm after and then recording the contents until it reaches the closing tag. I can't see this being the best way of doing things.

I'm now rewriting this program in C# on Mono (not .NET), and I wonder whether there is some magic framework class for this task, rather than just imitating the C99 version. (Also, I can only rely on third-party libraries if they don't contain any P/Invoke code or similar.)

I'm using Mono because I need this app to be cross-platform.


Solution

Adobe has published the XMP specification, which is the place to start. You need to find out which XMP schemas (identified by their XML namespaces) the XML uses and parse it accordingly.
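The XMP specification wraps each packet in `<?xpacket begin=... ?>` and `<?xpacket end=... ?>` processing instructions, so the packet can be located without walking the file tag by tag. Here is a minimal sketch of that idea, assuming the metadata stream is stored unfiltered (not compressed or encrypted) and encoded as UTF-8. The class and method names are hypothetical, and only framework classes are used, so no P/Invoke is involved.

```csharp
using System;
using System.Text;

static class XmpPacketFinder
{
    // Hypothetical helper: returns the XML between the xpacket begin/end
    // markers, or null if no complete packet is found.
    public static string ExtractXmp(byte[] pdfBytes)
    {
        // ISO-8859-1 maps every byte to exactly one char, so string
        // indexes are also byte offsets into the raw file.
        string raw = Encoding.GetEncoding("ISO-8859-1").GetString(pdfBytes);

        // Assumption: the packet of interest is the last one in the file.
        int begin = raw.LastIndexOf("<?xpacket begin", StringComparison.Ordinal);
        if (begin < 0) return null;

        // Skip past the end of the opening processing instruction.
        int bodyStart = raw.IndexOf("?>", begin, StringComparison.Ordinal);
        if (bodyStart < 0) return null;
        bodyStart += 2;

        int end = raw.IndexOf("<?xpacket end", bodyStart, StringComparison.Ordinal);
        if (end < 0) return null;

        // Assumption: the packet is UTF-8. Decode only that byte range
        // and drop the whitespace padding around the XML.
        return Encoding.UTF8.GetString(pdfBytes, bodyStart, end - bodyStart).Trim();
    }
}
```

It could be called as `XmpPacketFinder.ExtractXmp(System.IO.File.ReadAllBytes(path))`. If the stream turns out to be compressed or encrypted, this scan will not find the markers and a real PDF parser is needed instead.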

OTHER TIPS

If you can get the complete XML as a string, you can use XmlDocument.LoadXml to load it into memory for querying. (XmlDocument.Load, by contrast, expects a file name, stream, or reader rather than the XML text itself.)

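You can then use XPath with the XmlDocument.SelectNodes method to get to your data. Because XMP properties are namespace-qualified, pass an XmlNamespaceManager that maps the prefixes used in your XPath expressions.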
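As a rough sketch of the in-memory approach, the following reads the Dublin Core title from an XMP string. It assumes the packet stores `dc:title` in the usual language-alternative form (`rdf:Alt` containing `rdf:li` items). The class and method names are hypothetical, and the XPath would need adjusting for other schemas.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

static class XmpReader
{
    // Hypothetical helper: returns every dc:title value found in the packet.
    public static List<string> ReadTitles(string xmpXml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xmpXml); // parses XML text held in a string

        // XPath prefixes must be mapped to namespace URIs explicitly.
        var ns = new XmlNamespaceManager(doc.NameTable);
        ns.AddNamespace("rdf", "http://www.w3.org/1999/02/22-rdf-syntax-ns#");
        ns.AddNamespace("dc", "http://purl.org/dc/elements/1.1/");

        var titles = new List<string>();
        foreach (XmlNode node in doc.SelectNodes("//dc:title/rdf:Alt/rdf:li", ns))
        {
            titles.Add(node.InnerText);
        }
        return titles;
    }
}
```

System.Xml ships with Mono as managed code, so this stays within the no-P/Invoke constraint from the question.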

Licensed under: CC-BY-SA with attribution