Question

Quite regularly I need to extract key-value pairs from XML files. Is there an easy-to-use UNIX(-style) command-line tool available for this?

Example case

The XML file looks like this:

<array>
    <dict>
        <key>BlackPointCompensation</key>
        <false/>
        <key>Name</key>
        <string>Export A</string>
        <key>WatermarkSettings</key>
        <dict>
            <key>DGOperationClassName</key>
            <string>DGImageCompositeScaledOperation</string>
            <key>inputKeys</key>
            <dict>
                <key>inputCompositeImagePath</key>
                <string>/Users/me/imageA.psd</string>
                <key>inputOpacity</key>
                <real>0.94999999999999996</real>
            </dict>
        </dict>
    </dict>
    <dict>
        <key>BlackPointCompensation</key>
        <false/>
        <key>Name</key>
        <string>Export B</string>
        <key>WatermarkSettings</key>
        <dict>
            <key>DGOperationClassName</key>
            <string>DGImageCompositeScaledOperation</string>
            <key>inputKeys</key>
            <dict>
                <key>inputCompositeImagePath</key>
                <string>/Users/me/imageB.psd</string>
                <key>inputOpacity</key>
                <real>0.70</real>
            </dict>
        </dict>
    </dict>
</array>

For this file I want a command that takes the key name "inputCompositeImagePath" (the content of a "key" tag) as a parameter and prints the values of the "string" tags that follow it: /Users/me/imageA.psd and /Users/me/imageB.psd.

What's a good tool for this kind of operation? I had a brief look at xmllint, but it doesn't seem to be well suited for this use case.

Solution

Through another Stack Overflow answer (Extracting values from XML file...), I came across the command-line tool xmlstarlet (download at xmlstar.sourceforge.net, tutorial at www.geekfarm.org), which can do the job:

Say the XML file is saved as preset-test.xml. Then run the following in your shell (on some systems the command is installed as xmlstarlet rather than xml)

xml el preset-test.xml

to get an overview of the XML structure. Then type

xml sel -t -c "(//string[preceding-sibling::key[1] = 'inputCompositeImagePath'])/text()" preset-test.xml

to extract the wanted information. It prints as expected:

/Users/me/imageA.psd
/Users/me/imageB.psd

Note: The magic is in the XPath expression (//string[preceding-sibling::key[1] = 'inputCompositeImagePath'])/text(). It works as follows:

  1. //string[...] selects all elements named string anywhere in the document, but

  2. preceding-sibling::key[1] = 'inputCompositeImagePath' restricts this selection to elements whose nearest preceding key sibling has the value 'inputCompositeImagePath'.

  3. /text() then selects the text content of each remaining element.
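
To get the "command with a key parameter" asked for in the question, the same XPath predicate can be wrapped in a small shell function. This is only a sketch: the function name plist_string_for_key is made up for illustration, and it assumes the key contains no single quote, because the key is pasted straight into the XPath expression. Instead of -c with /text(), it uses the -m (match), -v (value-of) and -n (newline) template options of xmlstarlet sel, so that each match is printed on its own line.

```shell
#!/bin/sh
# Hypothetical helper (not part of xmlstarlet): print the <string> value
# that follows a given <key> in a plist-style XML file.
# Usage: plist_string_for_key KEY FILE
# Assumption: KEY contains no single quote, since it is inserted into the XPath.
plist_string_for_key() {
    key=$1
    file=$2
    # The binary is installed as "xmlstarlet" on some systems and "xml" on others.
    tool=$(command -v xmlstarlet || command -v xml) || {
        echo "xmlstarlet not found" >&2
        return 127
    }
    # -m iterates over every matching <string> element,
    # -v . prints its text value, -n ends each match with a newline.
    "$tool" sel -t \
        -m "//string[preceding-sibling::key[1] = '$key']" \
        -v . -n "$file"
}
```

With the function defined, plist_string_for_key inputCompositeImagePath preset-test.xml should print the same two paths as above, and any other key whose value is a string tag (for example Name) can be queried the same way.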

The xmlstarlet tool is perhaps not as easy to use as I had hoped, but its full XPath support makes it quite powerful.

Licensed under: CC-BY-SA with attribution