Question

I'm trying to parse the output of a command-line tool. It writes XML directly to STDOUT, and I want to parse it.

  • The tool outputs a full XML document like the following:

[Image: sample of the tool's XML output]

My goal is to parse that output and extract only the string between the <date> tags. Since the document may contain other <date> elements, it must look only at the <date> that follows <key>SULastCheckTime</key>. (This is awkward because of the newlines and spaces between the two tags.)

Currently I'm solving this situation with the following command:

tool... | grep -A1 '<key>SULastCheckTime</key>' | grep '<date>' | sed -e 's,.*<date>\([^<]*\)</date>.*,\1,g'

It works, but as you can see it's very messy, and I can't come up with anything better. Can you help me improve it?

Thank you!

PS: Since I'm doing this on OS X, I don't have the newer GNU grep options. By the way, my bash version is 3.2.48(1). And I can't install any other tools to parse XML in a better way.


Solution

Maybe something like this?

$ cat foo.input
foo
 foo
    <key>some key</key>
    <date>some date</date>
bar
 bar
    <key>SULastCheckTime</key>
    <date>2013-08-10T00:27:40Z</date>
quux
 quux

 

$ awk '/<key>SULastCheckTime<\/key>/ { toggle=1 } toggle && /<date>.*<\/date>/ { gsub(/<[^>]*>/, "", $1); print; exit }' foo.input
2013-08-10T00:27:40Z
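If you'd rather avoid awk entirely, the same "toggle" idea can be sketched with bash builtins only. This is a minimal sketch, assuming bash 3.2 or later (for `[[ =~ ]]` and `BASH_REMATCH`) and assuming each `<date>...</date>` element sits on a single line; the function name `extract_last_check` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first <date> value that appears after
# <key>SULastCheckTime</key> on stdin. Uses only bash builtins.
# Assumption: each <date>...</date> element is on one line.
extract_last_check() {
  local line seen_key=0
  # In bash 3.2+, keep regexes in variables and leave them unquoted
  # inside [[ ]] so they are treated as regexes, not literal strings.
  local key_re='<key>SULastCheckTime</key>'
  local date_re='<date>([^<]*)</date>'
  while IFS= read -r line; do
    if [[ $line =~ $key_re ]]; then
      seen_key=1                          # found the key; next <date> is ours
    elif (( seen_key )) && [[ $line =~ $date_re ]]; then
      printf '%s\n' "${BASH_REMATCH[1]}"  # text captured between the tags
      return 0                            # stop after the first match
    fi
  done
  return 1                                # key or date not found
}
```

Usage: `tool... | extract_last_check` or `extract_last_check < foo.input`. Like the awk version, it does not assume the date sits on the line immediately after the key, so blank lines or extra whitespace in between don't matter.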
Licensed under: CC-BY-SA with attribution