Question

I have a data file in a format similar to this:

<!-- mydata.xml -->
<alldata>
    <data id="first">
        <coord><x>0</x><y>5</y></coord>
        <coord><x>1</x><y>4</y></coord>
        <coord><x>2</x><y>3</y></coord>
    </data>
    <data id="second">
        <coord><x>0</x><y>2</y></coord>
        <coord><x>1</x><y>1</y></coord>
        <coord><x>2</x><y>0</y></coord>
    </data>
</alldata>

As the x values are the same in all the datasets in my XML files, I would like to extract the data to CSV in this form:

x;first y;second y
0;5;2
1;4;1
2;3;0

Naively, I tried to match each <coord> in the first <data> element and use position() to pick the corresponding <y> from the <coord> elements of the <data> element whose id attribute is "second":

xml sel -T -t -m "/alldata/data[@id='first']/coord" -v "concat(x,';',y,';',../../data[@id='second']/coord[position()]/y,';',position())" -n mydata.xml

This outputs the <y> of the first <coord> on every line, even though position() increments from line to line:

0;5;2;1
1;4;2;2
2;3;2;3

How can I achieve what I set out to do?

Solution

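Inside a predicate, position() returns the context position of the node being tested within the node-set that the predicate filters, not the position of the <coord> you are looping over. A predicate that evaluates to a number n is shorthand for position() = n, so ../../data[@id='second']/coord[position()] means position() = position(), which is true for every node: it selects all the <coord> elements under the second <data>. concat() then converts the resulting node-set of <y> elements to a string, and in XPath 1.0 that conversion uses only the first node in document order, which is why you always get 2.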
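
The XPath 1.0 predicate rule can be modeled in a few lines of Python. This is a toy model written only to illustrate the rule, not an XPath engine: a numeric predicate result n keeps a node only when n equals that node's 1-based context position, any other result is converted to a boolean, and string conversion takes the first node. The names filter_by_predicate, string_value and second_ys are hypothetical, and each <coord> is represented by its <y> value for brevity.

```python
# Toy model of XPath 1.0 predicate filtering, for illustration only.

def filter_by_predicate(nodes, predicate):
    """Keep the nodes for which `predicate` holds.

    `predicate` receives the node and its 1-based context position,
    mirroring what position() returns inside a real predicate.
    """
    kept = []
    for pos, node in enumerate(nodes, start=1):
        value = predicate(node, pos)
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            # A numeric result n means "position() = n".
            keep = value == pos
        else:
            # Any other result is converted to a boolean.
            keep = bool(value)
        if keep:
            kept.append(node)
    return kept


def string_value(nodes):
    """XPath 1.0 string() of a node-set: the first node's value, or ''."""
    return nodes[0] if nodes else ""


# Hypothetical stand-in: the y values of the coords under data[@id='second'].
second_ys = ["2", "1", "0"]

# coord[position()] means position() = position(), true for every node.
all_nodes = filter_by_predicate(second_ys, lambda node, pos: pos)
print(all_nodes)                # ['2', '1', '0']
print(string_value(all_nodes))  # 2

# coord[2] means position() = 2, so only the second node survives.
print(filter_by_predicate(second_ys, lambda node, pos: 2))  # ['1']
```
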

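To refer to the <coord> you are looping over, use the XSLT function current(). You cannot combine it with position(), though: position() takes no arguments and always reports the context position of the expression it appears in, so there is no way to ask for the position of current(). Instead, count() the current node's preceding siblings and add 1: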

xml sel -T -t -m "/alldata/data[@id='first']/coord" -v "concat(x,';',y,';',../../data[@id='second']/coord[count(current()/preceding-sibling::*)+1]/y)" -n mydata.xml
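
For comparison, here is a minimal sketch of the same positional pairing in Python, using the standard-library xml.etree.ElementTree module instead of xmlstarlet. It assumes, as the question does, that every <data> element lists the same x values in the same order, and it generalizes to any number of <data> elements. The function name to_csv and the inline MYDATA string are illustrative only.

```python
import xml.etree.ElementTree as ET

# Sample input from the question; for a real file use ET.parse("mydata.xml").getroot().
MYDATA = """<alldata>
    <data id="first">
        <coord><x>0</x><y>5</y></coord>
        <coord><x>1</x><y>4</y></coord>
        <coord><x>2</x><y>3</y></coord>
    </data>
    <data id="second">
        <coord><x>0</x><y>2</y></coord>
        <coord><x>1</x><y>1</y></coord>
        <coord><x>2</x><y>0</y></coord>
    </data>
</alldata>"""


def to_csv(root):
    """Join the y values of every <data> element by coord position.

    Assumes all datasets share the same x values in the same order,
    so the x column is taken from the first dataset only.
    """
    datasets = root.findall("data")
    lines = ["x;" + ";".join(d.get("id") + " y" for d in datasets)]
    # enumerate(..., start=1) plays the role of position(): the 1-based
    # index of the coord currently visited in the first dataset.
    for pos, coord in enumerate(datasets[0].findall("coord"), start=1):
        row = [coord.findtext("x")]
        for d in datasets:
            # ElementTree's "coord[n]" predicate is 1-based, like XPath's;
            # a missing entry becomes an empty field.
            row.append(d.findtext(f"coord[{pos}]/y", default=""))
        lines.append(";".join(row))
    return "\n".join(lines)


print(to_csv(ET.fromstring(MYDATA)))
```

Running it prints the header x;first y;second y followed by the rows 0;5;2, 1;4;1 and 2;3;0, matching the output asked for in the question. The header is built from the id attributes, so a third <data> element simply adds a column.
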
Licensed under: CC-BY-SA with attribution