Question

I must be doing something inherently wrong here, every example I've seen and search for on SO seems to suggest this would work.

I'm trying to use an XPath search with lxml etree library to parse a garmin tcx file:

<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<TrainingCenterDatabase xmlns="http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2 http://www.garmin.com/xmlschemas/TrainingCenterDatabasev2.xsd">

  <Workouts>
    <Workout Sport="Biking">
      <Name>3P2 WK16 - 3</Name>
      <Step xsi:type="Step_t">
        <StepId>1</StepId>
        <Name>[MP19]6:28-6:38</Name>
        <Duration xsi:type="Distance_t">
          <Meters>13000</Meters>
        </Duration>
        <Intensity>Active</Intensity>
        <Target xsi:type="Speed_t">
          <SpeedZone xsi:type="PredefinedSpeedZone_t">
            <Number>2</Number>
          </SpeedZone>
        </Target>
      </Step>
     ......
     </Workout>
</Workouts>
</TrainingCenterDatabase>

I'd like to return the SpeedZone Element only where the type is PredefinedSpeedZone_t. I thought I'd be able to do:

root = ET.parse(open('file.tcx'))
xsi = {'xsi': 'http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2'}
    for speed_zone in root.xpath(".//xsi:SpeedZone[@xsi:type='PredefinedSpeedZone_t']", namespaces=xsi):
        print speed_zone

Though this doesn't seem to be the case. I've tried lots of combinations of removing/adding namespaces and to no avail. If I remove the attribute search and leave it as ".//xsi:SpeedZone" then this does return:

<Element {http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2}SpeedZone at 0x2595188>

as I'd expect.

I guess I could do it inside the for loop but it just feels like it should be possible on one line!

Was it helpful?

Solution

I'm a bit late, but the other answers are confusing IMHO.

In the Python code in the question and in the two other answers, the xsi prefix is bound to the http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2 URI. But in the XML document with the Garmin data, xsi is bound to http://www.w3.org/2001/XMLSchema-instance.

Since there are two namespaces at play here, I think the following code gives a clearer picture of what's going on. The namespace associated with the tcd prefix is the default namespace.

from lxml import etree

NSMAP = {"tcd": "http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2",
         "xsi": "http://www.w3.org/2001/XMLSchema-instance"}

root = etree.parse('file.tcx')

for speed_zone in root.xpath(".//tcd:SpeedZone[@xsi:type='PredefinedSpeedZone_t']",
                             namespaces=NSMAP):
    print speed_zone

Output:

<Element {http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2}SpeedZone at 0x25b7e18>

OTHER TIPS

One way to workaround this is to avoid specifying the attribute name and use *:

.//xsi:SpeedZone[@*='PredefinedSpeedZone_t']

Another option (not that awesome as previous one) is to actually get all the SpeedZone tags and check for the attribute value in the loop:

attribute_name = '{%s}type' % root.nsmap['xsi']
for speed_zone in root.xpath(".//xsi:SpeedZone", namespaces=xsi):
    if speed_zone.attrib.get(attribute_name) == 'PredefinedSpeedZone_t':
        print speed_zone

Hope that helps.

If all else fails you can still use

".//xsi:SpeedZone[@*[name() = 'xsi:type' and . = 'PredefinedSpeedZone_t']]"

Using name() is not as nice as directly addressing the namespaced attribute, but at least etree understands it.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top