Question

I've been having a terrible time finding any examples of XSTL processing with the Python libxml2 library and XSLT. I have a set of legacy documents with a default namespace, and I've been trying to convert them into something I can import into a TinkerPop-compliant database. I can't figure out how to convince libxslt to find anything in the data.

As you can see from my examples, I can't seem to get anything from an inner template to render at all. It does seem to find the topmost (cmap) template, as it spits out the <graphml> boilerplate. I am fairly new to XSLT, so this may just be a shortcoming on my part, but nobody on SO or Google seems to have any examples of this working.

I've thought about just ripping the offending default namespace out with a regexp, but parsing XML with a regexp is usually a bad plan, and it just seems like the wrong idea.

I have the following XML:

<?xml version="1.0" encoding="UTF-8"?>
  <cmap xmlns="http://cmap.ihmc.us/xml/cmap/">
    <map width="1940" height="3701">
      <concept-list>
        <concept id="1JNW5YSZP-14KK308-5VS2" label="Solving Linear&#xa;Systems by&#xa;Elimination&#xa;[MAT.ALG.510]"/>
        <concept id="1JNW55K3S-27XNMQ0-5T80" label="Using&#xa;Inequalities&#xa;[MAT.ALG.423]"/>
      </concept-list>
    </map>
  </cmap>

There's much more, but this is a sample of it. I was able, using the xpathRegisterNS() command, to register the default namespace and find my map, concept-map, etc with it. I have not had the same luck when trying to process this with libxslt.

<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:c="http://cmap.ihmc.us/xml/cmap/">
  <xsl:output method="xml" indent="yes"/>
  <xsl:template match="c:cmap">
    <graphml xmlns="http://graphml.graphdrawing.org/xmlns">
      <xsl:apply-templates select="c:concept"/>
    </graphml>      
  </xsl:template>
  <xsl:template match="c:concept">
    <node> Found a node </node>
  </xsl:template>
</xsl:stylesheet>

And the python experiment is just:

 import libxml2
 import libxslt

 # Parse and compile the stylesheet
 styledoc = libxml2.parseFile("cxltographml.xsl")
 style = libxslt.parseStylesheetDoc(styledoc)
 # Parse the source document and apply the stylesheet (no parameters)
 doc = libxml2.parseFile("algebra.cxl")
 result = style.applyStylesheet(doc, None)
 print(style.saveResultToString(result))

Solution

You've got the right technique regarding namespaces in the XSLT: you must map the namespace URI to a prefix, because the default namespace doesn't apply to XPath expressions or template match patterns. The problem is that in your c:cmap template you're doing

  <xsl:apply-templates select="c:concept"/>

But the cmap element doesn't have any direct children named concept. Try

  <xsl:apply-templates select="c:map/c:concept-list/c:concept"/>

or more generally (but potentially less efficient)

  <xsl:apply-templates select=".//c:concept"/>

to find all descendant concept elements rather than just immediate children.
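To see the difference between these selections outside of XSLT, here is a minimal sketch that uses Python's standard-library xml.etree.ElementTree as a stand-in for libxml2 (an assumption made for portability; it is not the library from the question, but its XPath subset handles prefixes, child steps and .// the same way for these simple paths). The sample is the trimmed document from the question, with the concept-list closing tag completed.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI map, playing the role of xmlns:c in the stylesheet
CMAP_NS = {"c": "http://cmap.ihmc.us/xml/cmap/"}

SAMPLE = """\
<cmap xmlns="http://cmap.ihmc.us/xml/cmap/">
  <map width="1940" height="3701">
    <concept-list>
      <concept id="1JNW5YSZP-14KK308-5VS2" label="Solving Linear&#xa;Systems by&#xa;Elimination&#xa;[MAT.ALG.510]"/>
      <concept id="1JNW55K3S-27XNMQ0-5T80" label="Using&#xa;Inequalities&#xa;[MAT.ALG.423]"/>
    </concept-list>
  </map>
</cmap>
"""

root = ET.fromstring(SAMPLE)  # root is the cmap element


def count(path):
    """Number of elements selected by `path`, relative to cmap."""
    return len(root.findall(path, CMAP_NS))


unprefixed = count(".//concept")        # no prefix: looks in no namespace
direct_children = count("c:concept")    # cmap has no concept children
full_path = count("c:map/c:concept-list/c:concept")
descendants = count(".//c:concept")

print(unprefixed, direct_children, full_path, descendants)  # prints: 0 0 2 2
```

The second number is the situation in your stylesheet: the prefix is correct, but the path only looks at the immediate children of cmap.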

Also, in the c:concept template you will need to add xmlns="http://graphml.graphdrawing.org/xmlns" to the <node> element; otherwise it will be output in no namespace (with xmlns="").

OTHER TIPS

I've been having a terrible time finding any examples of XSTL processing

Perhaps because you spelt it wrong? (sorry, but we all make silly mistakes and one shouldn't rule them out...)

Actually (excuse me for trying to do some introspection on why this problem was hard to solve), I suspect that because so many people have trouble with default namespaces, you were somehow fixated on this as the cause, and failed to pursue other possibilities.

Also, you seem to have exercised a suspicion that the problem lay with libxslt. It might be good to get into the habit of trying your code with a different XSLT processor, so that you can put your mind at rest and eliminate processor bugs as a possible cause.

Generally, when you've got as far as identifying that a path expression is failing to select something, there are several approaches to diagnosis: (a) stare at the expression until you see what's wrong, (b) simplify the expression, e.g. by removing filters, until you identify which part of it is wrong, or (c) move to using schema-awareness and XSLT 2.0. (Sadly, (c) is too much effort for most people, and (b) is no use for very simple expressions, so they continue to waste time doing (a).)
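Approach (b) can be partly automated. The following is only a rough sketch, again using the standard-library xml.etree.ElementTree rather than libxml2: the helper first_failing_step is a hypothetical name, the ids "a" and "b" are made up, and the naive split on "/" only works for plain child-step paths without predicates or "//".

```python
import xml.etree.ElementTree as ET

NS = {"c": "http://cmap.ihmc.us/xml/cmap/"}

# Cut-down stand-in for the real document (ids are placeholders)
DOC = ET.fromstring(
    '<cmap xmlns="http://cmap.ihmc.us/xml/cmap/">'
    '<map><concept-list><concept id="a"/><concept id="b"/></concept-list></map>'
    '</cmap>'
)


def first_failing_step(context, path, namespaces):
    """Return the shortest leading part of `path` that selects nothing.

    Returns None when the whole path selects at least one element.
    """
    steps = path.split("/")
    for i in range(1, len(steps) + 1):
        prefix = "/".join(steps[:i])
        if not context.findall(prefix, namespaces):
            return prefix
    return None


from_question = first_failing_step(DOC, "c:concept", NS)
wrong_middle = first_failing_step(DOC, "c:map/c:concept", NS)
corrected = first_failing_step(DOC, "c:map/c:concept-list/c:concept", NS)

print(from_question)  # c:concept
print(wrong_middle)   # c:map/c:concept
print(corrected)      # None
```

For a one-step path like the one in the question this tells you nothing new, which is the limitation of (b) noted above; it earns its keep on longer paths.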

Licensed under: CC-BY-SA with attribution