Question

I've been asked to write some scripts that read in XML configuration files that make liberal use of XLink to include XML stored in multiple files. For example:

<Environment xlink:href="#{common.environment}" />

(#{common.environment} is a property placeholder that gets resolved first and can be ignored here.) The company has standardized on lxml for advanced XML processing in Python.

I've been looking for examples or docs on how to process these occurrences under these constraints and, at a minimum, include their content in the parent XML document as if they were actually inserted at that point. I've been a bit surprised to find precious little out there, to the point that I'm wondering if I'm missing something obvious. I've found generic docs on what XLink is, and I've found a few examples of it being used in the context of XSLT processing. That hasn't been helpful to me, though.

Could anyone offer any advice on how best to implement this, whether it be docs, examples or just some advice from experience? Thanks.

UPDATE: Here is a before and after example:

Before. This is what is actually in the file being parsed:

<Root>
    <Environment xlink:href="#{common.environment}" />
</Root>

This is what is in the file that #{common.environment} resolves to:

<?xml version="1.0" encoding="UTF-8"?>
<Environment>
    <Property key="hello.world" value="foo" />
    <Property key="bar.baz" value="fred" />
</Environment>

After. This is how the parser "sees" it after all processing is done:

<Root>
    <Environment>
        <Property key="hello.world" value="foo" />
        <Property key="bar.baz" value="fred" />
    </Environment>
</Root>

This is a radically simplified example of what goes on in there.

Solution

This answer is likely to be a far cry from what you really need, but perhaps it can be of some help. The small program below is what I could come up with based on the "radically simplified" example.

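from lxml import etree

# Clark notation for the namespaced xlink:href attribute
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

parent = etree.parse("parent.xml").getroot()
# Select the Environment elements that are direct children of the root
penv = parent.xpath("Environment")

for e in penv:
    # The href value is used as-is as the path of the file to include
    child = e.get(XLINK_HREF)
    c = etree.parse(child).getroot()
    # Swap the placeholder element for the root of the linked file
    parent.replace(e, c)

# tostring() returns bytes by default; request text so it prints cleanly on Python 3
print(etree.tostring(parent, encoding="unicode"))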

parent.xml:

<Root xmlns:xlink="http://www.w3.org/1999/xlink">
  <Environment xlink:href="child.xml"/>
</Root>

child.xml:

<Environment>
  <Property key="hello.world" value="foo" />
  <Property key="bar.baz" value="fred" />
</Environment>

When the program is run, it outputs:

<Root xmlns:xlink="http://www.w3.org/1999/xlink">
  <Environment>
  <Property key="hello.world" value="foo"/>
  <Property key="bar.baz" value="fred"/>
</Environment></Root>
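
The program above only handles Environment elements directly under the root, and the newline after the placeholder is lost (hence `</Environment></Root>` on one line). A slightly more general version might look like the sketch below. It is only an illustration under several assumptions: `resolve_xlinks`, `base_dir` and `_seen` are hypothetical names, every `xlink:href` is treated as a plain file path relative to the including file (no fragment identifiers, no `xlink:show`/`xlink:actuate` semantics), and the `#{...}` placeholders have already been resolved. It sticks to the ElementTree-compatible part of the API, so the import falls back to the standard library if lxml is not installed.

```python
import os

try:
    from lxml import etree  # the library used in the answer
except ImportError:  # fallback so the sketch still runs without lxml
    import xml.etree.ElementTree as etree

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def resolve_xlinks(root, base_dir=".", _seen=()):
    """Replace each descendant that has xlink:href with the root of the linked file.

    Hypothetical helper: hrefs are assumed to be file paths relative to base_dir.
    An href on `root` itself is not handled.
    """
    # Snapshot the elements first, because the tree is modified while walking it.
    for parent in list(root.iter("*")):
        for index, child in enumerate(list(parent)):
            href = child.get(XLINK_HREF)
            if href is None:
                continue
            path = os.path.abspath(os.path.join(base_dir, href))
            if path in _seen:
                raise ValueError("circular xlink:href: " + path)
            included = etree.parse(path).getroot()
            # Expand links inside the included file, relative to its own folder.
            resolve_xlinks(included, os.path.dirname(path), _seen + (path,))
            tail = child.tail
            parent[index] = included  # one-for-one swap, so indexes stay valid
            included.tail = tail  # keep the whitespace that followed the placeholder
    return root
```

With the parent.xml and child.xml files shown above in the current directory, it could be called as `root = resolve_xlinks(etree.parse("parent.xml").getroot())` and then serialized with `etree.tostring(root, encoding="unicode")`.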
Licensed under: CC-BY-SA with attribution