Question

I'm trying to process an XML file using Python and xml.etree.ElementTree, and I'm having a problem with multiple "hierarchical" default namespaces. I need to change the text of some of the nodes, then save the file in the identical format.

Maybe an example file will help make it clear...

This is what my code looks like:

from xml.etree import ElementTree

ElementTree.register_namespace('pplv', 'whatever')
ElementTree.register_namespace('', 'blah')  # Register the default namespace
parse_tree = ElementTree.parse(infile)

for node in parse_tree.iter():
    if node.tag == '...':
        node.text = '...'
    # if ... (further checks elided)

# Write once, after all nodes have been updated
parse_tree.write(outfile)

This is what my source file looks like:

<?xml version="1.0" encoding="UTF-8"?>
<pplv:PPLVDocument xmlns:pplv="whatever">
  <pplv:node1>...</pplv:node1>
  <pplv:node2>...</pplv:node2>
  <pplv:node3 xmlns="blah">
    <node1>...</node1>
    <node2>...</node2>
  </pplv:node3>
  <pplv:node4 xmlns="blah2">
    <node1>...</node1>
    <node2>...</node2>
  </pplv:node4>
  <pplv:node5 xmlns="blah3">
    <node1>...</node1>
    <node2>...</node2>
  </pplv:node5>
</pplv:PPLVDocument>

When I parse it using ElementTree, with the namespaces registered, and write it back out, I get:

<?xml version="1.0" encoding="UTF-8"?>
<pplv:PPLVDocument xmlns:pplv="whatever" xmlns="blah" xmlns:ns0="blah2" xmlns:ns1="blah3">
  <pplv:node1>...</pplv:node1>
  <pplv:node2>...</pplv:node2>
  <pplv:node3>
    <node1>...</node1>
    <node2>...</node2>
  </pplv:node3>
  <pplv:node4>
    <ns0:node1>...</ns0:node1>
    <ns0:node2>...</ns0:node2>
  </pplv:node4>
  <pplv:node5>
    <ns1:node1>...</ns1:node1>
    <ns1:node2>...</ns1:node2>
  </pplv:node5>
</pplv:PPLVDocument>

As you can see, all the namespace definitions have been "rolled up" into the root node. In my original document, the default namespace keeps getting redefined ("blah", "blah2", "blah3"). I can register a single default namespace ("blah"), but here the source document defines several default namespaces at different points, and ElementTree doesn't seem to have a way of letting me save the altered file in this "shape".
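To make the "rolling up" concrete, here is a minimal sketch that reproduces it with the standard library alone. It assumes a cut-down version of the source file held in a string (the `SOURCE` name is just for illustration), and the generated prefixes may differ between Python versions.

```python
from xml.etree import ElementTree

# Cut-down version of the source document (illustrative only).
SOURCE = """<pplv:PPLVDocument xmlns:pplv="whatever">
  <pplv:node3 xmlns="blah">
    <node1>...</node1>
  </pplv:node3>
  <pplv:node4 xmlns="blah2">
    <node1>...</node1>
  </pplv:node4>
</pplv:PPLVDocument>"""

ElementTree.register_namespace('pplv', 'whatever')
ElementTree.register_namespace('', 'blah')  # only one default can be registered

root = ElementTree.fromstring(SOURCE)
serialized = ElementTree.tostring(root, encoding='unicode')

# Everything up to the first '>' is the root element's start tag.
root_start_tag = serialized.split('>', 1)[0]
print(root_start_tag)  # all xmlns declarations end up here
print(serialized)      # pplv:node4 no longer carries its own xmlns
```

ElementTree stores each tag as {uri}localname and discards the position of the original declaration, so the serializer has nothing to go on except the set of namespace URIs in use.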

As you can probably guess, the (off-the-shelf) code that consumes these files won't accept the files I'm creating, but works with the original file structure just fine.

Happy to switch to lxml if that's going to give me a way to resolve this; I just need a fix!

Thanks in advance


Solution

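Using lxml, which keeps each xmlns declaration on the element where it appeared in the source:

>>> from lxml import etree
>>> parser = etree.XMLParser(remove_blank_text=True)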
>>> root = etree.parse('in.xml', parser)
>>> root.xpath('//pplv:node2/text()', namespaces={'pplv': 'whatever'})
['...']
>>> root.write('out.xml', pretty_print=True)

$ cat out.xml 
<pplv:PPLVDocument xmlns:pplv="whatever">
  <pplv:node1>...</pplv:node1>
  <pplv:node2>...</pplv:node2>
  <pplv:node3 xmlns="blah">
    <node1>...</node1>
    <node2>...</node2>
  </pplv:node3>
  <pplv:node4 xmlns="blah2">
    <node1>...</node1>
    <node2>...</node2>
  </pplv:node4>
  <pplv:node5 xmlns="blah3">
    <node1>...</node1>
    <node2>...</node2>
  </pplv:node5>
</pplv:PPLVDocument>
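The session above only reads a value. Since the question is about changing node text and saving, here is a rough sketch of that step, assuming lxml is installed and using a cut-down document in a string. The `SOURCE` name and the replacement text 'changed' are made up for illustration.

```python
from lxml import etree

# Cut-down version of the source document (illustrative only).
SOURCE = b"""<pplv:PPLVDocument xmlns:pplv="whatever">
  <pplv:node3 xmlns="blah">
    <node1>...</node1>
  </pplv:node3>
  <pplv:node4 xmlns="blah2">
    <node1>...</node1>
  </pplv:node4>
</pplv:PPLVDocument>"""

root = etree.fromstring(SOURCE)

# Elements under a default namespace are addressed as {uri}localname.
for node in root.iter('{blah2}node1'):
    node.text = 'changed'

# Serialize back to text; the per-element xmlns declarations are kept.
result = etree.tostring(root, encoding='unicode')
print(result)
```

For a file on disk, the same loop should work on the tree returned by etree.parse, followed by the write call shown above.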
License: CC-BY-SA with attribution