Question

I was initially confused about the behavior of Element.__iter__() and ElementTree.__iter__(). It was unclear to me whether the iterator does a complete depth-first traversal of all elements below it, or whether it only iterates over direct children.

The following test indicates it only iterates over direct children:

>>> import xml.etree.ElementTree as etree
>>> s = "<root><foo><bar></bar></foo><baz></baz></root>"
>>> t = etree.fromstring(s)
>>> for e in t: print(e)
... 
<Element 'foo' at 0x7ff15646a650>
<Element 'baz' at 0x7ff15646a6d0>

Note that <bar> was not included, because <bar> is a child of <foo>, not <root>.

Okay, fine... makes sense.

But the Python docs say:

iter(tag=None)

Creates a tree iterator with the current element as the root. The iterator iterates over this element and all elements below it, in document (depth first) order. If tag is not None or '*', only elements whose tag equals tag are returned from the iterator. If the tree structure is modified during iteration, the result is undefined.

This wording, especially the phrase "iterates over this element and all elements below it, in document (depth first) order", strongly, strongly implies to me that the documentation is saying that the iterator is supposed to iterate over ALL elements in the sub-tree, not just direct children.

Plus, the documentation also says "Creates a tree iterator with the current element as the root. The iterator iterates over this element and all elements below it" - indicating that the root element is supposed to be the first element returned in the iteration. But in fact, the root element is not returned. Instead, the first child is returned.

So, this documentation seems horribly misleading... and had me very confused. Is the documentation simply wrong here, or am I just not interpreting what it's saying correctly?


Solution

The iter method that you are referring to in the docs is different from the __iter__ method. A plain for loop over an element (for e in t) calls __iter__, not iter.

To iterate over all the elements in depth-first order, call the iter method explicitly:

>>> for e in t.iter(): print(e)
... 
<Element 'root' at 0x10b92ccd0>
<Element 'foo' at 0x10b92cd10>
<Element 'bar' at 0x10b92cd50>
<Element 'baz' at 0x10b92ce10>

In contrast, __iter__ will only iterate over direct children.
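
For a side-by-side comparison, here is a small sketch that reuses the XML string from the question. The variable names (root, children, subtree, bars) are only illustrative, and the last line also shows the tag argument mentioned in the quoted documentation:

```python
import xml.etree.ElementTree as etree

root = etree.fromstring("<root><foo><bar></bar></foo><baz></baz></root>")

# Looping over the element itself goes through __iter__: direct children only.
children = [e.tag for e in root]

# iter() walks the whole subtree in document order, starting with root itself.
subtree = [e.tag for e in root.iter()]

# Passing a tag to iter() keeps only the matching elements, at any depth.
bars = [e.tag for e in root.iter("bar")]

print(children)  # ['foo', 'baz']
print(subtree)   # ['root', 'foo', 'bar', 'baz']
print(bars)      # ['bar']
```

So the documentation quoted in the question is accurate for iter(); it just does not describe what happens when you loop over the element directly.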

Licensed under: CC-BY-SA with attribution