Python ElementTree: checking node/element type

https://stackoverflow.com/questions/3611513

26-09-2019

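Question

I'm using ElementTree and can't figure out whether a child node is text or not. childelement.text doesn't seem to work, because it gives false positives even for nodes that are not text nodes.

Any suggestions?

Example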

<tr>
  <td><a href="sdas3">something for link</a></td>
  <td>tttttk</td>
  <td><a href="tyty">tyt for link</a></td>
</tr>

After parsing this XML file, I do this in Python:

for elem_main in container_trs:  # elem_main is each tr
    # Indexing stands in for the original getchildren() calls;
    # getchildren() was removed in Python 3.9.
    elem0 = elem_main[0]  # td[0]
    elem1 = elem_main[1]  # td[1]

    print(elem0.text)
    print(elem1.text)

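The code above doesn't print the text for elem0.text; it comes out blank. I do see elem1.text (i.e., tttttk) in the output.

Update 2

I'm actually building a dictionary from the text of each element so that I can sort the HTML table. How would I get the text in this code?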

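Solution

How about iterating over all the descendant nodes with the getiterator method (named iter() in later versions; getiterator() was removed in Python 3.9):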

import xml.etree.ElementTree as xee

content='''
<tr>
  <td><a href="sdas3">something for link</a></td>
  <td>tttttk</td>
  <td><a href="tyty">tyt for link</a></td>
</tr>
'''

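def text_content(node):
    """Collect the non-blank text of node and all of its descendants."""
    result = []
    # iter() replaces getiterator(), which was removed in Python 3.9
    for elem in node.iter():
        text = elem.text
        if text and text.strip():
            result.append(text)
    return result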

container_trs=xee.fromstring(content)
adict={}
for elem in container_trs:
    adict[elem]=text_content(elem)
print(adict)
# {<Element td at b767e52c>: ['tttttk'], <Element td at b767e58c>: ['tyt for link'], <Element td at b767e36c>: ['something for link']}

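for elem_main in container_trs: iterates over the direct children of container_trs only.

On the other hand, for elem_main in container_trs.getiterator(): (iter() on current Python versions) iterates over container_trs itself, its children, its grandchildren, and so on.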
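To make the difference between the two loops concrete, here is a minimal sketch that reuses the row from the question. The variable names row, child_tags and descendant_tags are made up for illustration, and iter() is used in place of getiterator() on the assumption that the code runs on a current Python 3.

```python
import xml.etree.ElementTree as ET

# Same <tr> as in the question, written without whitespace between tags.
row = ET.fromstring(
    '<tr>'
    '<td><a href="sdas3">something for link</a></td>'
    '<td>tttttk</td>'
    '<td><a href="tyty">tyt for link</a></td>'
    '</tr>'
)

# Iterating over the element itself yields only its direct children.
child_tags = [child.tag for child in row]

# iter() yields the element itself first, then every descendant
# in document order.
descendant_tags = [elem.tag for elem in row.iter()]

print(child_tags)       # ['td', 'td', 'td']
print(descendant_tags)  # ['tr', 'td', 'a', 'td', 'td', 'a']
```

The first list never contains the a elements, which is why a loop over the direct children cannot see the link text on its own.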

Other tips

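elem0.text is None because the text actually belongs to the child element (the a inside the td). Just go one level deeper:

print(elem0[0].text)

Incidentally, elem0[0] is a shortcut for elem0.getchildren()[0], so getchildren() isn't needed (and it was removed in Python 3.9).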
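If the goal is to tell cells with their own text apart from cells whose text sits in a nested element, one possible approach is sketched below. The helper names has_own_text and full_text are hypothetical and not part of ElementTree; the sketch relies only on the standard text attribute and the itertext() method.

```python
import xml.etree.ElementTree as ET

row = ET.fromstring(
    '<tr>'
    '<td><a href="sdas3">something for link</a></td>'
    '<td>tttttk</td>'
    '<td><a href="tyty">tyt for link</a></td>'
    '</tr>'
)

def has_own_text(elem):
    """True if elem has non-blank text directly before its first child."""
    return bool(elem.text and elem.text.strip())

def full_text(elem):
    """Join the text of elem and all of its descendants."""
    return "".join(elem.itertext()).strip()

own_flags = [has_own_text(td) for td in row]
cell_texts = [full_text(td) for td in row]

print(own_flags)   # [False, True, False]
print(cell_texts)  # ['something for link', 'tttttk', 'tyt for link']
```

The cell_texts list gives one string per td regardless of nesting, which may be a convenient basis for building the dictionary or sort key mentioned in the question.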

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow