Question

I want to parse the HTML so that:
1) when a td has a child node, it outputs stage1
2) when a td has no child node, it outputs stage2

How can I finish my code?

data='''
<table>
<tr>
<td>
<span>  hallo
</span>
</td>
</tr>
<tr>
<td>  hallo
</td>
</tr>
</table> '''
import lxml.html
root=lxml.html.document_fromstring(data)
set=root.xpath('//table//tr//td')
for cell in set:
    if(there is a child node in current node):
        print("stage1")
    else:
        print("stage2")

Solution

One option is to use the getchildren() method, which returns the child elements of a node:

for cell in set:
    print("stage1" if cell.getchildren() else "stage2")

prints:

stage1
stage2

This is because the first td has a span inside it, while the second td has no child elements (getchildren() returns only elements, so the text "  hallo" does not count).
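For a self-contained Python 3 version, here is a minimal sketch that uses len(cell) instead of getchildren(). The lxml documentation marks getchildren() as deprecated in favor of list(element) or iterating over the element; len(cell) counts the same direct children. The helper name classify_cells is hypothetical.

```python
import lxml.html

data = '''
<table>
<tr>
<td>
<span>  hallo
</span>
</td>
</tr>
<tr>
<td>  hallo
</td>
</tr>
</table> '''


def classify_cells(html):
    """Hypothetical helper: 'stage1' for each td with child elements, else 'stage2'."""
    root = lxml.html.document_fromstring(html)
    results = []
    for cell in root.xpath('//table//tr//td'):
        # len() counts direct children only; plain text such as "  hallo" is not a child
        results.append("stage1" if len(cell) else "stage2")
    return results


for stage in classify_cells(data):
    print(stage)
```

Testing len(cell) explicitly, rather than writing "if cell:", avoids relying on the truth value of an lxml element, which lxml warns about because that behavior may change.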

Update: to also print the text inside each child element:

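for cell in set:
    children = cell.getchildren()
    if not children:
        print("stage2")
    else:
        print("stage1")
        for child in children:
            # text_content() returns all text inside the child, so this does not
            # fail when the child is empty or starts with another element
            print(child.text_content().strip())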
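The approach above treats "child node" as "child element". In XPath terms, text is a node too: node() would also match the "  hallo" text in the second td, so under that reading both cells have a child node. If that distinction matters, here is a minimal sketch that tells the cases apart. The describe() helper and its return labels are hypothetical, and whitespace-only text (such as the newlines around the span) is assumed to be ignorable.

```python
import lxml.html

data = '''
<table>
<tr>
<td>
<span>  hallo
</span>
</td>
</tr>
<tr>
<td>  hallo
</td>
</tr>
</table> '''


def describe(cell):
    """Hypothetical helper: say what kind of child nodes a td has."""
    # '*' selects child elements only (the span in the first td)
    if cell.xpath('*'):
        return "elements"
    # 'text()' selects direct text nodes; whitespace-only ones are skipped
    if any(text.strip() for text in cell.xpath('text()')):
        return "text only"
    return "empty"


root = lxml.html.document_fromstring(data)
for cell in root.xpath('//table//tr//td'):
    print(describe(cell))
```

Here "elements" corresponds to stage1; whether "text only" should map to stage1 or stage2 depends on which meaning of "child node" the question intends.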
Licensed under: CC-BY-SA with attribution