The root of your problem is that text() in your XPath is part of the test for which nodes to retrieve: a p element with no text has no text node to match, so nothing is returned for it.
The solution is to modify the XPath to select all p elements and then read the text from each one.
import lxml.html as LH
xmlstr = """
<table border="1">
<thead>
<tr>
<td><p>T1</p></td>
<td><p>T2</p></td>
<td><p>T3</p></td>
</tr>
</thead>
<tbody>
<tr>
<td><p>A1</p></td>
<td><p></p></td>
<td><p>A3</p></td>
</tr>
</tbody>
</table>
"""
html_root = LH.fromstring(xmlstr)
eol_table = None
for tbl in html_root.xpath('//table'):
    p_elements = tbl.xpath('.//tr/td/p')
    eol_table = [p_elm.text for p_elm in p_elements]
print(eol_table)
This prints:
['T1', 'T2', 'T3', 'A1', None, 'A3']
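To make the difference concrete, here is a minimal sketch that runs both styles of query over a single row. The exact XPath from the question is not reproduced here, so './/tr/td/p/text()' and the one-row table are assumed stand-ins.

```python
import lxml.html as LH

# Assumed stand-in markup: one row whose middle <p> is empty.
html_root = LH.fromstring(
    "<table><tr><td><p>A1</p></td><td><p></p></td><td><p>A3</p></td></tr></table>"
)

# text() selects text nodes; the empty <p> has none, so it contributes nothing.
text_nodes = html_root.xpath('.//tr/td/p/text()')

# Selecting the <p> elements keeps one entry per cell; .text is None for the empty one.
p_texts = [p.text for p in html_root.xpath('.//tr/td/p')]

print(text_nodes)
print(p_texts)
```

The first list comes back one item short, which is what shifts the columns; the second keeps a placeholder for the empty cell.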
An alternative for the case where some td element has no p child
(this updated request came from Nijo, who also suggested the text_content()
call):
xmlstr = """
<table border="1">
<thead>
<tr>
<td><p>T1</p></td>
<td><p>T2</p></td>
<td><p>T3</p></td>
</tr>
</thead>
<tbody>
<tr>
<td><p>A1</p></td>
<td><p></p></td>
<td></td>
</tr>
</tbody>
</table>
"""
html_root = LH.fromstring(xmlstr)
eol_table = None
for tbl in html_root.xpath('//table'):
    td_elements = tbl.xpath('.//tr/td')
    eol_table = [td_elm.text_content() for td_elm in td_elements]
print(eol_table)
which prints:
['T1', 'T2', 'T3', 'A1', '', '']
As you can see, text_content() never returns None;
where .text would be None, it returns the empty string '' instead.
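If you would rather keep the p-based query from the first example and still avoid None, one common Python idiom is to fall back to an empty string with `or`. This is only a sketch: the helper name cell_text and the one-row table are made up for illustration.

```python
import lxml.html as LH

def cell_text(elm):
    # Hypothetical helper: .text is None when the element has no leading text,
    # so fall back to '' to get a uniform list of strings.
    return elm.text or ''

# Assumed stand-in markup: one row whose middle <p> is empty.
html_root = LH.fromstring(
    "<table><tr><td><p>A1</p></td><td><p></p></td><td><p>A3</p></td></tr></table>"
)

cells = [cell_text(p) for p in html_root.xpath('.//tr/td/p')]
print(cells)
```

Note that .text only covers the text before an element's first child, while text_content() gathers the text of all descendants, so this variant fits cells without nested markup.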