I have to read from a complex HTML document in which the table does not have an ID, and each table has an undefined number of <tr> tags. I want to print the text in the td cells of the last <tr> tag, but I could not find anything that selects the last child while parsing the tree.

I want to print 4, 4.1, 4.2.

<table border=0 bgcolor=#000000 cellspacing=1 width="100%">
<tr bgcolor="#FFFFFF">
    <td>1</td>
    <td>1.1</td>
    <td>1.2</td>
</tr>
<tr bgcolor="#FFFFFF">
    <td>2</td>
    <td>2.1</td>
    <td>2.2</td>
</tr>
<tr bgcolor="#FFFFFF">
    <td>3</td>
    <td>3.1</td>
    <td>3.2</td>
</tr>
<tr bgcolor="#FFFFFF">
    <td>4</td>
    <td>4.1</td>
    <td>4.2</td>
</tr>

This is what I have so far:

from urllib.request import urlopen

from bs4 import BeautifulSoup

sock = urlopen("someurl")
htmlread = sock.read()
soup = BeautifulSoup(htmlread, "html.parser")

# Locate the table by its attributes, since it has no id
tabledata = soup.find("table", {"border":"0", "bgcolor":"#000000", "cellspacing":"1", "width":"100%"})
# This returns every matching row, not just the last one
other = tabledata.find_all("tr", {"bgcolor":"#FFFFFF"})

print(other)

Solution

It sounds like you're trying to find the last <tr> element and print the text of every <td> inside it. First, to find the last <tr>, select all <tr> elements (calling soup('tr') is shorthand for soup.find_all('tr')) and use index -1 to take the last one:

>>> last_tr = soup('tr')[-1]

Then, to get the text of every <td> in that <tr> element:

>>> [td.text for td in last_tr('td')]
['4', '4.1', '4.2']
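Because soup('tr') searches the entire document, on a page with several tables it returns the last row of whichever table comes last, which may not be the table you want. The minimal sketch below illustrates this with a made-up two-table page (the second table and its "unrelated" cell are hypothetical), and shows that narrowing the search to the target table first, here by its bgcolor attribute, returns the intended row. It assumes Python 3, bs4, and the built-in "html.parser".

```python
from bs4 import BeautifulSoup

# Hypothetical page: the target table, followed by an unrelated second table.
html = """
<table border=0 bgcolor=#000000 cellspacing=1 width="100%">
<tr bgcolor="#FFFFFF"><td>3</td><td>3.1</td><td>3.2</td></tr>
<tr bgcolor="#FFFFFF"><td>4</td><td>4.1</td><td>4.2</td></tr>
</table>
<table>
<tr><td>unrelated</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Calling the soup (or any Tag) is shorthand for find_all().
assert soup("tr") == soup.find_all("tr")

# Whole-document search: the last <tr> belongs to the second table.
doc_last = [td.text for td in soup("tr")[-1]("td")]
print(doc_last)    # ['unrelated']

# Scoped search: locate the target table first, then take its last <tr>.
table = soup.find("table", {"bgcolor": "#000000"})
table_last = [td.text for td in table("tr")[-1]("td")]
print(table_last)  # ['4', '4.1', '4.2']
```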

Another answer

Find the table by its attributes, then collect the text of every td element inside its last tr element:

table = soup.find("table", {"border":"0", "bgcolor":"#000000", "cellspacing":"1", "width":"100%"})
print([td.text for td in table.find_all('tr')[-1].find_all('td')])

Demo:

>>> from bs4 import BeautifulSoup
>>> data = """
... <table border=0 bgcolor=#000000 cellspacing=1 width="100%">
... <tr bgcolor="#FFFFFF">
...     <td>1</td>
...     <td>1.1</td>
...     <td>1.2</td>
... </tr>
... <tr bgcolor="#FFFFFF">
...     <td>2</td>
...     <td>2.1</td>
...     <td>2.2</td>
... </tr>
... <tr bgcolor="#FFFFFF">
...     <td>3</td>
...     <td>3.1</td>
...     <td>3.2</td>
... </tr>
... <tr bgcolor="#FFFFFF">
...     <td>4</td>
...     <td>4.1</td>
...     <td>4.2</td>
... </tr>
... """
>>> soup = BeautifulSoup(data)
>>> table = soup.find("table", {"border":"0", "bgcolor":"#000000", "cellspacing":"1", "width":"100%"})
>>> print([td.text for td in table.find_all('tr')[-1].find_all('td')])
['4', '4.1', '4.2']

Hope that helps.
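On a large, messy page the lookup can also fail: find() returns None when no table matches, and a matching table might contain no rows, so the one-liner above would raise an error. The following is a minimal defensive sketch, assuming Python 3 and bs4; last_row_texts is a hypothetical helper name, not part of BeautifulSoup, and it uses get_text(strip=True) to trim whitespace around each cell's text. The last lines show an alternative using a CSS selector through soup.select(), which bs4 4.7 and later back with the soupsieve package.

```python
from bs4 import BeautifulSoup


def last_row_texts(soup, attrs):
    """Hypothetical helper: text of each <td> in the last <tr> of the first
    <table> whose attributes match `attrs`; [] if the table or rows are missing."""
    table = soup.find("table", attrs)
    if table is None:  # find() returns None when nothing matches
        return []
    rows = table.find_all("tr")
    if not rows:  # the table exists but has no <tr> elements
        return []
    return [td.get_text(strip=True) for td in rows[-1].find_all("td")]


html = """
<table border=0 bgcolor=#000000 cellspacing=1 width="100%">
<tr bgcolor="#FFFFFF">
    <td>3</td>
    <td>3.1</td>
    <td>3.2</td>
</tr>
<tr bgcolor="#FFFFFF">
    <td>4</td>
    <td>4.1</td>
    <td>4.2</td>
</tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")
target = {"border": "0", "bgcolor": "#000000", "cellspacing": "1", "width": "100%"}

print(last_row_texts(soup, target))             # ['4', '4.1', '4.2']
print(last_row_texts(soup, {"id": "missing"}))  # []

# Alternative: a CSS selector, then the last match (select() returns a list).
css_last = soup.select('table[bgcolor="#000000"] tr')[-1]
print([td.get_text(strip=True) for td in css_last.select("td")])  # ['4', '4.1', '4.2']
```

Returning an empty list instead of raising keeps a scraping loop running when one page lacks the table; if a missing table should be treated as an error, raising an exception in the helper is an equally reasonable choice.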

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow