Question

I need to read a complex HTML document in which the table has no ID and each table has an unknown number of <tr> tags. I want to print the text of every <td> in the last <tr> tag. I could not find anything that returns the last child while parsing the tree.

For the table below, the expected output is 4, 4.1, 4.2.

<table border=0 bgcolor=#000000 cellspacing=1 width="100%">
<tr bgcolor="#FFFFFF">
    <td>1</td>
    <td>1.1</td>
    <td>1.2</td>
</tr>
<tr bgcolor="#FFFFFF">
    <td>2</td>
    <td>2.1</td>
    <td>2.2</td>
</tr>
<tr bgcolor="#FFFFFF">
    <td>3</td>
    <td>3.1</td>
    <td>3.2</td>
</tr>
<tr bgcolor="#FFFFFF">
    <td>4</td>
    <td>4.1</td>
    <td>4.2</td>
</tr>

This is what I have so far:

from urllib.request import urlopen

from bs4 import BeautifulSoup

# Download and parse the page ("someurl" is a placeholder).
htmlread = urlopen("someurl").read()
soup = BeautifulSoup(htmlread, "html.parser")

# Locate the table by its attributes, then collect its rows.
tabledata = soup.find("table", {"border":"0", "bgcolor":"#000000", "cellspacing":"1", "width":"100%"})
other = tabledata.find_all("tr", {"bgcolor":"#FFFFFF"})

print(other)

Solution

It sounds like you want to find the last tr element and print the text of every td inside it. To get the last tr, select all tr elements (calling soup('tr') is shorthand for soup.find_all('tr')) and index the result with -1:

>>> last_tr = soup('tr')[-1]

Then, to find all <td> tags in that <tr> element:

>>> [td.text for td in last_tr('td')]
['4', '4.1', '4.2']
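One caveat worth sketching: soup('tr')[-1] returns the last row of the whole document, so on a page with more than one table it may pick a row from the wrong one. The example below is only an illustration under assumed input (the second "footer" table and the variable names are hypothetical, not from the question); it narrows the search to the matching table first and guards against a missing table or an empty one.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: the target table is followed by an unrelated one.
html = """
<table border="0" bgcolor="#000000" cellspacing="1" width="100%">
<tr bgcolor="#FFFFFF"><td>3</td><td>3.1</td><td>3.2</td></tr>
<tr bgcolor="#FFFFFF"><td>4</td><td>4.1</td><td>4.2</td></tr>
</table>
<table><tr><td>footer</td></tr></table>
"""

soup = BeautifulSoup(html, "html.parser")

# Last <tr> in the whole document: this one belongs to the second table.
doc_last = [td.get_text(strip=True) for td in soup("tr")[-1]("td")]

# Last <tr> inside the table matched by its attributes.
table = soup.find("table", {"bgcolor": "#000000", "width": "100%"})
rows = table.find_all("tr") if table is not None else []
table_last = [td.get_text(strip=True) for td in rows[-1]("td")] if rows else []

print(doc_last)    # ['footer']
print(table_last)  # ['4', '4.1', '4.2']
```

If the page really contains a single table, the shorter soup('tr')[-1] form gives the same result.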

Other tips

Find all td elements inside the last tr element of the table:

table = soup.find("table", {"border":"0", "bgcolor":"#000000", "cellspacing":"1", "width":"100%"})
print([td.text for td in table.find_all('tr')[-1].find_all('td')])

Demo:

>>> from bs4 import BeautifulSoup
>>> data = """
... <table border=0 bgcolor=#000000 cellspacing=1 width="100%">
... <tr bgcolor="#FFFFFF">
...     <td>1</td>
...     <td>1.1</td>
...     <td>1.2</td>
... </tr>
... <tr bgcolor="#FFFFFF">
...     <td>2</td>
...     <td>2.1</td>
...     <td>2.2</td>
... </tr>
... <tr bgcolor="#FFFFFF">
...     <td>3</td>
...     <td>3.1</td>
...     <td>3.2</td>
... </tr>
... <tr bgcolor="#FFFFFF">
...     <td>4</td>
...     <td>4.1</td>
...     <td>4.2</td>
... </tr>
... """
>>> soup = BeautifulSoup(data, "html.parser")
>>> table = soup.find("table", {"border":"0", "bgcolor":"#000000", "cellspacing":"1", "width":"100%"})
>>> print([td.text for td in table.find_all('tr')[-1].find_all('td')])
['4', '4.1', '4.2']

Hope that helps.

License: CC-BY-SA with attribution
Not affiliated with StackOverflow