Question

I have to read from a complex HTML document in which the tables have no ID and each table has an undefined number of tr tags. I want to print the text of the td tags in the last <tr> of a table, but I could not find a way to get the last child while parsing the tree.

I want to print 4, 4.1, 4.2

<table border=0 bgcolor=#000000 cellspacing=1 width="100%">
<tr bgcolor="#FFFFFF">
    <td>1</td>
    <td>1.1</td>
    <td>1.2</td>
</tr>
<tr bgcolor="#FFFFFF">
    <td>2</td>
    <td>2.1</td>
    <td>2.2</td>
</tr>
<tr bgcolor="#FFFFFF">
    <td>3</td>
    <td>3.1</td>
    <td>3.2</td>
</tr>
<tr bgcolor="#FFFFFF">
    <td>4</td>
    <td>4.1</td>
    <td>4.2</td>
</tr>

This is what I have so far:

from bs4 import BeautifulSoup
import urllib
sock = urllib.urlopen("someurl")

htmlread = sock.read()
soup = BeautifulSoup(htmlread)


tabledata = soup.find("table", {"border":"0", "bgcolor":"#000000", "cellspacing":"1", "width":"100%"})
other = tabledata.findAll("tr", {"bgcolor":"#FFFFFF"})

print other

Solution

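It sounds like you want to find the last tr element and print the text of every td inside it. First, select all tr elements and index the result with -1 to get the last one (calling soup('tr') is shorthand for soup.find_all('tr')). Note that this picks the last tr in the whole document; if the page contains several tables, make the same call on the table element instead: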

>>> last_tr = soup('tr')[-1]

Then, to find all <td> tags in that <tr> element:

>>> [td.text for td in last_tr('td')]
[u'4', u'4.1', u'4.2']
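
The snippets in this thread are Python 2. As a rough Python 3 sketch of the same idea, the two steps can be wrapped in a function that also copes with a missing table or an empty table. The function name last_row_cells, the HTML constant, and the shortened sample markup are assumptions made for illustration; they are not part of the original answer. The parser is named explicitly as "html.parser" so that no extra parser library is needed.

```python
from bs4 import BeautifulSoup

# Shortened sample markup based on the question (hypothetical constant name).
HTML = """
<table border=0 bgcolor=#000000 cellspacing=1 width="100%">
<tr bgcolor="#FFFFFF"><td>3</td><td>3.1</td><td>3.2</td></tr>
<tr bgcolor="#FFFFFF"><td>4</td><td>4.1</td><td>4.2</td></tr>
</table>
"""


def last_row_cells(html, table_attrs):
    """Return the td texts of the last tr in the first matching table."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", attrs=table_attrs)
    if table is None:
        return []  # no table matched the given attributes
    rows = table.find_all("tr")
    if not rows:
        return []  # the table exists but contains no rows
    # Negative indexing picks the last row; strip=True trims whitespace.
    return [td.get_text(strip=True) for td in rows[-1].find_all("td")]


cells = last_row_cells(HTML, {"border": "0", "bgcolor": "#000000"})
print(cells)
```

Returning an empty list instead of raising keeps the caller simple when a page has no matching table; raising an exception would be an equally reasonable choice.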

OTHER TIPS

Find all td elements inside the last tr element of the table:

table = soup.find("table", {"border":"0", "bgcolor":"#000000", "cellspacing":"1", "width":"100%"})
print [td.text for td in table.find_all('tr')[-1].find_all('td')]

Demo:

>>> from bs4 import BeautifulSoup
>>> data = """
... <table border=0 bgcolor=#000000 cellspacing=1 width="100%">
... <tr bgcolor="#FFFFFF">
...     <td>1</td>
...     <td>1.1</td>
...     <td>1.2</td>
... </tr>
... <tr bgcolor="#FFFFFF">
...     <td>2</td>
...     <td>2.1</td>
...     <td>2.2</td>
... </tr>
... <tr bgcolor="#FFFFFF">
...     <td>3</td>
...     <td>3.1</td>
...     <td>3.2</td>
... </tr>
... <tr bgcolor="#FFFFFF">
...     <td>4</td>
...     <td>4.1</td>
...     <td>4.2</td>
... </tr>
... """
>>> soup = BeautifulSoup(data)
>>> table = soup.find("table", {"border":"0", "bgcolor":"#000000", "cellspacing":"1", "width":"100%"})
>>> print [td.text for td in table.find_all('tr')[-1].find_all('td')]
[u'4', u'4.1', u'4.2']
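
A possible alternative to indexing with [-1] is a CSS selector passed to select(), which recent Beautiful Soup versions support through the Soup Sieve package. The sketch below is written for Python 3 and assumes that package is installed; the HTML constant and the shortened sample markup are made up for illustration. One difference to keep in mind: tr:last-of-type matches the last tr under each parent element, so a table that contains nested tables or separate thead/tbody sections may yield cells from more than one row.

```python
from bs4 import BeautifulSoup

# Shortened sample markup based on the question (hypothetical constant name).
HTML = """
<table border=0 bgcolor=#000000 cellspacing=1 width="100%">
<tr bgcolor="#FFFFFF"><td>3</td><td>3.1</td><td>3.2</td></tr>
<tr bgcolor="#FFFFFF"><td>4</td><td>4.1</td><td>4.2</td></tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Select every td inside the last tr of the table with the given bgcolor.
selector = 'table[bgcolor="#000000"] tr:last-of-type td'
last_cells = [td.get_text(strip=True) for td in soup.select(selector)]
print(last_cells)
```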

Hope that helps.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow