Python Web Scraping table returns None

https://stackoverflow.com/questions/23664748

22-07-2023

Question

I'm trying to scrape the temperature elements of a table from www.intellicast.com

soup =  BeautifulSoup(urllib2.urlopen('http://www.intellicast.com/Local/History.aspx?location=USTX0057').read())
for row in soup('table',{'id':'dailyClimate'})[0].tbody('tr'):
  tds=row
  print tds

The result: TypeError: 'NoneType' object is not callable

When looking at the page source code I can see:

<table id = "dailyClimate" class="Container">
  <tbody>
    <tr class="TitlesAvgRecord">
       <td..
    <td>...</td>

So I know there is a tbody as well as a tr element.

If I change .tbody('tr') to .tbody('td') I still get the error, so I'm assuming the error is somewhere in calling tbody.

Solution

Your browser inserts a <tbody> element, but the actual source doesn't have that element, so .tbody evaluates to None and calling it raises the TypeError. The markup as served:

<table id="dailyClimate" class="Container">
  <tr class="TitlesAvgRecord">
    <td style="padding-left:5px;">Date</td>
    <td>Average<br />Low</td>
    <td>Average<br />High</td>
    <td>Record<br />Low</td>
    <td>Record<br />High</td>
    <td>Average<br />Precipitation</td>
    <td>Average<br />Snow</td>
  </tr>

<!-- etc. -->

See "Why do browsers insert tbody element into table elements?"
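A minimal sketch of why the call fails, assuming bs4 (BeautifulSoup 4) on Python 3. The two-row snippet below is a hypothetical stand-in for the real page, and the cell values are invented for illustration. With a parser that does not insert <tbody>, such as the built-in html.parser, attribute access returns None, and calling None produces the error from the question:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the served markup: note there is no <tbody>.
html = """
<table id="dailyClimate" class="Container">
  <tr class="TitlesAvgRecord"><td>Date</td><td>Average<br />Low</td></tr>
  <tr><td>Jan 1</td><td>35</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", id="dailyClimate")

# table.tbody is shorthand for table.find("tbody"); it is None when absent.
tbody = table.tbody
print(tbody)  # None

# Calling the missing element is what raises the TypeError.
try:
    table.tbody("tr")
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

Checking the result of table.tbody for None before using it makes this kind of mismatch between the browser view and the raw source easy to spot.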

You could use the html5lib parser instead (using BeautifulSoup(source, 'html5lib')), which would also insert the element. However, you don't need to search for it; just go straight to the <tr> rows:

for row in soup.find('table', id='dailyClimate').find_all('tr'):

or using a CSS selector:

for row in soup.select('table#dailyClimate tr'):
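As a rough end-to-end illustration of both forms, here is a sketch that runs against an inline snippet rather than the live site. It assumes bs4 on Python 3; the data row and its values are made up for the example, and the variable names are illustrative only:

```python
from bs4 import BeautifulSoup

# Hypothetical sample in the shape of the served markup (no <tbody>).
html = """
<table id="dailyClimate" class="Container">
  <tr class="TitlesAvgRecord">
    <td style="padding-left:5px;">Date</td>
    <td>Average<br />Low</td>
    <td>Average<br />High</td>
  </tr>
  <tr>
    <td>Jan 1</td>
    <td>35</td>
    <td>57</td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Both lookups reach the rows directly, without going through <tbody>.
rows_via_find = soup.find("table", id="dailyClimate").find_all("tr")
rows_via_select = soup.select("table#dailyClimate tr")

# Collect the text of each cell; the separator keeps "Average" and "Low"
# apart where the markup uses <br />.
parsed = [
    [td.get_text(" ", strip=True) for td in row.find_all("td")]
    for row in rows_via_find
]

for cells in parsed:
    print(cells)
```

Either lookup works whether or not the parser adds a <tbody>, since both match <tr> elements at any depth inside the table.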

You'd normally only select the tbody element if there were more than one, or if there was a thead or tfoot element you wanted to exclude.
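For that case, a small sketch assuming bs4 on Python 3 and a hypothetical table that really does contain thead, tbody and tfoot in its source (the intellicast page above does not). The child combinator restricts the match to rows inside the body:

```python
from bs4 import BeautifulSoup

# Hypothetical markup where the sections are present in the source itself.
html = """
<table id="dailyClimate">
  <thead><tr><th>Date</th><th>Low</th></tr></thead>
  <tbody>
    <tr><td>Jan 1</td><td>35</td></tr>
    <tr><td>Jan 2</td><td>36</td></tr>
  </tbody>
  <tfoot><tr><td>Mean</td><td>35.5</td></tr></tfoot>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Every row in the table, including header and footer rows.
all_rows = soup.select("table#dailyClimate tr")

# Only rows that are direct children of <tbody>.
body_rows = soup.select("table#dailyClimate > tbody > tr")

print(len(all_rows), len(body_rows))  # 4 2
```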

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow