I got the code below from another thread:

from bs4 import BeautifulSoup
import urllib2

url = 'http://finance.yahoo.com/q/op?s=aapl+Options'
htmltext = urllib2.urlopen(url).read()
soup = BeautifulSoup(htmltext)

#Table 8 has the data needed; it is nested under other tables though
# specific reference works as below:
print soup.findAll('table')[8].findAll('tr')[2].findAll('td')[2].contents

My question: why do I get an error when I change ('table')[8] to ('table')[7] or any other index lower than 8? How do I know which index to use? I don't understand the concept.

Thanks for the advice...

Update:

I'm trying to extract information from a web page. It contains several elements nested inside another, and I only need the data from some of them, which is why I'm trying to figure out which index to use in my case.

In fact, there is only one table in my HTML, so I tried findAll('table')[0]. Is that correct?


Solution

That page has outer tables with the class yfnc_datamodoutline1, containing a child table with the actual data. You could use a CSS selector to list each one and extract data:

datatables = soup.select('table.yfnc_datamodoutline1 table')

This too returns a list you can index, but the number of results is far more manageable and it is clearer what tables you retrieved here. There are only 2 results here now; pick one (Call or Put options), then parse the rows:

for row in datatables[1].find_all('tr'):
    if row.th:
        pass  # header row: cells are <th> elements
    else:
        pass  # data row: cells are <td> elements
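To see why a bare numeric index is hard to get right, here is a minimal sketch on a made-up page. The SAMPLE_HTML markup and the id attributes are assumptions for illustration only, not the real Yahoo page. find_all('table') returns every table in document order, nested ones included, so the position of the data table depends on how many wrapper tables come before it. The selector skips that counting.

```python
from bs4 import BeautifulSoup

# Hypothetical miniature page: a layout table wrapping an outline table,
# which in turn wraps the table holding the actual data.
SAMPLE_HTML = """
<table id="layout"><tr><td>
  <table class="yfnc_datamodoutline1"><tr><td>
    <table id="data">
      <tr><th>Strike</th><th>Last</th></tr>
      <tr><td>280.00</td><td>0.05</td></tr>
    </table>
  </td></tr></table>
</td></tr></table>
"""

soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')

# Every <table> is listed, outer ones first, so the data table is at index 2 here.
all_tables = soup.find_all('table')
for index, table in enumerate(all_tables):
    print(str(index) + ' id=' + str(table.get('id')))

# The CSS selector returns only tables nested inside the outline table.
datatables = soup.select('table.yfnc_datamodoutline1 table')
print(str(len(datatables)) + ' id=' + datatables[0]['id'])
```

Printing the list with enumerate() like this on your own page is a simple way to find out which index holds the data you need.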

A quick demo printing out the second table:

>>> for row in soup.select('table.yfnc_datamodoutline1 table')[1].find_all('tr'):
...     if row.th:
...         print ' '.join(header.get_text() for header in row.find_all('th'))
...     else:
...         print ' '.join(cell.get_text() for cell in row.find_all('td'))
... 
Strike Symbol Last Chg Bid Ask Vol Open Int
280.00 AAPL140517P00280000 0.05  0.00 N/A 0.12 2 6
290.00 AAPL140517P00290000 0.02  0.00 N/A 0.11 11 11
295.00 AAPL140517P00295000 0.01  0.00 N/A 0.08 3 8
300.00 AAPL140517P00300000 0.05  0.00 N/A 0.09 1 23
305.00 AAPL140517P00305000 0.05  0.00 N/A 0.10 10 20
# ... etc.
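If you want the values by column name rather than as printed lines, the same header/data row split can build a list of dictionaries. This is only a sketch: table_to_records is a hypothetical helper, and SAMPLE_HTML is made-up markup that reuses a few values from the output above.

```python
from bs4 import BeautifulSoup

# Hypothetical markup reusing values from the demo output above.
SAMPLE_HTML = """
<table class="yfnc_datamodoutline1"><tr><td>
  <table>
    <tr><th>Strike</th><th>Symbol</th><th>Last</th></tr>
    <tr><td>280.00</td><td>AAPL140517P00280000</td><td>0.05</td></tr>
    <tr><td>290.00</td><td>AAPL140517P00290000</td><td>0.02</td></tr>
  </table>
</td></tr></table>
"""


def table_to_records(table):
    """Return one dict per data row, keyed by the header row's text."""
    headers = []
    records = []
    for row in table.find_all('tr'):
        if row.th:
            # Header row: remember the column names.
            headers = [th.get_text(strip=True) for th in row.find_all('th')]
        else:
            # Data row: pair each cell with its column name.
            cells = [td.get_text(strip=True) for td in row.find_all('td')]
            records.append(dict(zip(headers, cells)))
    return records


soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
data_table = soup.select('table.yfnc_datamodoutline1 table')[0]
records = table_to_records(data_table)
for record in records:
    print(record['Strike'] + ' ' + record['Last'])
```

Looking cells up by header name avoids a second layer of hard-coded indexes such as findAll('td')[2].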
Licensed under: CC-BY-SA with attribution