Question

This code does not print the list of companies as required. It never gets inside the first for loop: if I write print 'text' inside that loop, nothing is printed. BeautifulSoup works in other code I have written for different sites. Any suggestions as to why it is not working here?

from bs4 import BeautifulSoup
import urllib
request = urllib.urlopen('http://www.stockmarketsreview.com/companies_sp500/')
html = request.read()
request.close()
soup = BeautifulSoup(html)
for tags in soup.find_all('div', {'class':'mainContent'}):
    for row in tags.find_all('tr'):
        for column in row.find_all('td'):
            print column.text

No correct solution

OTHER TIPS

I have BeautifulSoup 3 and this seems to work correctly:

import BeautifulSoup as BS
import urllib
request = urllib.urlopen('http://www.stockmarketsreview.com/companies_sp500/')
html = request.read()
request.close()
soup = BS.BeautifulSoup(html)

try:
   tags = soup.findAll('div', attrs={'class':'mainContent'})
   print '# tags = ' + str(len(tags))
   for tag in tags:
      try:         
         tables = tag.findAll('table')
         print '# tables = ' + str(len(tables))
         for table in tables:            
            try:
               rows = table.findAll('tr')  # search the current table, not the whole div
               for row in rows:
                  try:
                     columns = row.findAll('td')
                     for column in columns:
                        print column.text
                  except:
                     pass
                  #   print 'Caught error getting td tag under ' + str(row)
                  # This is okay since some rows have <th>, not <td>
            except:
               print 'Caught error getting tr tag under ' + str(table)
      except:
         print 'Caught error getting table tag under ' + str(tag)
except:
   print 'Caught error getting div tag'

If you are on BeautifulSoup 4 (bs4), as in the question, I believe you'd need to replace 'findAll' with 'find_all'.
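For reference, here is a minimal sketch of the same traversal written for bs4 and Python 3. It is not tested against the real page: SAMPLE_HTML is a made-up stand-in for the downloaded markup, and extract_cells is a hypothetical helper name. It prints how many matching divs were found, because an empty result there would explain why nothing inside the first loop ever runs.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page; the real markup may differ.
SAMPLE_HTML = """
<div class="mainContent">
  <table>
    <tr><th>Company</th><th>Ticker</th></tr>
    <tr><td>Example Corp</td><td>EXM</td></tr>
    <tr><td>Sample Inc</td><td>SMP</td></tr>
  </table>
</div>
"""


def extract_cells(html):
    """Return the text of every <td> under div.mainContent, one list per row."""
    soup = BeautifulSoup(html, "html.parser")
    divs = soup.find_all("div", {"class": "mainContent"})
    # If this prints 0, the outer loop body never executes.
    print("# divs = %d" % len(divs))
    rows_out = []
    for div in divs:
        for row in div.find_all("tr"):
            cells = [td.get_text(strip=True) for td in row.find_all("td")]
            if cells:  # header rows hold <th> only, so they yield no cells
                rows_out.append(cells)
    return rows_out


if __name__ == "__main__":
    for cells in extract_cells(SAMPLE_HTML):
        print(", ".join(cells))
```

If the div count comes back as 0 for the real page, it is worth printing part of the downloaded html to check whether the div with class mainContent is actually present in what the server returned.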


Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow