Question

I'm trying to write code that scrapes stock symbol data into a CSV file. However, I get the following error.

Traceback (most recent call last):
  File "company_data_v3.py", line 23, in <module>
    page = urllib2.urlopen("http://finance.yahoo.com/q/ks?s="+newsymbolslist[i] +"%20Key%20Statistics").read()
  File "C:\Python27\lib\urllib2.py", line 127, in urlopen
    return _opener.open(url, data, timeout)
  File "C:\Python27\lib\urllib2.py", line 410, in open
    response = meth(req, response)
  File "C:\Python27\lib\urllib2.py", line 523, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python27\lib\urllib2.py", line 448, in error
    return self._call_chain(*args)
  File "C:\Python27\lib\urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "C:\Python27\lib\urllib2.py", line 531, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 400: Bad Request

I have tried this suggestion, which imports urllib2's HTTPError into the program, but it has not worked. (It seems redundant to do that anyway, since I already have the module imported.)

The symbols.txt file contains the stock symbols, one per line. Here is the code that I am using:

import urllib2
from BeautifulSoup import BeautifulSoup
import csv
import re
import urllib
from urllib2 import HTTPError
# import modules

symbolfile = open("symbols.txt")
symbolslist = symbolfile.read()
newsymbolslist = symbolslist.split("\n")

i = 0

f = csv.writer(open("pe_ratio.csv","wb"))
# short cut to write

f.writerow(["Name","PE","Revenue % Quarterly","ROA% YOY","Operating Cashflow","Debt to Equity"])
#first write row statement

# define name_company as the following
while i<len(newsymbolslist):
    page = urllib2.urlopen("http://finance.yahoo.com/q/ks?s="+newsymbolslist[i] +"%20Key%20Statistics").read()
    soup = BeautifulSoup(page)
    name_company = soup.findAll("div", {"class" : "title"}) 
    for name in name_company: #add multiple iterations?        
        all_data = soup.findAll('td', "yfnc_tabledata1")
        stock_name = name.find('h2').string #find company's name in name_company with h2 tag
        try:    
            f.writerow([stock_name, all_data[2].getText(),all_data[17].getText(),all_data[13].getText(), all_data[29].getText(),all_data[26].getText()]) #write down PE data
        except (IndexError, urllib2.HTTPError) as e:
            pass
        i+=1    

Do I need to define the error more specifically? Thanks for your help.


Solution

You are catching the exception in the wrong place. The urlopen() call is what raises it, as the first lines of your traceback show:

Traceback (most recent call last):
  File "company_data_v3.py", line 23, in <module>
    page = urllib2.urlopen("http://finance.yahoo.com/q/ks?s="+newsymbolslist[i] +"%20Key%20Statistics").read()

Catch it there:

while i < len(newsymbolslist):
    try:
        page = urllib2.urlopen("http://finance.yahoo.com/q/ks?s=" + newsymbolslist[i] + "%20Key%20Statistics").read()
    except urllib2.HTTPError:
        i += 1      # still advance the counter, or the loop will retry the same symbol forever
        continue
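
If it helps, here is a minimal sketch of the whole loop with the handler moved around urlopen(). It keeps your URL, column indices, and BeautifulSoup calls unchanged; the only assumptions I have made are that i += 1 belongs outside the inner for loop (so the counter advances once per symbol) and that an IndexError on a short result list should simply skip that row:

import urllib2
import csv
from BeautifulSoup import BeautifulSoup

# output file and header row, as in your script
f = csv.writer(open("pe_ratio.csv", "wb"))
f.writerow(["Name", "PE", "Revenue % Quarterly", "ROA% YOY",
            "Operating Cashflow", "Debt to Equity"])

# one symbol per line
with open("symbols.txt") as symbolfile:
    newsymbolslist = symbolfile.read().split("\n")

i = 0
while i < len(newsymbolslist):
    url = ("http://finance.yahoo.com/q/ks?s=" + newsymbolslist[i]
           + "%20Key%20Statistics")
    try:
        page = urllib2.urlopen(url).read()
    except urllib2.HTTPError:
        i += 1          # skip symbols the server rejects, but keep moving
        continue

    soup = BeautifulSoup(page)
    for name in soup.findAll("div", {"class": "title"}):
        all_data = soup.findAll("td", "yfnc_tabledata1")
        stock_name = name.find("h2").string
        try:
            f.writerow([stock_name, all_data[2].getText(),
                        all_data[17].getText(), all_data[13].getText(),
                        all_data[29].getText(), all_data[26].getText()])
        except IndexError:
            pass        # page layout did not match; skip this row
    i += 1              # advance once per symbol, not once per <div>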