Question

I am using urllib in Python to get stock prices from Yahoo Finance. Here is my code so far:

import urllib
import re

name = raw_input(">")

htmlfile = urllib.urlopen("http://finance.yahoo.com/q?s=%s" % name)

htmltext = htmlfile.read()

# The problem area
regex = '<span id="yfs_l84_%s">(.+?)</span>' % name

pattern = re.compile(regex)

price = re.findall(pattern, htmltext)

print price

The idea is that I enter a ticker symbol and the stock price is printed. So far, though, I can't get it to display a price, just an empty list [ ]. I have commented where I believe the problem is. Any suggestions? Thanks.


Solution

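Check the id in your regex first. Python's re module does not require the forward slash to be escaped, so </span> and <\/span> match the same text, but the id must match the page exactly. With the symbol filled in, your regex:

<span id="yfs_l84_%s">(.+?)</span>

becomes

<span id="yfs_l84_goog">(.+?)<\/span>

This will work assuming you enter the company's listing code as the input to your code, e.g. goog for Google.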
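A quick way to check a pattern like this without touching the network is to run it against a small hand-written fragment. The sketch below is written for Python 3. The function name find_price, the HTML fragment, and the price in it are illustrative placeholders, and lowercasing the symbol is an assumption based on the goog example above.

```python
import re


def find_price(html, symbol):
    """Return every text match inside a span whose id is yfs_l84_<symbol>."""
    # Assumption: the page writes the symbol in lowercase inside the id.
    # re.escape protects symbols that contain regex metacharacters, such as a dot.
    pattern = re.compile(
        r'<span id="yfs_l84_%s">(.+?)</span>' % re.escape(symbol.lower())
    )
    return pattern.findall(html)


# Hand-written placeholder fragment, not real Yahoo Finance output.
sample = '<div><span id="yfs_l84_goog">10.28</span></div>'

matches = find_price(sample, "GOOG")

# The forward slash is an ordinary character in Python's re module,
# so the escaped and unescaped closing tags behave the same way.
escaped = re.findall(r'<span id="yfs_l84_goog">(.+?)<\/span>', sample)
unescaped = re.findall(r'<span id="yfs_l84_goog">(.+?)</span>', sample)

print(matches)
```

If findall returns an empty list against the real page, comparing the id in the page source with the id the pattern builds is a reasonable first check.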

That said, regex is a bad choice for what you are trying to do. As suggested by others, explore BeautifulSoup which is a Python library for pulling data out of HTML. With BeautifulSoup your code can be as simple as:

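from bs4 import BeautifulSoup
import requests

name = raw_input('>')
url = 'http://finance.yahoo.com/q?s={}'.format(name)
r = requests.get(url)
# Name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(r.text, 'html.parser')
# The span id ends with the ticker symbol, so the string needs a {} placeholder
data = soup.find('span', attrs={'id': 'yfs_l84_{}'.format(name)})
print data.text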

OTHER TIPS

Any reason you can't use pandas? It has good support for financial data scraping and time series analysis.

http://pandas.pydata.org/pandas-docs/stable/remote_data.html

Here's the Yahoo example straight from the documentation:

In [1]: import pandas.io.data as web
In [2]: import datetime
In [3]: start = datetime.datetime(2010, 1, 1)
In [4]: end = datetime.datetime(2013, 01, 27)
In [5]: f=web.DataReader("F", 'yahoo', start, end)
In [6]: f.ix['2010-01-04']
Out[6]: 
Open               10.17
High               10.28
Low                10.05
Close              10.28
Volume       60855800.00
Adj Close           9.75
Name: 2010-01-04 00:00:00, dtype: float64

A way to get data from Yahoo Finance that works with both Python 2 and Python 3 is to use a POST request.
You can easily test this out with a REST client such as Postman.

Open Postman, set the method to POST, and send the request to the URL used in the code below; you will then see the data. Then simply re-create the request in Python:

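import requests

# Historical-data download URL for GOOG, with the query string attached directly after "?"
url = "https://query1.finance.yahoo.com/v7/finance/download/GOOG?period1=1519938930&period2=1522354530&interval=1d&events=history&crumb=.tLvYBkGDu3"

response = requests.post(url)
# print() with parentheses runs under both Python 2 and Python 3
print(response.text)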

I used to get the data using urllib2, but it now gives an authorization error. They are probably filtering everything through REST methods like GET and POST.
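The query string in the URL above can be assembled from named parameters instead of being typed by hand, which avoids stray characters. The following Python 3 sketch only builds the URL and sends nothing. It assumes that period1 and period2 are Unix timestamps in seconds, which is consistent with the values in the answer but is not confirmed by it. The helper name build_history_url is hypothetical, and the crumb is copied from the answer and treated as an opaque value.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

BASE = "https://query1.finance.yahoo.com/v7/finance/download/"


def build_history_url(symbol, start, end, crumb, interval="1d"):
    """Assemble the download URL; start and end are timezone-aware datetimes."""
    params = [
        # Assumption: the endpoint expects Unix timestamps in whole seconds.
        ("period1", int(start.timestamp())),
        ("period2", int(end.timestamp())),
        ("interval", interval),
        ("events", "history"),
        ("crumb", crumb),  # opaque value copied from the answer above
    ]
    return BASE + symbol + "?" + urlencode(params)


url = build_history_url(
    "GOOG",
    datetime(2018, 3, 1, 21, 15, 30, tzinfo=timezone.utc),
    datetime(2018, 3, 29, 20, 15, 30, tzinfo=timezone.utc),
    crumb=".tLvYBkGDu3",
)
print(url)
```

The resulting string can be passed to requests.post in place of the hard-coded URL.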

This guide will show you how to build Yahoo Finance queries that return CSVs. You can then parse them easily with the csv library.
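Parsing the returned CSV with the csv library could look roughly like this in Python 3. The column names are an assumption about the download format, the single data row reuses the figures from the pandas output earlier on this page, and parse_prices is a hypothetical helper name.

```python
import csv
import io

# Illustrative CSV text; the header row is an assumed layout, not a captured response.
sample_csv = (
    "Date,Open,High,Low,Close,Adj Close,Volume\n"
    "2010-01-04,10.17,10.28,10.05,10.28,9.75,60855800\n"
)


def parse_prices(text):
    """Return one dict per CSV row, with every column except Date converted to float."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        parsed = {"Date": row["Date"]}
        for key, value in row.items():
            if key != "Date":
                parsed[key] = float(value)
        rows.append(parsed)
    return rows


rows = parse_prices(sample_csv)
print(rows[0]["Date"], rows[0]["Close"])
```

With a live response, the same function could be given response.text instead of sample_csv.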

If you really want to try hacking through the HTML, use BeautifulSoup. HTML can't be parsed easily with regexes.
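To illustrate what parser-based extraction looks like, here is a minimal Python 3 sketch that uses the standard library's html.parser instead of BeautifulSoup, chosen only so the example has no third-party dependency. The class name SpanTextById, the HTML fragment, and the price in it are placeholders, and the id format is taken from the question.

```python
from html.parser import HTMLParser


class SpanTextById(HTMLParser):
    """Collect the text inside the first <span> whose id equals target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0       # greater than 0 while inside the matching span
        self.chunks = []     # text pieces found inside the matching span
        self.done = False    # set once the matching span has been closed

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth:
            if tag == "span":
                self.depth += 1  # track nested spans so the right </span> ends it
        elif tag == "span" and dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag == "span":
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


# Hand-written placeholder fragment, not real Yahoo Finance output.
sample = (
    '<div><span id="yfs_l84_goog"><b>10</b>.28</span>'
    '<span id="other">ignored</span></div>'
)

parser = SpanTextById("yfs_l84_goog")
parser.feed(sample)
price = "".join(parser.chunks)
print(price)
```

Unlike the regex, this keeps working when the price is wrapped in extra tags, which is the kind of markup change that tends to break pattern matching.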

Licensed under: CC-BY-SA with attribution