Question

I'm quite new to programming in Python.

I want to make an application which will fetch stock prices from google finance. One example is CSCO (Cisco Sytems). I would then use that data to warn the user when the stock reaches a certain value. It also needs to refresh every 30 seconds.

The problem is I dont have a clue how to fetch the data!

Anyone have any ideas?

Was it helpful?

Solution

This module comes courtesy of Corey Goldberg.

Program:

import urllib
import re

def get_quote(symbol):
    base_url = 'http://finance.google.com/finance?q='
    content = urllib.urlopen(base_url + symbol).read()
    m = re.search('id="ref_694653_l".*?>(.*?)<', content)
    if m:
        quote = m.group(1)
    else:
        quote = 'no quote available for: ' + symbol
    return quote

Sample Usage:

import stockquote
print stockquote.get_quote('goog')

Update: Changed the regular expression to match Google Finance's latest format (as of 23-Feb-2011). This demonstrates the main issue when relying upon screen scraping.

OTHER TIPS

As for now (2015), the google finance api is deprecated. But you may use the pypi module googlefinance.

Install googlefinance

$pip install googlefinance

It is easy to get current stock price:

>>> from googlefinance import getQuotes
>>> import json
>>> print json.dumps(getQuotes('AAPL'), indent=2)
[
  {
    "Index": "NASDAQ", 
    "LastTradeWithCurrency": "129.09", 
    "LastTradeDateTime": "2015-03-02T16:04:29Z", 
    "LastTradePrice": "129.09", 
    "Yield": "1.46", 
    "LastTradeTime": "4:04PM EST", 
    "LastTradeDateTimeLong": "Mar 2, 4:04PM EST", 
    "Dividend": "0.47", 
    "StockSymbol": "AAPL", 
    "ID": "22144"
  }
]

Google finance is a source that provides real-time stock data. There are also other APIs from yahoo, such as yahoo-finance, but they are delayed by 15min for NYSE and NASDAQ stocks.

import urllib
import re

def get_quote(symbol):
    base_url = 'http://finance.google.com/finance?q='
    content = urllib.urlopen(base_url + symbol).read()
    m = re.search('id="ref_(.*?)">(.*?)<', content)
    if m:
        quote = m.group(2)
    else:
        quote = 'no quote available for: ' + symbol
    return quote

I find that if you use ref_(.*?) and use m.group(2) you will get a better result as the reference id changes from stock to stock.

I suggest using the HTMLParser to get the value of the meta tags google places in it's html

<meta itemprop="name"
        content="Cerner Corporation" />
<meta itemprop="url"
        content="https://www.google.com/finance?cid=92421" />
<meta itemprop="imageUrl"
        content="https://www.google.com/finance/chart?cht=g&q=NASDAQ:CERN&tkr=1&p=1d&enddatetime=2014-04-09T12:47:31Z" />
<meta itemprop="tickerSymbol"
        content="CERN" />
<meta itemprop="exchange"
        content="NASDAQ" />
<meta itemprop="exchangeTimezone"
        content="America/New_York" />
<meta itemprop="price"
        content="54.66" />
<meta itemprop="priceChange"
        content="+0.36" />
<meta itemprop="priceChangePercent"
        content="0.66" />
<meta itemprop="quoteTime"
        content="2014-04-09T12:47:31Z" />
<meta itemprop="dataSource"
        content="NASDAQ real-time data" />
<meta itemprop="dataSourceDisclaimerUrl"
        content="//www.google.com/help/stock_disclaimer.html#realtime" />
<meta itemprop="priceCurrency"
        content="USD" />

With code like this:

import urllib
try:
    from html.parser import HTMLParser
except:
    from HTMLParser import HTMLParser

class QuoteData:
    pass

class GoogleFinanceParser(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self)
        self.quote = QuoteData()
        self.quote.price = -1

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            last_itemprop = ""
            for attr, value in attrs:
                if attr == "itemprop":
                    last_itemprop = value

                if attr == "content" and last_itemprop == "name":
                    self.quote.name = value
                if attr == "content" and last_itemprop == "price":
                    self.quote.price = value
                if attr == "content" and last_itemprop == "priceCurrency":
                    self.quote.priceCurrency = value
                if attr == "content" and last_itemprop == "priceChange":
                    self.quote.priceChange = value
                if attr == "content" and last_itemprop == "priceChangePercent":
                    self.quote.priceChangePercent = value
                if attr == "content" and last_itemprop == "quoteTime":
                    self.quote.quoteTime = value
                if attr == "content" and last_itemprop == "exchange":
                    self.quote.exchange = value
                if attr == "content" and last_itemprop == "exchangeTimezone":
                    self.quote.exchangeTimezone = value


def getquote(symbol):
    url = "http://finance.google.com/finance?q=%s" % symbol
    content = urllib.urlopen(url).read()

    gfp = GoogleFinanceParser()
    gfp.feed(content)
    return gfp.quote;


quote = getquote('CSCO')
print quote.name, quote.price

Just in case you want to pull data from Yahoo... Here is a simple function. This does not scrape data off a normal page. I thought I had a link to the page describing this in the comments, but I do not see it now - there is a magic string appended to the URL to request specific fields.

import urllib as u
import string
symbols = 'amd ibm gm kft'.split()

def get_data():
    data = []
    url = 'http://finance.yahoo.com/d/quotes.csv?s='
    for s in symbols:
        url += s+"+"
    url = url[0:-1]
    url += "&f=sb3b2l1l"
    f = u.urlopen(url,proxies = {})
    rows = f.readlines()
    for r in rows:
        values = [x for x in r.split(',')]
        symbol = values[0][1:-1]
        bid = string.atof(values[1])
        ask = string.atof(values[2])
        last = string.atof(values[3])
        data.append([symbol,bid,ask,last,values[4]])
    return data

Here, I found the link that describes the magic string: http://cliffngan.net/a/13

http://docs.python.org/library/urllib.html for fetching arbitrary URLs.

Apart from that you should better look a some web service providing the data in JSON format.

Otherwise you have to implement parsing etc. on your own.

Screenscrapping yahoo.com for getting the stocks is unlikely the right road to success.

You can start by looking at the Google Finance APIs, although I don't see a Python API or wrapper. It looks like the only options for accessing the data directly are Java and JavaScript. You can also use cURL if you're familiar with it and it's available on your system.

Another good place to start is Google Finance's own API: http://code.google.com/apis/finance/ You can look at their finance gadgets for some example code.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top