Question

I'm running Python 3.3 on Windows. The code below goes to Yahoo Finance, pulls the stock price, and prints it. The problem I'm running into is that it outputs:

['540.04']

I just want the number so I can turn it into a float and use it with formulas. I tried just using the float function, but that didn't work. I think I have to somehow remove the brackets and apostrophes with some line of code.

    from urllib.request import urlopen
    from bs4 import BeautifulSoup
    import re

    htmlfile = urlopen("http://finance.yahoo.com/q?s=AAPL&q1=1")

    Thefind = re.compile ('<span id="yfs_l84_aapl">(.+?)</span>')

    msg=htmlfile.read()

    price = Thefind.findall(str(msg))

    print (price)

Solution

The beautiful thing about BeautifulSoup is that you don't have to use regular expressions to parse HTML data.

This is the idiomatic way to use BeautifulSoup:

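from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen("http://finance.yahoo.com/q?s=AAPL&q1=1")
# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(html, "html.parser")
# Look up the <span> that holds the price by its id attribute
my_span = soup.find('span', {'id': 'yfs_l84_aapl'})
print(my_span.text)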

Which yields

540.04
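
Since the goal is a float rather than printed text, the lookup above could be wrapped roughly as follows. This is only a sketch: SAMPLE_HTML is a made-up stand-in for the downloaded page so the example runs offline, and extract_price is a hypothetical helper name, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page, so the sketch runs offline.
SAMPLE_HTML = '<html><body><span id="yfs_l84_aapl">540.04</span></body></html>'


def extract_price(html, span_id="yfs_l84_aapl"):
    """Return the text of the span with the given id as a float, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    span = soup.find("span", {"id": span_id})
    if span is None:
        # find() returns None when no matching tag exists
        return None
    # Remove surrounding whitespace and thousands separators (e.g. "1,540.04")
    return float(span.text.strip().replace(",", ""))


price = extract_price(SAMPLE_HTML)
print(price)
```

Checking for None matters because find() returns None when the page layout changes and the span is no longer there; calling .text on it would raise an AttributeError.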

OTHER TIPS

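The function findall() returns a list, which is why the output has brackets and quotes. If you just want the first group, index into the list. Note that msg is a bytes object, so it has to be converted to a string first, as in the question:

Thefind.findall(str(msg))[0]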

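A cleaner way to refer to any group is search(), which scans the whole string (match() only matches at the very start of the string, so it would find nothing here):

Thefind.search(str(msg)).group(1)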

Note: group(0) is the whole match, not the first group.
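
To make the difference concrete, here is a small sketch that runs offline. The page string is a made-up stand-in for the decoded page source; the pattern is the one from the question.

```python
import re

# Hypothetical stand-in for the decoded page source.
page = '<div><span id="yfs_l84_aapl">540.04</span></div>'

pattern = re.compile(r'<span id="yfs_l84_aapl">(.+?)</span>')

# findall() returns a list of the captured groups, one entry per match
matches = pattern.findall(page)
first = matches[0]  # a plain string, no brackets or quotes

# search() scans the whole string; match() only tries at position 0
found = pattern.search(page)
price = float(found.group(1)) if found else None

print(matches, first, price)
```

Both search() and match() return None when nothing matches, so checking the result before calling group() avoids an AttributeError.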

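Use Python's built-in functions. Since price is a list, it has no strip() method, so convert it to a string first: float(str(price).strip("[']")). Indexing the list is simpler: float(price[0]).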

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow