Question

I'm hoping you can show me where I'm going wrong with my web scraper.

What I would like to do is be notified when a certain string ("Sorry, Gruen Fan") changes on a page. I'm able to pull in the string; however, the "if" statement doesn't seem to work. Its output should be "Text is in". Here's the code:

from bs4 import BeautifulSoup
from urllib import urlopen
import re

urls= ["http://www.abc.net.au/tv/programs/gruen-nation/"]

for url in urls:
    webpage = urlopen(url).read()
    FindTitle = re.compile('\t\t\t\t(.*)\.<BR><BR>')
    FindTitle = re.findall(FindTitle,webpage)
    print FindTitle[0]
    print ' '

if 'Sorry, Gruen fan' in FindTitle:
    print("Text is in")
else:
    print("Text isn't in")

Thanks in advance for your time,

Sam.


Solution

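FindTitle is a list, because re.findall returns a list of the matched strings. The "in" operator on a list checks whether some element is equal to the whole string, not whether an element contains it. No element equals 'Sorry, Gruen fan' exactly, so you get False.

Check whether it's in the first string of the list instead: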

if 'Sorry, Gruen fan' in FindTitle[0]:
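To make the difference between the two membership tests concrete, here is a small sketch. The sample list is made up for illustration (it stands in for whatever re.findall actually returned from the page), and the name find_title is hypothetical.

```python
# Hypothetical stand-in for the result of re.findall; the real page text may differ.
find_title = ["Sorry, Gruen fan, this episode is not available"]

# "in" on a list compares the needle against whole elements,
# so a fragment of an element does not count as a match.
in_list = "Sorry, Gruen fan" in find_title

# "in" on a string is a substring test, which is what the question needs.
in_first_item = "Sorry, Gruen fan" in find_title[0]

# any() runs the substring test on every match, and is simply False
# for an empty list, where find_title[0] would raise IndexError.
in_any_item = any("Sorry, Gruen fan" in title for title in find_title)

print(in_list, in_first_item, in_any_item)
```

With this sample data the list test fails while both substring tests succeed.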

Also, you don't need regex if you just want to check for a string:

from urllib import urlopen

urls = ["http://www.abc.net.au/tv/programs/gruen-nation/"]

for url in urls:
    html = urlopen(url).read()

    if 'Sorry, Gruen fan' in html:
        print("Text is in")
    else:
        print("Text isn't in")
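The code above is Python 2. As a rough sketch only, the same check might look like this on Python 3, where urlopen lives in urllib.request and read() returns bytes that need decoding before a string comparison. The helper names contains_text and page_contains, the ignore_case option, and the UTF-8 assumption are illustrative choices, not part of the original answer.

```python
from urllib.request import urlopen  # Python 3 location of urlopen


def contains_text(html, needle, ignore_case=False):
    """Return True if needle occurs anywhere in html (plain substring test)."""
    if ignore_case:
        # "in" is case-sensitive, so normalise both sides when asked to.
        return needle.lower() in html.lower()
    return needle in html


def page_contains(url, needle, ignore_case=False):
    """Fetch url and report whether needle appears in its source."""
    with urlopen(url) as response:
        # Assumption: the page is UTF-8; undecodable bytes are replaced.
        html = response.read().decode("utf-8", errors="replace")
    return contains_text(html, needle, ignore_case)
```

It could then be called as page_contains("http://www.abc.net.au/tv/programs/gruen-nation/", "Sorry, Gruen fan"). Keeping the substring test in its own function lets it be checked without touching the network.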
Licensed under: CC-BY-SA with attribution