
I'm trying to figure out how to write a website monitoring script (eventually run as a cron job) that opens a given URL and checks whether a tag exists. If the tag does not exist, or doesn't contain the expected data, the script should write something to a log file or send an e-mail.

The tag would be something like or something relatively similar.

Anyone have any ideas?



Your best bet imo is to check out BeautifulSoup. Something like so:

import urllib2
from BeautifulSoup import BeautifulSoup

page = urllib2.urlopen("")
soup = BeautifulSoup(page)

# See the docs on how to search through the soup. I'm not sure what
# you're looking for so my example stops here :)

After that, emailing it or logging it is pretty standard fare.


This is sample code (untested) that logs and sends mail:

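#!/usr/bin/env python
import logging
import urllib2
import smtplib

# Log config (the original configuration line did not survive in this post;
# a default basicConfig at INFO level is an assumption)
logging.basicConfig(level=logging.INFO)

def check_content(data):
    # Your BeautifulSoup logic here; set content_found to True or False
    content_found = False  # placeholder until the real check is written
    return content_found

def send_mail(message_body):
    server = 'localhost'
    recipients = ['']
    sender = ''
    # Headers must start at column 0 and end with a blank line before the body
    message = 'From: %s\nSubject: script result\n\n%s' % (sender, message_body)
    session = smtplib.SMTP(server)
    session.sendmail(sender, recipients, message)
    session.quit()

# Open requested url
url = ""
data = urllib2.urlopen(url)

if check_content(data):
    # Report to log
    logging.info('Content found')
else:
    # Send mail
    send_mail('Content not found')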

I would write the check_content() function using BeautifulSoup.
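If you'd rather not install anything, here is a rough sketch of what check_content() could look like using only the standard library's html.parser as a stand-in for BeautifulSoup. It is written for Python 3 (where urllib2 became urllib.request), and it takes the page as a string rather than the response object. The tag in the question didn't survive, so the meta tag and its "site-status" / "ok" attributes below are purely hypothetical placeholders, as is the TagFinder helper class.

```python
from html.parser import HTMLParser


class TagFinder(HTMLParser):
    """Hypothetical helper: records whether a matching start tag was seen."""

    def __init__(self, tag, expected_attrs):
        super().__init__()
        self.tag = tag
        self.expected_attrs = expected_attrs
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        if tag == self.tag:
            attr_map = dict(attrs)
            if all(attr_map.get(k) == v
                   for k, v in self.expected_attrs.items()):
                self.found = True


def check_content(html, tag="meta", expected_attrs=None):
    """Return True if the tag exists with every expected attribute value."""
    if expected_attrs is None:
        # Placeholder values; the real tag is not given in the question.
        expected_attrs = {"name": "site-status", "content": "ok"}
    finder = TagFinder(tag, expected_attrs)
    finder.feed(html)
    return finder.found
```

You would call it with the decoded page text, e.g. check_content(page_bytes.decode("utf-8")), and log or mail when it returns False. A missing tag and a tag with the wrong value both come back as False here, so split those cases if you need different alerts.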

The following (untested) code uses urllib2 to grab the page and re to search it.

import urllib2, re

pageString = urllib2.urlopen('**insert url here**').read()
m = re.search('**insert regex for the tag you want to find here**', pageString)
if m is None:
    # take action for NOT found here
    pass
else:
    # take action for found here
    pass
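To make the regex approach a bit more concrete, here is a small sketch (Python 3, no network access) that separates the "tag missing" case from the "tag present but wrong data" case. The pattern, the span tag with id="status", the expected text "OK", and the tag_status() name are all made up for illustration, since the actual tag isn't shown in the question.

```python
import re

# Hypothetical pattern: a <span id="status"> tag whose text we capture.
STATUS_RE = re.compile(r'<span\s+id="status"\s*>(.*?)</span>',
                       re.IGNORECASE | re.DOTALL)


def tag_status(page_string, expected="OK"):
    """Classify the page as 'missing', 'unexpected', or 'ok'."""
    m = STATUS_RE.search(page_string)
    if m is None:
        return "missing"        # tag not found at all
    if m.group(1).strip() != expected:
        return "unexpected"     # tag found, but not the expected data
    return "ok"
```

Keep in mind that a regex like this breaks as soon as the attributes change order or gain extra whitespace or quoting variations, which is the main reason the other answers suggest a real HTML parser.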

The following (untested) code uses pycurl and StringIO to grab the page and re to search it.

import pycurl, re, StringIO

b = StringIO.StringIO()
c = pycurl.Curl()
c.setopt(pycurl.URL, '**insert url here**')
c.setopt(pycurl.WRITEFUNCTION, b.write)
c.perform()  # actually fetch the page; without this the buffer stays empty
c.close()
m = re.search('**insert regex for the tag you want to find here**', b.getvalue())
if m is None:
    # take action for NOT found here
    pass
else:
    # take action for found here
    pass
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow