Handling error codes:
When urllib fails to open a URL, it raises an HTTPError, which conveniently includes the HTTP status code and a reason string explaining the failure. If you want to ignore errors, you can wrap the call in a try / except
block and swallow the exception:
import urllib.error
import urllib.request

try:
    page = urllib.request.urlopen(args).read()
    # ...
except urllib.error.HTTPError as e:
    # we don't care about no stinking errors...
    # ... but if we did, e.code would have the HTTP status code,
    # ... and e.reason would have an explanation of the error (hopefully)
    pass
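If you do want to act on the failure instead of passing, here is a minimal sketch of what those attributes contain. It constructs an HTTPError by hand purely for illustration (the URL is a placeholder); in real code, urlopen raises this for you on a 4xx/5xx response:

```python
import io
import urllib.error

# Build an HTTPError by hand to inspect its attributes without a
# network call; urlopen() would raise one like this on a 404.
err = urllib.error.HTTPError(
    url="http://example.com/missing",  # hypothetical URL
    code=404,
    msg="Not Found",
    hdrs=None,
    fp=io.BytesIO(b""),
)

try:
    raise err
except urllib.error.HTTPError as e:
    print(e.code)    # 404
    print(e.reason)  # Not Found
```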
Search a page for a string:
Beautiful Soup is incredibly powerful; its find method (and its find_all method) supports a keyword argument text (named string in newer versions of Beautiful Soup), which accepts a string or a compiled regular expression to match against the text in a page. In your case, since you just need to ensure that the text exists, you can likely get away with checking that find returns a result:
if soup.find(text=re.compile('my search string')):
    # do something
More details on the text argument can be found in the documentation.
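Putting the two pieces together, here is a rough self-contained sketch. It parses static HTML rather than fetching a page so it runs without a network; the HTML and search string are placeholders for your own:

```python
import re
from bs4 import BeautifulSoup

# Static HTML standing in for the bytes urlopen().read() would return.
html = b"<html><body><p>... my search string ...</p></body></html>"
soup = BeautifulSoup(html, "html.parser")

# find() returns the matching text node, or None if nothing matches,
# so a simple truthiness check is enough to test for presence.
found = soup.find(text=re.compile("my search string"))
print(found is not None)  # True
```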