Using a regular expression against an XML document is never a good idea (seriously).
You can achieve the desired result without any regular expressions by extracting the date from the relevant XML element (I've used lxml.etree
instead of xml.etree.ElementTree
, but the principle will be the same).
Also, I've added an easier way to generate a 3-digit number (leading 0 if necessary).
import urllib2, datetime, sys, string
import lxml.etree
for i in range(414,416):
num_string = '{:03d}'.format(i)
print num_string, datetime.datetime.now()
url = "http://clerk.house.gov/evs/2001/roll"+num_string+".xml"
xml = lxml.etree.parse(urllib2.urlopen(url))
root = xml.getroot()
actdate = root.xpath('//action-date')[0]
dte = actdate.text.strip()
print dte
If you insist on using a regular expression, then [0-9]+-[A-Za-z]+-[0-9]+
would be better as it guarantees at least one digit followed by dash followed by at least one letter followed by dash followed by at least one digit (as holdenweb mentions in his comment).