Question

I am writing code to parse a bunch of XML files. It basically looks like this:

for i in range(0, 20855):
    urlb = str(i)
    url = urla + urlb
    trys=0
    t=0
    while (trys < 3):
        try:
            cfile = UR.urlopen(url)
            trys = 3
        except urllib.error.HTTPError as e:
            t=t+1
            print('error at '+str(time.time()-tstart)+' seconds')
            print('typeID = '+str(i))
            print(e.code)
            print(e.read())
            time.sleep (0.1)
            trys=0+t
    tree = ET.parse(cfile)   ##parse xml file
    root = tree.getroot()
    ...do a bunch of stuff with i and the file data

My problem is that some of the URLs I'm calling don't actually contain an XML file, which breaks my code. I have a list of all the actual numbers that I use instead of the range shown, but I really don't want to go through all 21000 and remove each number that fails. Is there an easier way to get around this? I get an error from the while loop (which I need in order to deal with timeouts) that looks like this:

b'A non-marketable type was given'
error at 4.321678161621094 seconds
typeID = 31
400

So I was thinking there has to be a good way to bail out of that iteration of the for loop if my while loop fails three times, but I can't use break. Maybe an if/else under the while loop that just passes if the t variable is 3?

Solution

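You might try this. The else clause on the inner for loop runs only when the loop finishes without hitting break, that is, when all three attempts failed; the continue inside it then skips to the next value of i: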

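# Imports assumed from the aliases used in the question (UR, ET);
# urla and tstart are defined elsewhere in the asker's script.
import time
import urllib.error
import urllib.request as UR
import xml.etree.ElementTree as ET

for i in range(0, 20855):
    url = '%s%d' % (urla, i)
    for trys in range(3):
        try:
            cfile = UR.urlopen(url)
            break   # success: skip the else clause below
        except urllib.error.HTTPError as e:
            print('error at %s seconds' % (time.time()-tstart))
            print('typeID = %i' % i)
            print(e.code)
            print(e.read())
            time.sleep(0.1)
    else:
        # no break happened, so all 3 attempts failed: skip this id
        print("retry failed 3 times")
        continue
    try:
        tree = ET.parse(cfile)   ##parse xml file
    except Exception as e:
        print("cannot read xml")
        print(e)
        continue
    root = tree.getroot()
    ...do a bunch of stuff with i and the file data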

OTHER TIPS

Regarding your "algorithmic" problem: you can always set an error state (as simple as, e.g., last_iteration_successful = False) in the while body, break out of the while loop, then check that state in the for body and conditionally break out of the for loop, too (or continue, if you only want to skip that one iteration).
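A minimal sketch of that flag-based approach, under some assumptions: `process_ids` and `fake_fetch` are hypothetical names, and `fetch` stands in for the `UR.urlopen(url)` call from the question so the example runs without network access. It uses continue rather than break in the for body, since the question only wants to skip the failing id.

```python
import time

def process_ids(ids, fetch, max_tries=3, delay=0.1):
    """Return (processed, skipped), tracking success with a plain flag.

    `fetch` is a hypothetical callable standing in for UR.urlopen(url);
    it returns the data or raises OSError (HTTPError is a subclass).
    """
    processed, skipped = [], []
    for i in ids:
        last_iteration_successful = False
        tries = 0
        while tries < max_tries:
            try:
                data = fetch(i)
                last_iteration_successful = True
                break  # leave the while loop on success
            except OSError:
                tries += 1
                time.sleep(delay)
        if not last_iteration_successful:
            skipped.append(i)
            continue  # give up on this id, move on to the next one
        processed.append((i, data))
    return processed, skipped

def fake_fetch(i):
    # Hypothetical stand-in: id 31 always fails, like the 400 in the question.
    if i == 31:
        raise OSError("A non-marketable type was given")
    return "<type id='%d'/>" % i

processed, skipped = process_ids([30, 31, 32], fake_fetch, delay=0)
print(skipped)
```

The flag does the same job as the for/else in the accepted solution; it is more verbose, but may read more clearly to someone unfamiliar with loop else clauses.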

Regarding architecture: prepare your code for all relevant errors that might occur, via proper exception handling with try/except blocks. It might also make sense to define custom Exception types and raise them manually. Raising an exception immediately interrupts the current control flow, which can save many breaks.
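As an illustration of the custom-exception idea, here is a sketch under stated assumptions: `FetchFailed`, `fetch_with_retries`, `process_all`, and `flaky_fetch` are hypothetical names, and `fetch` again stands in for `UR.urlopen(url)` so the code runs offline. The retry logic moves into its own function, which raises once the attempts are used up; the outer loop needs only one try/except.

```python
import time

class FetchFailed(Exception):
    """Hypothetical custom exception: every retry for one id failed."""

def fetch_with_retries(i, fetch, max_tries=3, delay=0.1):
    last_error = None
    for _ in range(max_tries):
        try:
            return fetch(i)  # returning ends the retries immediately
        except OSError as e:  # urllib's HTTPError is a subclass of OSError
            last_error = e
            time.sleep(delay)
    raise FetchFailed("typeID %d failed after %d tries: %s"
                      % (i, max_tries, last_error))

def process_all(ids, fetch, delay=0.1):
    results = {}
    for i in ids:
        try:
            results[i] = fetch_with_retries(i, fetch, delay=delay)
        except FetchFailed as e:
            print(e)
            continue  # skip this id, keep going with the rest
    return results

def flaky_fetch(i):
    # Hypothetical stand-in: id 31 always fails, like the 400 in the question.
    if i == 31:
        raise OSError("A non-marketable type was given")
    return "<type id='%d'/>" % i

results = process_all([30, 31, 32], flaky_fetch, delay=0)
print(sorted(results))
```

Because the exception unwinds straight out of the helper, no flag or nested break is needed in the calling loop.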

Licensed under: CC-BY-SA with attribution