If it's really just the duplicate loops that have you concerned, you could use the try/except block only to decide how f gets opened, then put a single copy of the loop after it:
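    try:
        f = bz2.BZ2File(corpus, mode='r')
    except IOError:
        f = codecs.open(corpus, encoding='utf-8')
    # Caveat: BZ2File checks the data lazily, so a file that is not bz2
    # may only raise once parse_lines starts reading, not on the line above.
    try:
        for data in parse_lines(f):
            yield data
    finally:
        # Close the file even if parsing fails or the generator is abandoned.
        f.close()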
That said, I'd look into opening the file only once: check for the BZ2 header (the characters BZ as the first two bytes) and use that to decide whether to keep reading it as plaintext or to pass the data into a bz2.BZ2Decompressor instance.
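A rough sketch of that header check, assuming Python 3 and a file holding a single bz2 stream. The function name iter_decoded_chunks and its chunk_size and encoding parameters are made up for illustration. It yields decoded text chunks rather than lines, so it would still need adapting to whatever parse_lines expects:

```python
import bz2
import codecs


def iter_decoded_chunks(path, chunk_size=64 * 1024, encoding='utf-8'):
    """Yield decoded text from path, decompressing if it looks like bz2.

    Hypothetical helper: opens the file once and sniffs the first two bytes.
    """
    # Incremental decoder copes with multi-byte characters split across chunks.
    decoder = codecs.getincrementaldecoder(encoding)()
    with open(path, 'rb') as raw:
        head = raw.read(2)
        if head == b'BZ':
            # Looks like bz2: feed every chunk through a decompressor.
            transform = bz2.BZ2Decompressor().decompress
        else:
            # Treat as plaintext: pass the bytes through unchanged.
            def transform(chunk):
                return chunk
        data = head  # the sniffed bytes are part of the payload either way
        while data:
            text = decoder.decode(transform(data))
            if text:
                yield text
            data = raw.read(chunk_size)
        # Flush the decoder; raises if the input ended mid-character.
        tail = decoder.decode(b'', final=True)
        if tail:
            yield tail
```

Two limitations to keep in mind: a plaintext file that happens to start with the letters BZ would be misdetected, and data following the end of the first bz2 stream makes the decompressor raise EOFError.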