It turns out the issue was the KeyError itself: in some of the folders, br_ids.csv
had zero entries, and this was raising a KeyError. I solved it with a try/except
block, like so:
import csv
import os

# `root` is the folder currently being processed (defined elsewhere)

# parse the data you're about to filter with
with open(os.path.join(root, 'br_ids.csv'), 'r') as f:
    filters = {row["r_id"] for row in csv.DictReader(f, delimiter=',', quotechar='"')}

with open(os.path.join(root, 'bt_ids.csv'), 'w') as out_f:
    headers = ["t_id"]
    out = csv.DictWriter(out_f, headers, extrasaction='ignore')
    out.writeheader()

    # go through the rows and check whether row["r_id"] is in the
    # previously parsed set of filters; if yes, write the row
    with open(os.path.join(root, 't.csv'), 'r') as f:
        for row in csv.DictReader(f, delimiter=','):
            try:
                if row["r_id"] in filters:
                    out.writerow(row)
            except KeyError:
                # no "r_id" key for this row: skip it
                continue
In another case I had if row["r_id"] not in filters: and handled it the same way,
except that when a KeyError was raised, the code went ahead and called
out.writerow(row) instead of skipping the row.
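
For reference, here is a minimal sketch of that second case. The function name write_unmatched and the in-memory files are only for illustration (they are not from my actual script, which reads t.csv and writes to a file on disk); the try/except logic is the part that matters.

```python
import csv
import io


def write_unmatched(t_file, out_file, filters):
    """Write rows whose r_id is NOT in filters (hypothetical helper).

    If a row has no "r_id" key at all, the KeyError is caught and the
    row is written anyway, since there is nothing to exclude it on.
    """
    out = csv.DictWriter(out_file, ["t_id"], extrasaction="ignore")
    out.writeheader()
    for row in csv.DictReader(t_file, delimiter=","):
        try:
            if row["r_id"] not in filters:
                out.writerow(row)
        except KeyError:
            # no r_id to compare against: keep the row
            out.writerow(row)


# Illustrative data: row 1 is filtered out because its r_id is in the set.
t_data = io.StringIO("t_id,r_id\n1,a\n2,b\n")
result = io.StringIO()
write_unmatched(t_data, result, {"a"})
print(result.getvalue())
```

Catching the KeyError per row keeps the loop going when the r_id column is missing, rather than aborting the whole folder.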