First, you need to decode the file contents, not encode them.
Second, the csv module doesn't like unicode strings in Python 2.7, so having decoded your data you need to encode it back to UTF-8.
Finally, csv.reader is passed an iterable over the lines of the file, not one big string with linebreaks in it.
So:
csv.reader(f.read().decode('utf-8-sig').encode('utf-8').splitlines())
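To see what that chain hands to csv.reader, here is a minimal sketch on an in-memory byte string. The sample data is made up for illustration, and the sketch stops short of calling csv.reader, since the 2.7 csv module wants byte strings while the Python 3 one wants text:

```python
import codecs

# Hypothetical file contents: a UTF-8 BOM followed by two CRLF-terminated rows.
raw = codecs.BOM_UTF8 + b"a;b\r\nc;d\r\n"

# 'utf-8-sig' drops the leading BOM while decoding; encoding back to UTF-8
# gives a byte string again, and splitlines() turns it into a list of rows.
lines = raw.decode('utf-8-sig').encode('utf-8').splitlines()
```

The result is a list of lines with the BOM gone from the first one and the line endings stripped.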
However, you might consider it simpler / more efficient just to remove the BOM manually:
import codecs

def remove_bom(line):
    return line[len(codecs.BOM_UTF8):] if line.startswith(codecs.BOM_UTF8) else line
csv.reader((remove_bom(line) for line in f), dialect = 'excel', delimiter = ';')
That is subtly different, since it removes a BOM from any line that starts with one, instead of just the first line. If you don't need to keep other BOMs that's OK, otherwise you can fix it with:
def remove_bom_from_first(iterable):
    f = iter(iterable)
    firstline = next(f, None)
    if firstline is not None:
        yield remove_bom(firstline)
        for line in f:
            yield line
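To make the difference between the two variants concrete, here is a small self-contained check. It assumes an io.BytesIO object standing in for the open file f, and the sample data (with a second BOM at the start of line 2) is invented for the purpose:

```python
import codecs
import io

def remove_bom(line):
    # Strip a leading UTF-8 BOM from a byte string, if there is one.
    return line[len(codecs.BOM_UTF8):] if line.startswith(codecs.BOM_UTF8) else line

def remove_bom_from_first(iterable):
    # Strip the BOM from the first line only; pass later lines through untouched.
    it = iter(iterable)
    firstline = next(it, None)
    if firstline is not None:
        yield remove_bom(firstline)
        for line in it:
            yield line

# Hypothetical sample: a BOM at the start of the file and another on line 2.
data = codecs.BOM_UTF8 + b"a;b\n" + codecs.BOM_UTF8 + b"c;d\n"

# Variant 1: every line that starts with a BOM loses it.
every_line = [remove_bom(line) for line in io.BytesIO(data)]

# Variant 2: only the first line is touched.
first_only = list(remove_bom_from_first(io.BytesIO(data)))
```

With this input, every_line has no BOMs left, while first_only still carries the BOM at the start of its second line. An empty input yields nothing, since next(it, None) returns None.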