Question

Input file: chars.csv

4,,x,,2,,9.012,2,,,,
6,,y,,2,,12.01,±4,,,,
7,,z,,2,,14.01,_3,,,,

When I try to parse this file, I get this error even after specifying utf-8 encoding.

>>> f=open('chars.csv',encoding='utf-8')
>>> f.read()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.2/codecs.py", line 300, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb1 in position 36: invalid start byte

How can I correct this error?

Version: Python 3.2.3


Solution

Your input file is clearly not UTF-8 encoded, so you have at least these options:

  • f=open('chars.csv', encoding='utf-8', errors='ignore') if the file is mostly UTF-8 and you don't mind losing the bytes that cannot be decoded. For other values of the errors parameter (such as 'replace'), check the documentation for open().
  • Simply use the proper encoding, such as latin-1, if you know it.
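The two options above can be compared side by side. This is only a minimal sketch: it assumes the file was written as Latin-1 (the question does not confirm this), and it builds a throwaway temporary file from the second sample row instead of reading the real chars.csv.

```python
import os
import tempfile

# Assumption: the file was saved as Latin-1, so "±" is the single byte 0xB1.
# This temporary file is a stand-in for chars.csv.
raw = "6,,y,,2,,12.01,±4,,,,\n".encode("latin-1")

with tempfile.NamedTemporaryFile(delete=False, suffix=".csv") as tmp:
    tmp.write(raw)
    path = tmp.name

# Option 1a: keep UTF-8 and silently drop bytes that cannot be decoded.
with open(path, encoding="utf-8", errors="ignore") as f:
    ignored = f.read()

# Option 1b: keep UTF-8 and mark each undecodable byte with U+FFFD.
with open(path, encoding="utf-8", errors="replace") as f:
    replaced = f.read()

# Option 2: decode with the encoding the file was actually written in.
with open(path, encoding="latin-1") as f:
    decoded = f.read()

os.remove(path)

print(repr(ignored))
print(repr(replaced))
print(repr(decoded))
```

With errors='ignore' the ± disappears without any warning, which changes the meaning of the value, so decoding with the right encoding is usually the safer choice when the encoding is known.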

OTHER TIPS

This is not UTF-8 encoding. The UTF-8 encoding of ± is \xC2\xB1, and that of U+0083 is \xC2\x83. As RobertT suggested, try Latin-1:

f=open('chars.csv',encoding='latin-1')
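To see why the decoder complains about byte 0xb1 at position 36, the error can be reproduced in memory. The sketch below assumes the first two sample rows were stored as Latin-1 with Unix line endings; the real file may differ.

```python
# Assumption: first two rows of the sample, stored as Latin-1 with "\n" line endings.
data = "4,,x,,2,,9.012,2,,,,\n6,,y,,2,,12.01,±4,,,,\n".encode("latin-1")

# The same character is one byte in Latin-1 but two bytes in UTF-8.
utf8_bytes = "±".encode("utf-8")
latin1_bytes = "±".encode("latin-1")

try:
    data.decode("utf-8")
except UnicodeDecodeError as exc:
    # exc.start is the offset of the first byte the codec could not decode.
    bad_offset = exc.start
    bad_byte = data[exc.start]
    print("cannot decode byte", hex(bad_byte), "at offset", bad_offset)

# Latin-1 maps every byte to a code point, so this decode cannot fail.
text = data.decode("latin-1")
print(text)
```

Because Latin-1 never raises a decoding error, a successful read does not prove the file really is Latin-1. It is worth checking that characters such as ± come out as expected.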
Licensed under: CC-BY-SA with attribution