Question

I am using chardet to detect the encoding of text files, some of which are in Italian. The problem is that it consistently detects their encoding as iso-8859-2, while the correct result would be iso-8859-1. Does anybody know a fix? My locale is set to Polish. Could that influence the detection?

Solution

chardet doesn't support iso-8859-1, which is why it never reports it. For the list of supported character encodings, see chardet's homepage: http://pypi.python.org/pypi/chardet.
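To illustrate why the two encodings are easy to mix up, here is a minimal sketch using only the Python standard library. The sample string is made up for illustration and is not taken from the asker's files. It shows that the same bytes decode without error under both iso-8859-1 and iso-8859-2, so only the resulting letters reveal which one was right.

```python
# Illustrative sample only (an assumption, not the asker's data).
sample = "perché è così"

# Bytes as they would be stored in an iso-8859-1 file.
raw = sample.encode("iso-8859-1")

# Both decodes succeed, because every byte is valid in both encodings.
as_latin1 = raw.decode("iso-8859-1")
as_latin2 = raw.decode("iso-8859-2")

print(as_latin1)  # the original Italian text
print(as_latin2)  # same length, but some accented letters come out wrong
```

If you already know the files are Italian, skipping detection and opening them with `open(path, encoding="iso-8859-1")` is probably the simplest workaround.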

I use the Linux program 'file' to get the character encoding of different kinds of content. I'm not sure how reliable it is (see my question "Encoding detection in Python, use the chardet library or not?"), but it has given me very good results so far.
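As a rough sketch of how 'file' could be called from Python, the helper below shells out to it with the `--brief` and `--mime-encoding` options. The function name `detect_with_file` is hypothetical, and the sketch assumes Python 3.7+ and that 'file' is installed and on the PATH.

```python
import shutil
import subprocess


def detect_with_file(path):
    """Return the encoding name printed by `file` for the given path.

    Hypothetical helper: it only wraps the command-line tool and does
    no detection of its own.
    """
    if shutil.which("file") is None:
        raise RuntimeError("the 'file' command is not installed")
    result = subprocess.run(
        ["file", "--brief", "--mime-encoding", path],
        capture_output=True,  # collect stdout instead of printing it
        text=True,            # decode the output to str
        check=True,           # raise if `file` exits with an error
    )
    return result.stdout.strip()
```

Calling `detect_with_file("some_file.txt")` returns a short lowercase name such as `us-ascii` or `utf-8`, which you can then pass to `open(..., encoding=...)` if Python recognises it.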

Btw, your locale setting should not influence the detection.

Licensed under: CC-BY-SA with attribution