Question

I have a problem with the encoding of a file that I'm importing (it contains Polish special characters). How do I make it work?

The error says:

Traceback (most recent call last):
  File "D:/Users/Denis/Dysk Google/scripts/python/napisy/napisy", line 6, in <module>
    str = inputfile.read() #name for the file
  File "D:\Python33\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 2: character maps to <undefined>

The part of the code that causes the problem:

inputfilename = "a.txt"
outputfilename = inputfilename[0:-4]+"_fixed"+".txt"

inputfile = open(inputfilename, 'r')

str = inputfile.read() #name for the file

newstring = str.replace("œ", "s").replace("ê","e").replace("³","l").replace("¹","a").replace("¿","z").replace("ñ","n").replace("Ÿ","z").replace("æ","c")

outputfile = open(outputfilename, "w")
outputfile.write(newstring)
outputfile.close()

Solution

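You should try 'cp1250' as the encoding. The traceback shows Python decoding the file with cp1252, where byte 0x8f is undefined; in cp1250, the Windows code page for Central European languages, that byte is the letter Ź: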

# In Python 3 the built-in open() accepts an encoding argument,
# so codecs.open() is not needed.
with open('file-name', 'r', encoding='cp1250') as f:
    content = f.read()

print(content)
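
Applied to the script in the question, reading with cp1250 might look like the sketch below. Once the file is decoded correctly, the text contains the real Polish letters (ś, ę, ł, ą, ż, ń, ź, ć) rather than the cp1252 look-alikes (œ, ê, ³, ...) used in the original replace chain, so the mapping is written with the real letters. The function name `strip_polish_diacritics`, the extra ó mapping, the upper-case letters, and writing the output as UTF-8 are assumptions for illustration, not part of the original answer.

```python
# Translation table from Polish letters to ASCII look-alikes.
POLISH_TO_ASCII = str.maketrans(
    "ąćęłńóśźżĄĆĘŁŃÓŚŹŻ",
    "acelnoszzACELNOSZZ",
)


def strip_polish_diacritics(input_path, output_path, encoding="cp1250"):
    """Read input_path in the given encoding and write an ASCII-folded copy."""
    with open(input_path, "r", encoding=encoding) as infile:
        text = infile.read()

    fixed = text.translate(POLISH_TO_ASCII)

    # Writing UTF-8 is an assumption; pass encoding=encoding here instead
    # to keep the output in the same encoding as the input.
    with open(output_path, "w", encoding="utf-8") as outfile:
        outfile.write(fixed)
    return fixed
```

For the file names in the question this would be called as `strip_polish_diacritics("a.txt", "a_fixed.txt")`.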

If decoding with cp1250 fails, you can also try the ISO-8859-2 encoding.
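
A minimal sketch of that fallback, assuming the candidates are tried in order and the first one that decodes without an error is used. The helper name `read_with_fallback` is hypothetical. A clean decode is not proof that the encoding is right: ISO-8859-2 accepts every byte value, and cp1250 rejects only a handful, so it is still worth checking the resulting text by eye.

```python
def read_with_fallback(path, encodings=("cp1250", "iso-8859-2")):
    """Return (text, encoding) for the first encoding that decodes the file."""
    # Read the raw bytes once, then try each candidate encoding in order.
    with open(path, "rb") as f:
        raw = f.read()

    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # this candidate cannot decode the bytes; try the next

    raise ValueError("none of the encodings could decode " + repr(path))
```

The second element of the returned tuple tells you which encoding was actually used, which helps when writing the file back out.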

Licensed under: CC-BY-SA with attribution