Question

I've read some threads about unicode now.

I am using Python 2.7.2, but with print_function imported from __future__ (because the raw print statement is quite confusing to me).

So here is some code:

# -*- coding: L9 -*-
from __future__ import print_function, unicode_literals

Now if I print things like

print("öäüߧ€")

it works perfectly. However (and yes, I am totally new to Python), if I declare a function that prints Unicode strings, it blows up my script:

def foo():
    print("öäü߀")

foo()

Traceback (most recent call last):
  File "C:\Python27\test1.py", line 7, in <module>
    foo()
  File "C:\Python27\test1.py", line 5, in foo
    print("÷õ³▀Ç")
  File "C:\Python27\lib\encodings\cp850.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\x80' in position 4: character maps to <undefined>

What's the best way to handle this error and unicode in general? And should I stick with the 2.7 print statement instead?


Solution

I suspect that print("öäü߀".encode('L9')) will solve your problems.

OTHER TIPS

This may help:

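s1 = u"öäü߀"                        # assumed sample value; s1 is not defined in the original
print(type(s1))                      # <type 'unicode'> on Python 2
s1.encode('ascii', errors='ignore')  # works: non-ASCII characters are dropped
s1.decode('ascii', errors='ignore')  # does not work: raises UnicodeEncodeError on Python 2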

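The reason is that, in Python 2, a unicode object can't be decoded directly, so s1.decode first makes an implicit call to encode using the default ASCII codec. That implicit call does not receive the errors='ignore' flag, so it raises an error as soon as it meets a non-ASCII character.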
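To make the implicit step visible, here is a minimal sketch that mimics what Python 2 does when decode is called on a unicode object. The helper name py2_style_decode and the sample string are assumptions made for illustration; they are not part of Python or of the original answer.

```python
def py2_style_decode(text, codec, errors="strict"):
    """Mimic Python 2's unicode.decode (hypothetical helper, illustration only)."""
    # Step 1: the implicit encode with the default codec (ASCII in Python 2).
    # The caller's `errors` argument is NOT applied to this step.
    raw = text.encode("ascii")
    # Step 2: only now does the requested decode run, with its error handler.
    return raw.decode(codec, errors)


s1 = u"\xf6\xe4\xfc\xdf\u20ac"  # assumed sample value: o-umlaut, a-umlaut, u-umlaut, sharp s, Euro

# Encoding with errors='ignore' silently drops every non-ASCII character.
ignored = s1.encode("ascii", "ignore")
print(repr(ignored))

# Decoding fails in step 1, before errors='ignore' is ever consulted.
try:
    py2_style_decode(s1, "ascii", errors="ignore")
    outcome = "no error"
except UnicodeEncodeError:
    outcome = "UnicodeEncodeError"
print(outcome)
```

The sketch runs on Python 2.7 and Python 3. On Python 3 the str type has no decode method at all, so this mistake cannot be made there.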

Whether you were issuing your commands from a file or from a Python prompt with Unicode support may explain why you get an error in the latter but not the former.

The Windows console uses legacy "OEM" code pages for compatibility with old DOS console programs, while the rest of Windows uses updated code pages that support modern characters but still differ by region. In your case the console uses cp850 and GUI programs use cp1252. cp850 doesn't support the Euro character, so Python raises an exception when trying to print the character on the console. You can run chcp 1252 before running your script if you need the Euro to work. Make sure the console font supports the character, though.
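You can check the code page claim without touching the console settings. The following is a small sketch; the helper name supports is made up for this example.

```python
import sys


def supports(codec, char):
    """Return True if `char` can be encoded with `codec` (hypothetical helper)."""
    try:
        char.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


EURO = u"\u20ac"

# The codec print() uses for the console. It can be None when output is piped.
print(sys.stdout.encoding)

# Compare the OEM console code page with the GUI ("ANSI") code page.
support = {codec: supports(codec, EURO) for codec in ("cp850", "cp1252")}
for codec in sorted(support):
    print(codec, support[codec])
```

On a console that reports cp850, printing the Euro sign fails for exactly the reason this check shows: the code page has no slot for it.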

BTW, L9 != cp1252 either.
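That difference would also account for the u'\x80' in the traceback. The sketch below assumes the editor saved the script as cp1252 while the coding line declared L9 (an alias for ISO-8859-15). The question does not confirm this, so treat it as one plausible explanation rather than a diagnosis.

```python
EURO = u"\u20ac"

# Byte an editor writes for the Euro sign when it saves the file as cp1252
# (an assumption about the asker's editor, not stated in the question).
saved_bytes = EURO.encode("cp1252")      # b'\x80'

# Byte the Euro sign has in ISO-8859-15, for which 'L9' is an alias.
l9_bytes = EURO.encode("iso8859_15")     # b'\xa4'

# Python decodes the source with the DECLARED codec, so byte 0x80 turns into
# the control character U+0080 instead of the Euro sign.
decoded = saved_bytes.decode("iso8859_15")

# cp850 has no mapping for U+0080, so printing it on the console fails.
try:
    decoded.encode("cp850")
    console_result = "ok"
except UnicodeEncodeError:
    console_result = "UnicodeEncodeError"

print(repr(decoded), console_result)
```

If this is what happened, making the coding declaration match the encoding the editor actually uses removes the stray U+0080, although cp850 still cannot display a real Euro sign.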

Are you sure printing from the console worked with a Euro? When I cut-and-paste your print, I get the following if the code page is 850, but it works after chcp 1252.

>>> print("öäüߧ€")
öäüߧ?                 # Note the ?
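If a ? in place of an unsupported character is acceptable, the same result can be produced on purpose with the 'replace' error handler instead of letting print raise. This is a sketch only; the variable names are illustrative, and cp850 is hard-coded here where a real script would probably use sys.stdout.encoding.

```python
text = u"\xf6\xe4\xfc\xdf\xa7\u20ac"  # the string from the question, ending in the Euro sign

# Encode for the console code page, substituting '?' for anything it lacks.
encoded = text.encode("cp850", "replace")

# Decode again only to show what the console would display.
shown = encoded.decode("cp850")

print(repr(encoded))
print(shown == u"\xf6\xe4\xfc\xdf\xa7?")
```

The first five characters survive because cp850 contains them. Only the Euro sign is replaced.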


Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow