Python encoding error only when called as an external process

https://stackoverflow.com/questions/11844835

25-06-2021

Question

A simple file like

$ cat x.py
x = u'Gen\xe8ve'
print x

when run will give me:

$ python x.py
Genève

However, when run inside a "command substitution", it gives:

$ echo $(python x.py)
...
UnicodeEncodeError: 'ascii' codec...

I've tried different terminal emulators (xterm, gnome-term) and the console on a ttyS, with both bash and sh, and with Python 2.4 and 2.7. I've tried setting LC_ALL or LANG to a UTF-8 locale before running python, and I've checked sys.getdefaultencoding(). Nothing helped.

The problem also arises when the script is called from another process (such as Java), but the above was the easiest way I found to reproduce it.

I don't understand what the difference between the two calls is. Can anyone help?

Solution

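The problem is that in the second call stdout is a pipe rather than a terminal. A pipe is a file-like object that only accepts byte strings, and because Python 2 cannot detect an encoding for it (sys.stdout.encoding is None), print falls back to the ASCII codec. The same happens if you redirect the output to a file: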

$ python x.py > my_file
Traceback (most recent call last):
  File "x.py", line 2, in <module>
    print x
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe8' in position 3: ordinal not in range(128)
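
To see the difference between the two calls, a small diagnostic along these lines may help. It is only a sketch: describe_stream is a hypothetical helper name, not part of any library. It reports whether a stream is a terminal and which encoding Python attached to it. On Python 2, the expectation is that encoding is None whenever stdout is a pipe or a file.

```python
import sys


def describe_stream(stream):
    """Return (is_tty, encoding) for a file-like object.

    describe_stream is a hypothetical helper name, not a library function.
    """
    is_tty = hasattr(stream, "isatty") and stream.isatty()
    # Python 2 leaves .encoding as None when stdout is a pipe or a file.
    encoding = getattr(stream, "encoding", None)
    return is_tty, encoding


if __name__ == "__main__":
    # Report on stderr so the result stays visible under $(...) or > file.
    sys.stderr.write("stdout: tty=%r encoding=%r\n" % describe_stream(sys.stdout))
```

Running the script once directly and once as `echo $(python script.py)` should show the two cases side by side, since stderr is not captured by the command substitution.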

Because the receiver only understands byte strings, not unicode objects, you must first encode the unicode string into a byte string with the encode method:

x = u'Gen\xe8ve'.encode('utf-8')
print x

This prints the unicode string encoded as a UTF-8 byte string (a sequence of bytes), which can be written to any file-like object.

$ echo $(python x.py)
Genève
$ python x.py
Genève
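
If many print calls are involved, the encode step could be gathered into one place. The following is a minimal sketch, not the answer's own method: to_output_bytes and write_text are hypothetical names, and the utf-8 fallback is an assumption that should match whatever the reader of the pipe expects. It uses the stream's encoding when one was detected and the fallback otherwise.

```python
import sys


def to_output_bytes(text, stream_encoding=None, fallback="utf-8"):
    """Encode a unicode string with the stream's encoding, or a fallback.

    Both the helper name and the utf-8 fallback are illustrative assumptions.
    """
    return text.encode(stream_encoding or fallback)


def write_text(text, stream=None):
    """Write text plus a newline as bytes, whether or not stream is a tty."""
    if stream is None:
        stream = sys.stdout
    data = to_output_bytes(text + u"\n", getattr(stream, "encoding", None))
    # Python 3 exposes the byte-level stream as .buffer; Python 2 streams
    # accept byte strings directly.
    getattr(stream, "buffer", stream).write(data)
```

Calling `write_text(u'Gen\xe8ve')` in place of `print x` is then intended to behave the same on a terminal, in a pipe, and when redirected to a file.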

Other tips

As you suspect, Python 2 does not know which encoding to use for unicode output when its standard output is not a terminal. Encode the string before printing it:

# coding: utf-8
x = u'Gen\xe8ve'
print x.encode("utf-8")

Note that the invoking program and your script must agree on a common encoding.
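
On the calling side, agreeing on an encoding means decoding the captured bytes with the same codec the script used. A rough sketch, assuming Python 2.7 or later (for subprocess.check_output) and a Python caller rather than the Java one mentioned in the question: run_and_decode, SHARED_ENCODING and CHILD_SOURCE are hypothetical names, and CHILD_SOURCE is only a stand-in for x.py.

```python
import subprocess
import sys

# Encoding both sides agree on; utf-8 is an assumption for this sketch.
SHARED_ENCODING = "utf-8"

# Stand-in for x.py: writes the already-encoded bytes to stdout.
CHILD_SOURCE = (
    "import sys\n"
    "out = getattr(sys.stdout, 'buffer', sys.stdout)\n"
    "out.write(u'Gen\\xe8ve'.encode('utf-8'))\n"
)


def run_and_decode(argv, encoding=SHARED_ENCODING):
    """Run a command and decode its stdout bytes with the agreed encoding."""
    raw = subprocess.check_output(argv)
    return raw.decode(encoding)


if __name__ == "__main__":
    text = run_and_decode([sys.executable, "-c", CHILD_SOURCE])
    sys.stderr.write("received %d characters\n" % len(text))
```

If the caller decoded with a different codec than the script encoded with, the accented character would come out garbled or raise a decode error, which is why the two sides have to match.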

Licensed under: CC-BY-SA with attribution

Not affiliated with Stack Overflow