Question

I use Python 2.7. This page says that:

Python’s default encoding is the ‘ascii’ encoding

Indeed I have the following:

>>> import sys
>>> sys.getdefaultencoding()
'ascii'

But I open my interpreter and type this:

>>> 'É'
'\xc3\x89'

It looks like utf8:

>>> u'É'.encode( 'utf8' )
'\xc3\x89'

What happened? Did the default ascii raise UnicodeEncodeError? Did it trigger utf8 encoding?

Solution

Your terminal is configured to use UTF-8. It sends UTF-8 data to Python. Python stored that data in a bytestring.

When you then print that bytestring, the terminal interprets those bytes as UTF-8 again.

At no point is Python actually interpreting these bytes as anything other than raw bytes; no decoding or encoding takes place on the Python level.
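To make that concrete, here is a minimal sketch (the variable names are only for illustration) that treats the two bytes from the question as plain data and decodes them explicitly. It uses a b'...' literal so the same lines should also run on Python 3:

```python
# The two bytes a UTF-8 terminal sends for the single character É.
raw = b'\xc3\x89'

# To Python this is just a sequence of two bytes.
print(len(raw))   # 2

# Only an explicit decode turns the bytes into one Unicode character.
text = raw.decode('utf-8')
print(len(text))  # 1
```

The decoded value is the single code point U+00C9, i.e. u'\xc9'.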

If you tried to decode the bytes implicitly, an exception would be raised:

>>> unicode('\xc3\x89')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)

Here Python fell back to sys.getdefaultencoding() ('ascii'), and the decoding failed because byte 0xc3 is outside the ASCII range.
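The same failure can be reproduced without relying on the default encoding by naming the ASCII codec explicitly. This is only a sketch; the names raw and error are made up for the example:

```python
raw = b'\xc3\x89'

error = None
try:
    # Equivalent to what the implicit conversion attempts when the
    # default encoding is ASCII.
    raw.decode('ascii')
except UnicodeDecodeError as exc:
    error = exc

# The exception records which codec failed and at which byte offset.
print(error.encoding)  # ascii
print(error.start)     # 0
```

Passing the codec you actually mean, for example raw.decode('utf-8'), avoids the error.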

When you type a Unicode literal (u'...') at the interactive prompt, Python decodes the input using the sys.stdin.encoding value, not sys.getdefaultencoding():

>>> import sys
>>> sys.stdin.encoding
'UTF-8'

Python takes that value either from the PYTHONIOENCODING environment variable (if set), or from locale.getpreferredencoding():

>>> import locale
>>> locale.getpreferredencoding()
'UTF-8'
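Why the input encoding matters for u'...' literals can be sketched by decoding the same two bytes with two different codecs. Latin-1 is used here purely as a hypothetical alternative terminal setting; it is not something from the question:

```python
raw = b'\xc3\x89'  # what a UTF-8 terminal sends for É

# Decoded with the encoding the terminal really used: one character.
as_utf8 = raw.decode('utf-8')

# Decoded with a mismatched encoding: two unrelated characters.
as_latin1 = raw.decode('latin-1')

print(len(as_utf8))    # 1
print(len(as_latin1))  # 2
```

If the encoding Python assumes for stdin does not match what the terminal sends, a u'É' literal ends up as the two-character result rather than the single É.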

When reading Python source files, Python 2 uses ASCII to interpret such literals, while Python 3 uses UTF-8. Either can be told which codec to use instead with the PEP 263 source encoding comment, which has to be on the first or second line of your input file:

# coding: UTF-8
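
As a rough illustration of the encoding comment at work, the sketch below compiles a tiny in-memory "file" whose first line declares Latin-1. The source bytes, the '<demo>' filename and the namespace dict are all invented for the example, and it assumes compile() honours the comment for byte-string input the same way it does for a file on disk:

```python
# Source bytes for a two-line module. The literal contains the single
# byte 0xC9, which is É in Latin-1 (and is not valid UTF-8 on its own).
source = b"# coding: latin-1\nvalue = u'\xc9'\n"

namespace = {}
# compile() reads the PEP 263 comment and decodes the rest accordingly.
exec(compile(source, '<demo>', 'exec'), namespace)

print(len(namespace['value']))  # 1
```

The literal comes out as the single character U+00C9 because the declared codec, not the default, was used to read the source.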

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow