Question

I am using Python 2.7.3. Can anybody explain the difference between the literals:

'\u0391'

and:

u'\u0391'

and why the REPL below echoes them differently (especially the extra backslash added to a1):

>>> a1='\u0391'
>>> a1
'\\u0391'
>>> type(a1)
<type 'str'>
>>> 
>>> a2=u'\u0391'
>>> a2
u'\u0391'
>>> type(a2)
<type 'unicode'>
>>> 

Solution

In Python 2 you can only use Unicode escapes (\uabcd) in a Unicode string literal. In a byte string they have no special meaning, so the backslash and the characters after it are kept as-is. A Python 2 Unicode literal (u'some text', type unicode) creates a different type of object from a Python 2 byte string literal ('some text', type str).
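A minimal sketch of the type difference, written so it runs unchanged on Python 2.7 and Python 3.3+. The byte string is spelled b'\\u0391' with an explicit doubled backslash: that is the same six bytes Python 2 produces for '\u0391', and it avoids the invalid-escape warning newer Python 3 versions emit for \u in a bytes literal. The variable names are illustrative.

```python
# Runs on Python 2.7 and Python 3.3+.

# Same value as Python 2's byte-string literal '\u0391': the backslash is kept.
raw = b'\\u0391'
# The u prefix turns \u0391 into one code point (GREEK CAPITAL LETTER ALPHA).
alpha = u'\u0391'

print('%s %d' % (type(raw).__name__, len(raw)))      # "str 6" on 2.7, "bytes 6" on 3
print('%s %d' % (type(alpha).__name__, len(alpha)))  # "unicode 1" on 2.7, "str 1" on 3
print(hex(ord(alpha)))                               # 0x391
```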

It's like using \t versus \T: the former has a meaning in Python string literals (it is interpreted as a tab character), while the latter is just a backslash followed by a capital T (two characters).
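The tab-versus-\T comparison can be checked directly. In this sketch the two-character value of '\T' is written as '\\T', because Python 3.6+ warns about unrecognized escapes such as \T (a DeprecationWarning, upgraded to a SyntaxWarning in 3.12); the resulting value is the same.

```python
# Runs on Python 2.7 and Python 3.
tab = '\t'           # recognized escape: a single tab character (code 9)
backslash_t = '\\T'  # what the literal '\T' evaluates to: '\' followed by 'T'

print('%d %d' % (len(tab), ord(tab)))  # 1 9
print(len(backslash_t))                # 2
print(list(backslash_t))               # ['\\', 'T']
```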

To understand the difference between Unicode and byte strings, please read the Python Unicode HOWTO; I can also recommend Joel Spolsky's article on Unicode.

Note: in Python 3 the same distinction applies, but 'some text' is a Unicode string literal and b'some text' is the byte string syntax.
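A short Python 3 sketch of the same distinction. Here the unprefixed literal is already Unicode, and moving to bytes requires an explicit encoding; UTF-8 is an assumption chosen for illustration. ascii() is used for display so the output stays ASCII regardless of the terminal's encoding.

```python
# Python 3 only: unprefixed literals are str (Unicode).
s = '\u0391'            # one code point, U+0391
b = s.encode('utf-8')   # bytes: the UTF-8 encoding of that code point

print(type(s).__name__, len(s), ascii(s))  # str 1 '\u0391'
print(type(b).__name__, len(b), b)         # bytes 2 b'\xce\x91'
print(b.decode('utf-8') == s)              # True
```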

Other answers

Unlike C, Python lets you enclose a string in single quotes (') as well as double quotes ("), leaving aside triple quotes (""" or '''). So '\u0391' is an ordinary string literal, not a character literal.

Thus, in Python 2, '\u0391' is just a byte string containing the six characters \, u, 0, 3, 9 and 1, because \u is not a recognized escape in a byte string. When the REPL displays the string's repr, it escapes the \ with another \.
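To see that escaping at work, compare print() with repr() on the six-character string. This sketch writes it as '\\u0391' so it has the same value on Python 2.7 (where the question's '\u0391' already means this) and on Python 3 (where '\u0391' would be a single character).

```python
# Runs on Python 2.7 and Python 3: six characters, starting with a backslash.
s = '\\u0391'

print(len(s))   # 6
print(s)        # \u0391     (the characters themselves)
print(repr(s))  # '\\u0391'  (repr escapes the backslash, as the REPL did)
```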

By contrast, the u prefix makes the literal a Unicode string, in which \u escapes are evaluated as well. Thus u'\u0391' is interpreted as "the Unicode string containing code point U+0391" (GREEK CAPITAL LETTER ALPHA), which is different from the above.
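If you already have the six-character byte string and want the code point it spells, Python's built-in unicode_escape codec applies the same escape processing a Unicode literal would. This is a swapped-in technique for illustration, not something the answer above relies on; note that the codec treats non-escape bytes as Latin-1, so it suits ASCII-only input like this.

```python
# Runs on Python 2.7 and Python 3.
raw = b'\\u0391'                      # backslash, u, 0, 3, 9, 1
alpha = raw.decode('unicode_escape')  # interpret \u0391 as an escape

print(len(alpha))          # 1
print(hex(ord(alpha)))     # 0x391
print(alpha == u'\u0391')  # True
```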

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow