Python unicode string literals :: what's the difference between '\u0391' and u'\u0391'

Question 1

You can only use Unicode escapes (\uabcd) in a Unicode string literal; they have no meaning in a byte string. A Python 2 Unicode literal (u'some text') is a different type of Python object from a Python 2 byte string ('some text').

It's like using \t versus \T: the former has meaning in Python literals (it's interpreted as a tab character), while the latter is not a recognised escape and just means a backslash followed by a capital T (two characters).
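A minimal sketch of the \t versus \T comparison, written for Python 3. The helper eval_literal is hypothetical (not from the original answer): it evaluates literal source text at runtime only so that the invalid-escape warning newer Python 3 versions emit for '\T' can be silenced.

```python
import ast
import warnings


def eval_literal(source):
    """Hypothetical helper: evaluate the text of a Python literal,
    hiding the warning newer Pythons emit for unrecognised escapes."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        return ast.literal_eval(source)


tab = eval_literal(r"'\t'")         # recognised escape: one tab character
not_escape = eval_literal(r"'\T'")  # unrecognised: backslash kept, then T

print(len(tab))              # 1
print(len(not_escape))       # 2
print(not_escape == "\\T")   # True
```

The same thing happens to \u0391 in a byte string: the backslash is simply kept, because \u is not an escape that byte string literals recognise.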

To help understand the difference between Unicode and byte strings, please do read the Python Unicode HOWTO; I can also recommend Joel Spolsky's article on Unicode.

Note: in Python 3 the same differences apply, but 'some text' is a Unicode string literal and b'some text' is the byte string syntax.
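As a rough illustration of the Python 3 side, assuming Python 3.3 or later (where the u prefix is accepted again), the two literal forms produce different types, and the \u escape is evaluated in the un-prefixed one:

```python
text = 'some text'    # str: a sequence of Unicode code points
data = b'some text'   # bytes: a sequence of integers in range(256)

print(type(text).__name__)   # str
print(type(data).__name__)   # bytes

alpha = '\u0391'               # the escape is evaluated in a str literal
print(len(alpha))              # 1
print(alpha.encode('utf-8'))   # b'\xce\x91'
print(u'\u0391' == alpha)      # True: the u prefix changes nothing here
```

Converting between the two types always goes through an explicit encoding, as the encode('utf-8') call shows.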

Question 2

Unlike C, Python lets you enclose a string in single quotes (') as well as double quotes ("), leaving aside triple-quoted strings (''' or """).

Thus, in Python 2, '\u0391' is just a byte string containing the six characters \, u, 0, 3, 9 and 1. When you display its repr, the \ is escaped with another \, so it is shown as '\\u0391'.

By contrast, the u prefix makes the literal a Unicode string, in which \u escapes are evaluated. Thus, u'\u0391' is interpreted as "the Unicode string containing the code point U+0391" (GREEK CAPITAL LETTER ALPHA), which is a different value from the one above.
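The six-character versus one-character difference can be sketched in Python 3 as well. This is only an approximation of the Python 2 behaviour: a raw string literal stands in for Python 2's un-prefixed '\u0391', since both leave the backslash sequence unevaluated.

```python
import unicodedata

escaped = r'\u0391'   # six characters: \, u, 0, 3, 9, 1
alpha = '\u0391'      # one character: the code point U+0391

print(len(escaped), repr(escaped))          # 6 '\\u0391'
print(len(alpha), unicodedata.name(alpha))  # 1 GREEK CAPITAL LETTER ALPHA

# The escaped form can be turned into the real character explicitly.
decoded = escaped.encode('ascii').decode('unicode_escape')
print(decoded == alpha)   # True
```

The doubled backslash in the repr output is only how the single backslash is displayed; the string itself still has six characters.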