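How to convert a string to UTF-8 in Python

Question

I have a browser which sends UTF-8 characters to my Python server, but when I retrieve them from the query string, Python reports the encoding as ASCII. How can I convert the plain string to UTF-8?

Note: the string passed from the web is already UTF-8 encoded; I just want Python to treat it as UTF-8 and not ASCII.

Solution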

>>> plain_string = "Hi!"
>>> unicode_string = u"Hi!"
>>> type(plain_string), type(unicode_string)
(<type 'str'>, <type 'unicode'>)

^ This is the difference between a byte string (plain_string) and a unicode string.

>>> s = "Hello!"
>>> u = unicode(s, "utf-8")

^ Conversion to unicode, specifying the encoding.
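On Python 3 the unicode type no longer exists under that name (it became str), so the same step is written with bytes.decode. A minimal sketch, assuming the bytes from the request are held in a hypothetical variable named raw:

```python
# Python 3 sketch: bytes -> str, the counterpart of unicode(s, "utf-8").
raw = b"Hello!"               # hypothetical bytes received from the client
text = raw.decode("utf-8")    # decode explicitly instead of relying on a default

print(type(raw).__name__, type(text).__name__)
```

Naming the codec explicitly keeps the result independent of whatever default encoding the interpreter happens to use.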

Other tips

If the above does not work, you can also tell Python to skip the bytes that cannot be decoded as UTF-8:

stringnamehere.decode('utf-8', 'ignore')
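To make the effect of the error handler visible, here is a small Python 3 sketch. The sample bytes are made up for illustration; \xff is used because it can never occur in valid UTF-8.

```python
# Python 3 sketch: comparing error handlers on invalid UTF-8 input.
data = b"caf\xc3\xa9 \xff ok"   # valid UTF-8 except for the stray \xff

ignored = data.decode("utf-8", "ignore")     # silently drops the bad byte
replaced = data.decode("utf-8", "replace")   # puts U+FFFD where the bad byte was

try:
    data.decode("utf-8")                     # the default handler is "strict"
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

print(ascii(ignored), ascii(replaced), strict_failed)
```

Note that 'ignore' loses data without any warning, so 'replace' is often easier to debug.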

Might be a bit overkill, but when I work with ASCII and unicode in the same files, repeatedly decoding can be a pain, so this is what I use:

def make_unicode(value):
    # Decode byte strings as UTF-8; unicode objects pass through unchanged.
    if not isinstance(value, unicode):
        value = value.decode('utf-8')
    return value
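A rough Python 3 counterpart of this helper, where the check is against bytes rather than unicode. The name make_text and the encoding parameter are assumptions made for this sketch, not part of the original answer:

```python
def make_text(value, encoding="utf-8"):
    """Return value as str, decoding it first if it is a bytes object."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Both calls yield the same str, so callers need not track which form they hold.
from_bytes = make_text(b"ni\xc3\xb1o")
from_text = make_text("ni\u00f1o")
print(from_bytes == from_text)
```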

Adding the following line to the top of your .py file:

# -*- coding: utf-8 -*-

allows you to write non-ASCII string literals directly in your script, like this:

utfstr = "ボールト"
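For comparison, Python 3 reads source files as UTF-8 by default, so the declaration is optional there and the literal is a str of code points rather than a byte string. A small sketch under that assumption:

```python
# Python 3 sketch: the same literal, with no coding declaration needed.
utfstr = "ボールト"                  # a str made of Unicode code points
encoded = utfstr.encode("utf-8")    # UTF-8 bytes for the same text

# Character count versus byte count: the two differ for non-ASCII text.
print(len(utfstr), len(encoded))
```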

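If I understand you correctly, you have a UTF-8 encoded byte string in your code.

Converting a byte string to a unicode string is known as decoding (unicode -> byte string is encoding).

You do that with the unicode function or the decode method. Either: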

unicodestr = unicode(bytestr, encoding)
unicodestr = unicode(bytestr, "utf-8")

Or:

unicodestr = bytestr.decode(encoding)
unicodestr = bytestr.decode("utf-8")
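The two directions can be seen side by side in a short round trip. This is a Python 3 sketch, where the text type is str and the byte type is bytes; the sample word is arbitrary:

```python
# Python 3 sketch: encoding and decoding are inverse operations.
text = "Z\u00fcrich"                  # str containing U+00FC (u with diaeresis)
bytestr = text.encode("utf-8")        # encoding: str -> bytes
roundtrip = bytestr.decode("utf-8")   # decoding: bytes -> str

print(bytestr)                        # b'Z\xc3\xbcrich'
print(roundtrip == text)              # True
```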

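city = 'Ribeir\xc3\xa3o Preto'  # \xc3\xa3 is the UTF-8 encoding of "ã"
# Decode with the codec the bytes are actually in; decoding them as
# cp1252 would turn "ã" into two characters and double-encode the result.
print city.decode('utf-8').encode('utf-8')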
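The codec passed to decode has to match the encoding the bytes are really in. A Python 3 sketch with the same bytes, comparing a cp1252 decode with a utf-8 decode, and showing that this particular mix-up can be undone:

```python
# Python 3 sketch: the same bytes read with two different codecs.
city = b"Ribeir\xc3\xa3o Preto"        # UTF-8 encoded bytes

as_cp1252 = city.decode("cp1252")      # wrong codec: each byte becomes one character
as_utf8 = city.decode("utf-8")         # right codec: \xc3\xa3 becomes one character

# Reversing the wrong decode recovers the original bytes, which can then
# be decoded properly.
repaired = as_cp1252.encode("cp1252").decode("utf-8")

print(len(as_cp1252), len(as_utf8), repaired == as_utf8)
```

The wrongly decoded string is one character longer because the two-byte sequence was split into two separate characters.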

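In Python 3.6 there is no built-in unicode() function. Text strings (str) are already stored as Unicode by default and need no conversion; only bytes objects have to be decoded. Example: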

my_str = "\u221a25"
print(my_str)
# prints: √25
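If UTF-8 bytes are needed in Python 3, for example to write to a socket, the str is encoded explicitly. A minimal sketch reusing the same string:

```python
# Python 3 sketch: str -> UTF-8 bytes and back.
my_str = "\u221a25"                    # square root sign followed by "25"
utf8_bytes = my_str.encode("utf-8")    # the sign takes three bytes in UTF-8

print(utf8_bytes)                      # b'\xe2\x88\x9a25'
print(utf8_bytes.decode("utf-8") == my_str)
```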

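Translate with ord() and unichr(). Every unicode character has a number associated with it, something like an index, and Python has a few functions to translate between a character and its number. Below is an example using ñ. Hope it helps.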

>>> C = 'ñ'
>>> U = C.decode('utf8')
>>> U
u'\xf1'
>>> ord(U)
241
>>> unichr(241)
u'\xf1'
>>> print unichr(241).encode('utf8')
ñ
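In Python 3, unichr() is gone and chr() covers the whole Unicode range. The same session might look like this sketch:

```python
# Python 3 sketch: character <-> code point with ord() and chr().
c = "\u00f1"                       # the letter n with tilde
code_point = ord(c)                # 241
back = chr(code_point)             # the same one-character str again
encoded = back.encode("utf-8")     # b'\xc3\xb1'

print(code_point, back == c, encoded)
```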

Licensed under: CC-BY-SA with attribution