You already have Unicode data on your server; response.json() produces Unicode values for any JSON string. There is no need to try to decode it again.
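You can see the same behaviour in the standard-library json module; a minimal sketch in Python 3 syntax (where str is already Unicode), using your title's first two characters as the sample value:

```python
import json

# A JSON document whose string value is Cyrillic, written with \u escapes
raw = '{"title": "\\u0421\\u043e"}'
data = json.loads(raw)

# The parsed value is already a Unicode string: the Cyrillic text 'Со'
print(data['title'])
```

No decoding step is needed on your side; the JSON parser did it for you.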
It is the browser that is producing this Latin-1 Mojibake. The browser is sent UTF-8 (a multi-byte encoding) but is interpreting the individual bytes as Latin-1 characters instead. Your title, for example, starts with the Cyrillic text Со, which is encoded to UTF-8, then misinterpreted as Latin-1:
>>> u'Со'
u'\u0421\u043e'
>>> u'Со'.encode('utf8')
'\xd0\xa1\xd0\xbe'
>>> print u'Со'.encode('utf8').decode('latin1')
Ð¡Ð¾
So the D0 A1 bytes in UTF-8, which form one codepoint, are being printed as two Latin-1 characters instead.
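The same round trip in Python 3 syntax, where the bytes/str split makes each step explicit; the sample text is the Со from your title:

```python
text = 'Со'                        # two Cyrillic codepoints, U+0421 U+043E
utf8 = text.encode('utf8')         # four bytes: D0 A1 D0 BE
mojibake = utf8.decode('latin1')   # each byte misread as one Latin-1 character

print(utf8)      # b'\xd0\xa1\xd0\xbe'
print(mojibake)  # Ð¡Ð¾ -- two Latin-1 characters per original codepoint
```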
The Ñ character is the D1 byte, which can be followed by about 33 second UTF-8 bytes that are non-printable in Latin-1 (which is why you see only the Ñ) to make a character in the range р through to Ѡ. Next is Ð¸, which is really и, etc.
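You can verify that byte arithmetic directly; a small sketch in Python 3 syntax (the session above is Python 2):

```python
# 0xD1 on its own, read as Latin-1, is the Ñ you see in the Mojibake
assert b'\xd1'.decode('latin1') == 'Ñ'

# In UTF-8, 0xD1 plus one continuation byte forms a single codepoint
assert b'\xd1\x80'.decode('utf8') == 'р'  # U+0440, start of the range
assert b'\xd1\xa0'.decode('utf8') == 'Ѡ'  # U+0460, end of the range

# And и from the title is the two bytes D0 B8, shown as Ð¸ in Latin-1
assert 'и'.encode('utf8') == b'\xd0\xb8'
assert b'\xd0\xb8'.decode('latin1') == 'Ð¸'

print('all byte checks pass')
```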
You need to figure out why the browser thinks your data is Latin-1. Usually this is determined from the Content-Type header sent to the browser; if it is set to text/html; charset=ISO-8859-1, then the browser will behave as if all text is Latin-1. It could also be that the HTML page has a <meta> tag, one of <meta charset="ISO-8859-1"> or <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"> or similar, where there are several closely related encodings (such as Windows-1252) that all have similar Mojibake effects.
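If you want to check what charset a given Content-Type header value declares, the standard library can parse it for you; a sketch using email.message (the header value here is a hypothetical example, not taken from your server):

```python
from email.message import Message

# Message parses MIME headers, including Content-Type parameters
msg = Message()
msg['Content-Type'] = 'text/html; charset=ISO-8859-1'

print(msg.get_content_type())     # text/html
print(msg.get_content_charset())  # iso-8859-1 -- the browser will use Latin-1
```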
Another option is that you encoded the text to UTF-8 explicitly, then managed to decode it as Latin-1 again somewhere before sending it to the browser.
And a third option is that the JSON service you used itself sent you Latin-1 Mojibake in a JSON Unicode string, giving you a Mojibake source. In that case you can still repair it by encoding to Latin-1, then decoding as UTF-8:
fixed = broken.encode('latin1').decode('utf8')
but do so only after you have verified that your data on the server is already Mojibaked.
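A defensive version of that repair, sketched in Python 3 (the function name is mine, not from any library): text that is not Mojibake of this kind is left alone, because encoding genuine Cyrillic to Latin-1 raises UnicodeEncodeError:

```python
def repair_mojibake(text):
    """Undo one UTF-8-bytes-read-as-Latin-1 round trip, if possible."""
    try:
        return text.encode('latin1').decode('utf8')
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text  # not this kind of Mojibake; leave it as-is

print(repair_mojibake('Ð¡Ð¾'))  # Со -- repaired
print(repair_mojibake('Со'))    # Со -- already clean, passed through
```

If you have a lot of data in this state, the third-party ftfy library automates exactly this kind of repair.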