You are looking at UTF-8 data that was decoded as Windows codepage 1252 instead:
>>> print u'惨事'.encode('utf8').decode('cp1252')
æƒ¨äº‹
>>> print u'最'.encode('utf8').decode('cp1252')
æœ€
Fixing this requires going the other way, encoding the garbled text back to cp1252 bytes and decoding those as UTF-8:
>>> print u'æƒ¨äº‹'.encode('cp1252').decode('utf8')
惨事
>>> print u'æœ€'.encode('cp1252').decode('utf8')
最
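If you need to apply that repair in code rather than at the prompt, a minimal sketch could look like the following. `fix_mojibake` is a made-up helper name, not a library function, and it assumes the text really is UTF-8 that was decoded as cp1252. The sample strings are spelled with escapes so the snippet runs on both Python 2 and 3.

```python
def fix_mojibake(text, wrong_codec="cp1252"):
    """Undo UTF-8 bytes that were decoded with the wrong single-byte codec.

    Hypothetical helper for illustration. Returns the input unchanged
    when the round trip is not possible.
    """
    try:
        # Recover the original bytes, then decode them as what they really are.
        return text.encode(wrong_codec).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text holds characters the codec cannot encode, or the
        # recovered bytes are not valid UTF-8, so leave the text alone.
        return text


# U+00E6 U+0153 U+20AC is how U+6700 looks after the wrong decode.
garbled = u"\xe6\u0153\u20ac"
print(fix_mojibake(garbled) == u"\u6700")
```

Returning the input unchanged on failure is a design choice for this sketch; raising instead may suit you better if silent pass-through could hide bad data.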
There may have been some loss along the way, though, as the UTF-8 encoding for 不
contains a byte (0x8D) that codepage 1252 leaves undefined:
>>> u'不'.encode('utf8')
'\xe4\xb8\x8d'
>>> print u'不'.encode('utf8').decode('cp1252')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/mj/Development/venvs/stackoverflow-2.7/lib/python2.7/encodings/cp1252.py", line 15, in decode
    return codecs.charmap_decode(input,errors,decoding_table)
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 2: character maps to <undefined>
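To see which bytes Python's cp1252 codec refuses to decode, you can probe every byte value in turn. This is only a sketch, and `undefined_bytes` is a hypothetical name chosen for illustration.

```python
def undefined_bytes(codec):
    """Return the byte values that `codec` cannot decode on their own."""
    missing = []
    for value in range(256):
        try:
            # bytes(bytearray([...])) builds a one-byte string on Python 2 and 3.
            bytes(bytearray([value])).decode(codec)
        except UnicodeDecodeError:
            missing.append(value)
    return missing


# The list for cp1252 includes 0x8d, the byte that broke the example above.
print([hex(value) for value in undefined_bytes("cp1252")])
```

Any character whose UTF-8 encoding contains one of those bytes cannot pass through a strict cp1252 decode.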
There are several other Windows codepage candidates that can be tried, though; cp1254, for example, would result in similar output with only minor differences.
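One way to compare candidates is to run the same round trip with each codec and keep whatever decodes cleanly. The sketch below assumes a candidate list of cp1252 and cp1254; `try_codecs` is a hypothetical name, and you would extend the list with whichever codepages are plausible for your data.

```python
def try_codecs(garbled, candidates=("cp1252", "cp1254")):
    """Map each candidate codec to the text it recovers, skipping failures."""
    recovered = {}
    for codec in candidates:
        try:
            recovered[codec] = garbled.encode(codec).decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            # This codec cannot explain the garbled text; try the next one.
            continue
    return recovered


# U+00E6 U+0153 U+20AC is how U+6700 looks after the wrong decode.
garbled = u"\xe6\u0153\u20ac"
for codec, text in sorted(try_codecs(garbled).items()):
    print("%s -> %r" % (codec, text))
```

When several codecs succeed with the same result, as can happen for codepages that differ in only a few positions, the sample alone does not tell you which one was actually used.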