Question

I'm trying to raise an exception in Python 2.7.x whose message includes a Unicode character. I can't seem to make it work.

Is it unsupported, or just not recommended, to include Unicode in an error message? Or do I need to be looking at sys.stderr?

# -*- coding: utf-8 -*-
class MyException(Exception):
    def __init__(self, value):
        self.value = value
    def __str__(self):
        return self.value
    def __repr__(self):
        return self.value
    def __unicode__(self):
        return self.value

desc = u'something bad with field \u4443'

try:
  raise MyException(desc)
except MyException as e:
  print(u'Inside try block : ' + unicode(e))

# here is what I wish to make work
raise MyException(desc)

Running the script produces the output below. Inside my try/except I can print the string without a problem.

My problem is outside the try/except.

Inside try block : something bad with field 䑃
Traceback (most recent call last):
  File "C:\Python27\lib\bdb.py", line 387, in run
    exec cmd in globals, locals
  File "C:\Users\ghis3080\r.py", line 25, in <module>
    raise MyException(desc)
MyException: something bad with field \u4443

Thanks in advance.


Solution

The behaviour depends on the Python version and on the environment. On Python 3 the character encoding error handler for sys.stderr is always 'backslashreplace':

from __future__ import unicode_literals, print_function
import sys

s = 'unicode "\u2323" smile'
print(s)
print(s, file=sys.stderr)
try:
    raise RuntimeError(s)
except Exception as e:
    print(e.args[0])
    print(e.args[0], file=sys.stderr)
    raise

python3:

$ PYTHONIOENCODING=ascii:ignore python3 raise_unicode.py
unicode "" smile
unicode "\u2323" smile
unicode "" smile
unicode "\u2323" smile
Traceback (most recent call last):
  File "raise_unicode.py", line 8, in <module>
    raise RuntimeError(s)
RuntimeError: unicode "\u2323" smile

python2:

$ PYTHONIOENCODING=ascii:ignore python2 raise_unicode.py
unicode "" smile
unicode "" smile
unicode "" smile
unicode "" smile
Traceback (most recent call last):
  File "raise_unicode.py", line 8, in <module>
    raise RuntimeError(s)
RuntimeError

That is, on my system Python 2 swallows the error message entirely: only the exception name is printed.

Note: on Windows you could try:

T:\> set PYTHONIOENCODING=ascii:ignore
T:\> python raise_unicode.py

For comparison, without PYTHONIOENCODING set:

$ python3 raise_unicode.py
unicode "⌣" smile
unicode "⌣" smile
unicode "⌣" smile
unicode "⌣" smile
Traceback (most recent call last):
  File "raise_unicode.py", line 8, in <module>
    raise RuntimeError(s)
RuntimeError: unicode "⌣" smile
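In the python3 run above, the stdout lines lose the character while the stderr lines keep it as \u2323. The two error handlers explain this. The minimal sketch below writes the same text through an ASCII text stream with each handler. io.BytesIO stands in for the real terminal, and write_with is a hypothetical helper name, not part of any library. It should run on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
import io

def write_with(errors, text):
    """Encode `text` to ASCII through a text stream using the given error
    handler and return the bytes that would reach the terminal."""
    buf = io.BytesIO()  # stands in for the terminal's byte stream
    stream = io.TextIOWrapper(buf, encoding='ascii', errors=errors)
    stream.write(text)
    stream.flush()      # push the encoded bytes into buf
    return buf.getvalue()

s = u'unicode "\u2323" smile'
# Like stdout under PYTHONIOENCODING=ascii:ignore: the character is dropped.
print(write_with('ignore', s).decode('ascii'))            # prints: unicode "" smile
# Like Python 3's stderr: the character is escaped instead of dropped.
print(write_with('backslashreplace', s).decode('ascii'))  # prints: unicode "\u2323" smile
```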

Other answers

This is how Python works. I believe what you are seeing comes from traceback._some_str() in the Python standard library. When a stack trace is formatted, that function first tries to convert the message with str(); if that raises an exception, it converts the message with unicode() and then encodes the result to ASCII with encode("ascii", "backslashreplace"). You are getting valid output and everything is working correctly: Python is doing its best to down-convert the error message so that it displays without problems on any platform. \u4443 is just the escaped code point of your character. It doesn't happen in your try/except block because this conversion is specific to the mechanism that produces stack traces (for example, for uncaught exceptions).
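The ASCII fallback described above can be reproduced in isolation. This is a simplified sketch of only the final encoding step, not the actual traceback module code; escape_like_traceback is a hypothetical helper name. It should run on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
# Simplified illustration of the ASCII fallback; not the real library code.
def escape_like_traceback(message):
    """Return `message` as ASCII text, with every non-ASCII code point
    replaced by its backslash escape (e.g. \\u4443)."""
    return message.encode('ascii', 'backslashreplace').decode('ascii')

desc = u'something bad with field \u4443'
print(escape_like_traceback(desc))  # prints: something bad with field \u4443
```

This is why the traceback shows the escape sequence rather than the character itself: the text is still correct, only written in an ASCII-safe form.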

In my case your example worked as it should, printing nice unicode.

But sometimes exception tracebacks are printed with the Unicode characters missing, or escaped with backslashes. It is possible to work around this and print the message normally.

Example of the problem with output (Python 2.7, Linux):

# -*- coding: utf-8 -*-
desc = u'something bad with field ¾'
raise SyntaxError(desc)

It prints only a truncated or mangled message:

~/.../sources/C_patch$ python SO.py 
Traceback (most recent call last):
  File "SO.py", line 25, in <module>
    raise SyntaxError(desc)
SyntaxError

To actually see the unaltered Unicode text, encode it to raw bytes and pass those bytes to the exception object:

# -*- coding: utf-8 -*-
desc = u'something bad with field ¾'
raise SyntaxError(desc.encode('utf-8', 'replace'))

This time you will see the full message:

~/.../sources/C_patch$ python SO.py 
Traceback (most recent call last):
  File "SO.py", line 3, in <module>
    raise SyntaxError(desc.encode('utf-8', 'replace'))
SyntaxError: something bad with field ¾

You can call value.encode('utf-8', 'replace') in your own exception's constructor if you like, but with built-in exceptions you have to do it in the raise statement, as in the example.

The hint is taken from "Overcoming frustration: Correctly using unicode in python2" (it comes with a big library of helpers, all of which can be stripped down to the example above).
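The constructor approach mentioned above might look like the sketch below. It assumes, as the answer's SyntaxError example does, a UTF-8 terminal on Python 2. The class name mirrors the question's MyException, but the body is hypothetical. It only encodes on Python 2 (on Python 3, passing bytes would make the traceback show b'...'), and it leaves __str__ to the base Exception class instead of returning a unicode object from it.

```python
# -*- coding: utf-8 -*-
import sys

class MyException(Exception):
    """Hypothetical exception that stores its message in a form the default
    traceback printer should be able to display."""
    def __init__(self, value):
        # type(u'') is `unicode` on Python 2 and `str` on Python 3.
        if sys.version_info[0] == 2 and isinstance(value, type(u'')):
            # Python 2: hand the base class UTF-8 bytes, as the answer suggests.
            value = value.encode('utf-8', 'replace')
        # Python 3: text is already Unicode, so it is passed through unchanged.
        super(MyException, self).__init__(value)
        self.value = value

desc = u'something bad with field \u00be'
err = MyException(desc)
# On Python 2, err.args[0] now holds UTF-8 bytes; leaving `raise err` uncaught
# should then print the full message, like the SyntaxError example above.
```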

Licensed under: CC-BY-SA with attribution