I have searched for quite a while for the answer to this question, and I think a lot of my difficulty comes from unfamiliarity with how the subprocess module works. This is for a fuzzing program, if anyone is interested. I should also mention that this is all being done on Linux (I think that is pertinent). I have some code like this:

import subprocess

# open and run a process, then log its return code and stderr output
process = subprocess.Popen([app, file_name], stdout=subprocess.PIPE,
                                             stderr=subprocess.PIPE)
# communicate() reads both pipes to EOF and waits for the process;
# calling wait() first can deadlock if the child fills a pipe buffer
out, error_msg = process.communicate()
return_code = process.returncode

# insert results into an sqlite database log
log_cur.execute('''INSERT INTO log (return_code, error_msg)
                   VALUES (?,?)''', [unicode(return_code), unicode(error_msg)])
log_db.commit()

99 out of 100 times it works just fine, but occasionally I get an error similar to:

UnicodeDecodeError: 'utf8' codec can't decode byte 0xce in position 43: invalid continuation byte

EDIT: Full traceback:

Traceback (most recent call last):
  File "openscadfuzzer.py", line 72, in <module>
    VALUES (?,?)''', [crashed, err_msg.decode('utf-8')])
  File "/home/username/workspace/GeneralPythonEnv/openscadfuzzer/lib/python2.7/encodings/utf_8.py",    line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xdb in position 881: invalid continuation byte

Is this a problem with subprocess, with the application I am using it to run, or with my code? Any pointers would be appreciated (especially regarding the correct usage of subprocess stdout and stderr).


Solution

My guess is that the problem is this call:

unicode(error_msg)

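What is the type of error_msg? I'm fairly sure that by default the subprocess APIs return the raw bytes output by the child program. The call to unicode tries to convert those bytes into characters (code points) by assuming some encoding. In Python 2, unicode() without an encoding argument uses the default encoding, normally ASCII; in your traceback the encoding is UTF-8 because you call err_msg.decode('utf-8') explicitly.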
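To check what subprocess actually hands back, here is a minimal sketch in Python 3, where the byte/text distinction is explicit (in Python 2.7 the same call returns a plain str, which is also a byte string). The child command is a hypothetical stand-in for the fuzzed application: it just writes the non-UTF-8 bytes b'h\xcello' to stderr.

```python
import subprocess
import sys

# Hypothetical child process: writes bytes that are NOT valid UTF-8 to stderr.
child = [sys.executable, "-c",
         "import sys; sys.stderr.buffer.write(b'h\\xcello'); sys.stderr.buffer.flush()"]

process = subprocess.Popen(child, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
# communicate() waits for the child and returns (stdout, stderr) as raw bytes
out, err = process.communicate()

print(type(err))           # <class 'bytes'>
print(err)                 # b'h\xcello'
print(process.returncode)  # 0
```

Nothing is decoded at this point; any UnicodeDecodeError only appears once the bytes are converted to text.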

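My guess is that the bytes aren't valid UTF-8 but are valid Latin-1. (In fact every byte sequence is valid Latin-1, because Latin-1 maps each of the 256 byte values to a character, so this decode never fails, although it can produce the wrong characters if the output is really in some other encoding.) You can specify which codec to use when converting between bytes and characters: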

error_msg.decode('latin1')

Here's an example that hopefully demonstrates the problem and workaround:

>>> b'h\xcello'.decode('utf8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.2/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xce in position 1: invalid continuation byte

>>> b'h\xcello'.decode('latin1')
'hÎllo'
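Combining the two decodes above, one possible approach is to try UTF-8 first and fall back to Latin-1 only when that fails. This is a sketch in Python 3; decode_output is a hypothetical helper name, not part of subprocess. The last line shows the other common option, keeping UTF-8 but substituting U+FFFD for bytes that cannot be decoded.

```python
def decode_output(raw, encoding="utf-8"):
    """Hypothetical helper: decode child-process output, falling back to Latin-1.

    Latin-1 maps every byte 0x00-0xFF to a character, so the fallback never
    raises, though it may show the wrong characters if the data was really
    in some third encoding.
    """
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return raw.decode("latin-1")

print(decode_output("héllo".encode("utf-8")))  # héllo
print(decode_output(b"h\xcello"))              # hÎllo

# Alternative: stay with UTF-8 but replace undecodable bytes with U+FFFD
replaced = b"h\xcello".decode("utf-8", errors="replace")
```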

A better solution might be to make the child process output UTF-8, but that also depends on what data your database can store.
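If you can't control the child's encoding, SQLite can also hold the raw bytes. Below is a minimal sketch in Python 3, assuming a hypothetical extra BLOB column error_raw next to the question's log table: the exact bytes are kept for later analysis and a readable, lossy UTF-8 copy goes in the TEXT column. In Python 3's sqlite3 module, a bytes parameter is stored as a BLOB and read back as bytes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Hypothetical schema modeled on the question's log table; error_raw is an
# assumed extra column that keeps the exact stderr bytes.
cur.execute("CREATE TABLE log (return_code INTEGER, error_msg TEXT, error_raw BLOB)")

err = b"h\xcello"  # stderr bytes that are not valid UTF-8
cur.execute(
    "INSERT INTO log (return_code, error_msg, error_raw) VALUES (?, ?, ?)",
    (1, err.decode("utf-8", errors="replace"), err),
)
conn.commit()

row = cur.execute("SELECT return_code, error_msg, error_raw FROM log").fetchone()
```

Storing the bytes means nothing is lost even when the text column contains replacement characters.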

Other tips

You can find a very good subprocess tutorial at http://pymotw.com/2/subprocess/ and the official documentation at http://docs.python.org/2/library/subprocess.html. From the way the error you're getting is formatted, though, it seems that it is not your code but your application that hits the error, and you only see it because you're collecting its output. To confirm this, run your app outside your code in a simple bash loop and see whether you can trigger the error again. In your code, also check the application's exit code: when you see the error, it should be non-zero, provided the application reports exit codes correctly.
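The exit-code check can also be done from Python instead of bash. This is a minimal sketch in Python 3; run_once is a hypothetical helper, and the command is a stand-in for the fuzzed application that writes to stderr and exits with status 3. On POSIX systems such as Linux, a negative returncode -N means the child was killed by signal N (for example -11 for SIGSEGV), which is usually what a fuzzer is looking for.

```python
import subprocess
import sys

def run_once(cmd):
    """Hypothetical helper: run cmd once and return (exit code, raw stderr bytes)."""
    process = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    _, err = process.communicate()
    return process.returncode, err

# Hypothetical stand-in for the application: writes to stderr, exits with status 3.
cmd = [sys.executable, "-c", "import sys; sys.stderr.write('boom'); sys.exit(3)"]

failures = []
for attempt in range(5):
    code, err = run_once(cmd)
    # non-zero usually signals an error, if the app reports exit codes correctly
    if code != 0:
        failures.append((attempt, code, err))

print(len(failures))  # 5
```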

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow