Genome diagram fails: UnicodeDecodeError

https://stackoverflow.com/questions/22130554

19-10-2022

Question

I am trying to get the genome diagram function of Biopython to work, but it currently fails. This is the output; I'm not sure what the error means. Any suggestions?

======================================================================
ERROR: test_partial_diagram (test_GenomeDiagram.DiagramTest)
construct and draw SVG and PDF for just part of a SeqRecord.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./test_GenomeDiagram.py", line 662, in test_partial_diagram
    assert open(output_filename).read().replace("\r\n", "\n") \
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/codecs.py", line 300, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x93 in position 11: invalid start byte

Solution

Your data file is composed of bytes that are encoded in some encoding other than UTF-8. You need to specify the right encoding:

 open(output_filename, encoding=...)

There is no entirely reliable way for us to tell you what encoding it should be. But since

In [156]: print(b'\x93'.decode('cp1252'))
“

(and since the quotation mark is a pretty common character) you might want to try using

open(output_filename, encoding='cp1252')

on line 662 of test_GenomeDiagram.py.
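As a rough illustration of the suggestion above, the following sketch writes a small cp1252-encoded file and reads it back twice. The sample text and the temporary file are made up for the demonstration; they stand in for the real output file, whose actual encoding is not known here.

```python
import os
import tempfile

# Hypothetical sample data: curly-quoted text encoded as cp1252, so the
# opening quote becomes the single byte 0x93 (as in the traceback).
sample_bytes = "\u201cGenomeDiagram\u201d".encode("cp1252")

with tempfile.NamedTemporaryFile(delete=False) as handle:
    handle.write(sample_bytes)
    path = handle.name

try:
    # Reading the file as UTF-8 fails on the 0x93 byte.
    try:
        with open(path, encoding="utf-8") as f:
            f.read()
        utf8_failed = False
    except UnicodeDecodeError as err:
        utf8_failed = True
        print("utf-8 failed:", err)

    # Naming the encoding the bytes were actually written in works.
    with open(path, encoding="cp1252") as f:
        text = f.read()
    print(ascii(text))
finally:
    os.remove(path)
```

If cp1252 turns out to be the wrong guess for the real file, the same pattern applies with a different `encoding` argument.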

Other tips

UTF-8 is a variable-length encoding. When a character requires multiple bytes, the second and subsequent bytes have the form 10xxxxxx, and no initial byte (or single-byte character) has this form. As such, 0x93 (binary 10010011) can never be the first byte of a UTF-8 character. The error message is telling you that your buffer contains an invalid UTF-8 byte sequence.
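The bit patterns described above can be checked with a few masks. This is only a sketch; `utf8_byte_role` is a hypothetical helper written for this illustration, not part of Python or Biopython.

```python
def utf8_byte_role(byte):
    """Classify one byte value (0-255) by the role it can play in UTF-8."""
    if byte in (0xC0, 0xC1) or byte >= 0xF5:
        return "never valid in UTF-8"
    if byte & 0b10000000 == 0b00000000:      # 0xxxxxxx
        return "single-byte character"
    if byte & 0b11000000 == 0b10000000:      # 10xxxxxx
        return "continuation byte"
    if byte & 0b11100000 == 0b11000000:      # 110xxxxx
        return "lead byte of a 2-byte sequence"
    if byte & 0b11110000 == 0b11100000:      # 1110xxxx
        return "lead byte of a 3-byte sequence"
    return "lead byte of a 4-byte sequence"  # 11110xxx (0xF0-0xF4)


# 0x93 matches 10xxxxxx, so it can only follow a lead byte.
print(format(0x93, "08b"), utf8_byte_role(0x93))

# The same quotation mark encoded as UTF-8 takes three bytes:
# one lead byte followed by two continuation bytes.
for b in "\u201c".encode("utf-8"):
    print(hex(b), utf8_byte_role(b))
```

Because 0x93 is a continuation byte with no lead byte in front of it, the decoder reports "invalid start byte".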

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow