Question


I am using chardet 2.0.1 with Python 3.2. The source code is the one described at http://getpython3.com/diveintopython3/case-study-porting-chardet-to-python-3.html

It can be downloaded here:
http://jaist.dl.sourceforge.net/project/cygwin-ports/release-2/Python/python3-chardet/python3-chardet-2.0.1-2.tar.bz2

I use lxml2 to parse HTML and extract some strings, and use the code below to detect the encoding:

chardet.detect(name)

But an error occurs:

Traceback (most recent call last):
  File "C:\python\test.py", line 125, in <module>
    print(chardet.detect(str(name)))
  File "E:\Python32\lib\site-packages\chardet\__init__.py", line 24, in detect
    u.feed(aBuf)
  File "E:\Python32\lib\site-packages\chardet\universaldetector.py", line 98, in feed
    if self._highBitDetector.search(aBuf):
TypeError: can't use a bytes pattern on a string-like object

name is a string object. Converting the string to bytes means encoding it with some encoding such as 'utf-8' or 'big5', and then chardet would detect the encoding I chose, not the original string's encoding. I have no idea how to solve this problem.


Solution

The problem is obvious: you're calling chardet on a string rather than a bytes object. What you're missing is that to Python 3, a string is already decoded. It is a sequence of Unicode characters and doesn't have an encoding anymore.

You must fix your code so that it's giving chardet the original bytes before they were decoded into a string. If you're getting the string from another package then it has already determined the encoding and there's nothing you can do.
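
The difference can be reproduced with the standard library alone. The sketch below is only an illustration: the sample text, the variable names, and the choice of Big5 are assumptions, and the pattern is a stand-in resembling the high-bit check named in the traceback, not chardet's actual source. It shows that a bytes regex rejects a str but accepts the undecoded bytes.

```python
import re

# Stand-in for the kind of bytes pattern the traceback points at
# (_highBitDetector); assumed here, not copied from chardet.
high_bit = re.compile(b"[\x80-\xff]")

# Hypothetical sample data: "raw" plays the role of the undecoded page
# bytes, "text" is what a parser hands back after decoding them.
raw = "中文".encode("big5")
text = raw.decode("big5")

error = None
try:
    high_bit.search(text)      # str input: same failure as in the question
except TypeError as exc:
    error = exc

match = high_bit.search(raw)   # bytes input: the pattern can be applied

# With chardet installed, the corresponding call would be
# chardet.detect(raw), never chardet.detect(text) or chardet.detect(str(...)).
```

In practice the raw bytes have to be captured before anything decodes them, for example by reading the file in binary mode with open(path, "rb") or by keeping the undecoded HTTP response body, and passed to chardet at that point.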

Licensed under: CC-BY-SA with attribution