Question

I get the following error when I call BeautifulSoup(page):

Traceback (most recent call last):
  File "error.py", line 10, in <module>
    soup = BeautifulSoup(page)
  File "C:\Python33\lib\site-packages\bs4\__init__.py", line 169, in __init__
    self.builder.prepare_markup(markup, from_encoding))
  File "C:\Python33\lib\site-packages\bs4\builder\_htmlparser.py", line 136, in prepare_markup
    dammit = UnicodeDammit(markup, try_encodings, is_html=True)
  File "C:\Python33\lib\site-packages\bs4\dammit.py", line 223, in __init__
    u = self._convert_from(chardet_dammit(self.markup))
  File "C:\Python33\lib\site-packages\bs4\dammit.py", line 30, in chardet_dammit
    return chardet.detect(s)['encoding']
  File "C:\Python33\lib\site-packages\chardet\__init__.py", line 21, in detect
    import universaldetector
ImportError: No module named 'universaldetector'

I am running Python 3.3 on Windows 7. I installed bs4 by downloading the .tar.gz and running its setup.py. I then installed pip and used it to install chardet (pip.exe install chardet). My chardet version is 2.2.1. bs4 works fine for other URLs.

Here's the code:

import sys
from urllib.request import urlopen
from bs4 import BeautifulSoup
import re
import chardet

url = "http://www.edgar-online.com/brand/yahoo/search/?cik=1400810"
page = urlopen(url).read()
#print(page)
soup = BeautifulSoup(page)

I look forward to your answers


Solution

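I ran into the same situation just now.
I removed the chardet import from my script and also uninstalled chardet.
After that the script ran without the error.
The chardet you installed probably does not support Python 3.3, which is why the error occurs.
Note that the traceback shows the ImportError being raised inside chardet.detect() when it is called, not by the "import chardet" statement itself, so the ImportError fallback in bs4 is never reached while that chardet is installed.
The relevant part of dammit.py from the BeautifulSoup library is below.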

try:
    # First try the fast C implementation.
    #  PyPI package: cchardet
    import cchardet
    def chardet_dammit(s):
        return cchardet.detect(s)['encoding']
except ImportError:
    try:
        # Fall back to the pure Python implementation
        #  Debian package: python-chardet
        #  PyPI package: chardet
        import chardet
        def chardet_dammit(s):
            return chardet.detect(s)['encoding']
        #import chardet.constants
        #chardet.constants._debug = 1
    except ImportError:
        # No chardet available.
        def chardet_dammit(s):
            return None
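
To check whether the chardet you have installed actually works on your interpreter, a minimal sketch along the following lines may help. The function name detect_encoding is hypothetical and is not part of bs4 or chardet. It uses the same fallback idea as the code above, but wraps the call to chardet.detect() as well as the import, since in the traceback from the question the failure happens at call time.

```python
def detect_encoding(data):
    """Guess the encoding of a bytes object.

    Returns None when no working chardet is available.
    Hypothetical helper for illustration; not part of bs4 or chardet.
    """
    try:
        # Raises ImportError if chardet is not installed at all.
        import chardet
        # An incompatible chardet can import fine and still raise
        # ImportError here, as in the traceback from the question.
        return chardet.detect(data)["encoding"]
    except ImportError:
        # Same outcome as the last branch in dammit.py: no detection.
        return None


if __name__ == "__main__":
    guess = detect_encoding(b"<html><body>hello</body></html>")
    if guess is None:
        print("No usable chardet found (or it could not guess)")
    else:
        print("chardet works, guessed:", guess)
```

If this reports that no usable chardet was found while pip still lists chardet as installed, the installed copy is probably not usable on your Python version, and uninstalling it as described above lets bs4 take its own fallback branch.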
Licensed under: CC-BY-SA with attribution