Question

I was trying to render an HTML page on the fly using BeautifulSoup 4 in Django (running under Apache2 with mod_python). However, as soon as I pass any HTML string to the BeautifulSoup constructor (see code below), the browser just hangs waiting for the web server. The equivalent code works perfectly from the command line, so I'm guessing the problem is related to the environment Beautiful Soup runs in, in this case Django + Apache + mod_python.

import bs4
import django.shortcuts as shortcuts

def test(request):
    # Under Apache + mod_python the request never completes once this runs.
    s = bs4.BeautifulSoup('<b>asdf</b>')
    return shortcuts.render_to_response('test.html', {})

I installed BeautifulSoup 4 with pip (pip install beautifulsoup4). I also tried installing BeautifulSoup 3 from the standard Debian packages (apt-get install python-beautifulsoup), and with it the following equivalent code works fine, both from the browser and from the command line.

from BeautifulSoup import BeautifulSoup
import django.shortcuts as shortcuts

def test(request):
    s = BeautifulSoup('<b>asdf</b>')
    return shortcuts.render_to_response('test.html', {})

I have looked in Apache's access and error logs, and they show no information about what happens to the stalled request. I have also checked /var/log/syslog and /var/log/messages, with no further information.

Here's the Apache configuration I used:

<VirtualHost *:80>
    DocumentRoot /home/nandersson/src
    <Directory /home/nandersson/src>
        SetHandler python-program
        PythonHandler django.core.handlers.modpython
        SetEnv DJANGO_SETTINGS_MODULE app.settings
        PythonOption django.root /home/nandersson/src
        PythonDebug On
        PythonPath "['/home/nandersson/src'] + sys.path"
    </Directory>

    <Location "/media/">
        SetHandler None
    </Location>
    <Location "/app/poc/">
        SetHandler None
    </Location>
</VirtualHost>

I'm not sure how to debug this further, or whether it's a bug. Does anyone have ideas on how to get to the bottom of this, or has anyone run into similar problems?
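When a request blocks without any error, one way to see where it is stuck is to have Python dump every thread's stack if a call has not returned after a timeout. This is a minimal sketch, assuming Python 3.3+, where `faulthandler` is in the standard library (the Python 2 interpreters used with mod_python needed a separately installed backport). The helper name `call_with_hang_report` is hypothetical. `faulthandler` writes straight to a file descriptor, so pass a real file opened for writing; under Apache, `sys.stderr` may not have a usable one.

```python
import faulthandler


def call_with_hang_report(log_file, timeout, func, *args, **kwargs):
    """Run func(*args, **kwargs) and return its result.

    If the call has not returned after `timeout` seconds, faulthandler
    writes the stack of every thread to `log_file`. The call itself is
    not interrupted, so a genuine hang still hangs, but it leaves a trace.
    """
    # log_file needs a real file descriptor (faulthandler calls fileno()).
    faulthandler.dump_traceback_later(timeout, repeat=False, file=log_file)
    try:
        return func(*args, **kwargs)
    finally:
        # The call finished (or raised): cancel the dump if it has not fired.
        faulthandler.cancel_dump_traceback_later()
```

In the view from the question, the suspect line would become something like `s = call_with_hang_report(log, 10, bs4.BeautifulSoup, '<b>asdf</b>')`, with `log` a hypothetical file object the Apache user can write to.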

Solution

I'm using Apache2 with mod_python. I solved the hang by explicitly passing 'html.parser' as the parser when creating the soup.

import bs4
s = bs4.BeautifulSoup('<b>asdf</b>', 'html.parser')  # name the parser instead of letting bs4 pick one
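A likely reason naming the parser helps: when no parser is given, Beautiful Soup 4 uses the best HTML parser it can find, and its documentation ranks lxml first, then html5lib, then Python's built-in html.parser. lxml is a compiled extension written in Cython, which fits the Cython explanation in the next answer, though the thread does not confirm this. The sketch below only predicts which parser that default would pick on a given machine, using the standard library to check which modules are importable. The function name is hypothetical, and the ranking is copied from the bs4 documentation rather than read from bs4 itself.

```python
from importlib.util import find_spec

# Preference order documented by Beautiful Soup 4 for HTML when no parser
# is given: (feature name passed to BeautifulSoup, module to probe).
BS4_HTML_PARSER_RANKING = [
    ("lxml", "lxml"),
    ("html5lib", "html5lib"),
    ("html.parser", "html.parser"),  # standard library, always present
]


def default_bs4_parser():
    """Return the feature name bs4 would most likely choose for
    BeautifulSoup(markup) called without a parser argument."""
    for feature, module in BS4_HTML_PARSER_RANKING:
        if find_spec(module) is not None:
            return feature
    return None
```

If this returns "lxml" on the server, passing 'html.parser' as above keeps bs4 away from the compiled parser.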

Other tips

This may be the interaction between Cython and mod_wsgi described here, and explored in a Beautiful Soup context here. Here are earlier questions similar to yours.

Try

from bs4 import BeautifulSoup
doc = BeautifulSoup(html, 'html5lib')  # html is your markup; requires pip install html5lib

In my case, 'html.parser' often raises HTMLParseError: https://groups.google.com/forum/?fromgroups=#!topic/beautifulsoup/x_L9FpDdqkc
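Both workarounds amount to naming a parser explicitly. A small helper can encode that choice: prefer html5lib, as this answer suggests, and fall back to the built-in html.parser when html5lib is not installed. This is a sketch; the helper name and the preference list are hypothetical, and the returned string is meant as the second argument to BeautifulSoup, e.g. `BeautifulSoup(html, pick_parser())`.

```python
from importlib.util import find_spec


def pick_parser(preferred=("html5lib", "html.parser")):
    """Return the first parser in `preferred` whose module is importable.

    lxml is deliberately left out of the default list because bs4's
    default choice (lxml, when installed) is the suspected cause of the
    hang in this thread. html.parser ships with Python, so it is the
    last resort.
    """
    for name in preferred:
        if find_spec(name) is not None:
            return name
    return "html.parser"
```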

I experienced the same issue about a year ago. I just tried a similar setup (Django + mod_wsgi + Apache2) with a newer version of BeautifulSoup, 4.3.2, and the problem seems to have been fixed.
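To check whether a server already runs at least the version tested here, compare version numbers numerically. This is a minimal sketch that treats 4.3.2 as the cutoff only because of this poster's report; the names are hypothetical. Plain string comparison is not enough, since "4.10.0" < "4.3.2" is True for strings.

```python
import re

FIXED_IN = (4, 3, 2)  # version where this answer's author no longer saw the hang


def version_tuple(version):
    """Turn '4.3.2', '4.12.3' or '4.3' into a tuple of three ints so that
    versions compare numerically; a missing patch number counts as 0."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part or 0) for part in match.groups())


def at_least_fixed(version, fixed=FIXED_IN):
    """True if `version` is the same as or newer than `fixed`."""
    return version_tuple(version) >= fixed
```

On Python 3.8+, the installed version can be passed in with `importlib.metadata.version("beautifulsoup4")`.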

Licensed under: CC-BY-SA with attribution