python tldextract.extract giving BadStatusLine: ''

Question 1

This happens because the mozilla.org URL in your stack trace (http://mxr.mozilla.org/mozilla-central/source/netwerk/dns/effective_tld_names.dat?raw=1) is unavailable, and tldextract tries to update from that URL on first install. This live update can be disabled (see below), but the uncaught exception is a tldextract bug: it should only log the exception and seamlessly fall back to the package's bundled PSL (Public Suffix List) snapshot.

This is fixed in tldextract 1.2.1, just published to PyPI, which switches to the GitHub mirror of the PSL. Upgrading should work around the uncaught exception.

Another release, coming soon, will avoid future uncaught exceptions when a PSL source such as the GitHub mirror is unavailable.
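To illustrate the "log and fall back" behaviour described above, here is a minimal sketch using only the standard library. It is not tldextract's actual code: load_suffixes, unavailable_mirror, and BUNDLED_SUFFIXES are hypothetical names invented for this example, and the bundled set is a tiny stand-in for a real snapshot.

```python
import logging

logger = logging.getLogger(__name__)

# Hypothetical stand-in for the snapshot a package would ship with.
BUNDLED_SUFFIXES = frozenset({"com", "org", "co.uk"})


def load_suffixes(fetch_remote, bundled=BUNDLED_SUFFIXES):
    """Return the suffixes from fetch_remote(), or the bundled set if it fails."""
    try:
        return frozenset(fetch_remote())
    except Exception:
        # Log the failure (with traceback) instead of letting it propagate.
        logger.exception("Live suffix list fetch failed; using bundled snapshot")
        return bundled


def unavailable_mirror():
    # Simulates the outage: the request fails before any data is returned.
    raise IOError("BadStatusLine: ''")


suffixes = load_suffixes(unavailable_mirror)
print(sorted(suffixes))
```

The caller still gets a usable suffix set during an outage, and the failure remains visible in the logs rather than surfacing as a crash.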

Turning off the default fetch

You can avoid this problem in the previous version by turning off the default on-first-install fetch. Construct your own TLDExtract callable with fetch=False. From the docs:

import tldextract
no_fetch_extract = tldextract.TLDExtract(fetch=False)
no_fetch_extract('http://www.google.com')

Question 2

The package is trying to download a public suffix list from a URL that currently does not work:

http://mxr.mozilla.org/mozilla-central/source/netwerk/dns/effective_tld_names.dat?raw=1

This is due to a DDoS attack on that URL; Mozilla has blocked the URL for now.

This has already been reported to the project, and a fix has been proposed, although that fix only works if you already have a cached copy of the public suffix list.

In the meantime, use the publicsuffix package instead; it bundles the data in the package itself and does not require a URL request.

Update: Mozilla now hosts the file at https://publicsuffix.org/list/effective_tld_names.dat, and any access to the MXR source repository without an mxr.mozilla.org Referer header redirects you to that new location.

Question 3

This is due to http://mxr.mozilla.org/mozilla-central/source/netwerk/dns/effective_tld_names.dat?raw=1 not being served.

If you want to keep using tldextract to obtain the subdomain, domain, and TLD, a temporary solution is to use a local cache file with fetching disabled, e.g. in project/tldextractor/__init__.py:

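import os
import tldextract

# Keep the cached suffix list next to this module.
TLD_CACHE_PATH = os.path.join(
    os.path.abspath(os.path.dirname(__file__)), 'tldextract_cache')
# fetch=False stops tldextract from requesting the live list.
tldextractor = tldextract.TLDExtract(cache_file=TLD_CACHE_PATH, fetch=False)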

Save the contents of https://gist.github.com/AJamesPhillips/6899560 as project/tldextractor/tldextract_cache.

Then, from a module inside project:

from .tldextractor import tldextractor
tldextractor('http://subdomain.domain.tld')