Question

I'm trying to strip out some fussy text from pages like this. I want to preserve the anchored links but lose the breaks and the a.intro. I thought I could use something like unwrap() to strip off layers, but I'm getting an error: TypeError: 'NoneType' object is not callable.

For kicks, I tried running the documentation sample code itself, since I couldn't see how my version differed.

markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
soup = BeautifulSoup(markup)
a_tag = soup.a

a_tag.i.unwrap()
a_tag
# <a href="http://example.com/">I linked to example.com</a>

I get the exact same error. What am I missing here? I'm working in ScraperWiki, FWIW.


Solution

This seems to be a ScraperWiki issue. The same code works fine in an IPython console.

OTHER TIPS

I get this error too.

In [27]: type(a_tag.i.unwrap)
Out[27]: NoneType

In [28]: 'unwrap' in dir(a_tag.i)
Out[28]: False

FWIW, replace_with_children yields the same results:

In [29]: type(a_tag.i.replace_with_children)
Out[29]: NoneType

Looks like a bug to me. For reference, this is the version I have installed:

In [13]: import BeautifulSoup as Bs

In [16]: Bs.__version__
Out[16]: '3.2.1'
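
A possible explanation for the NoneType results above, offered as an assumption rather than something confirmed in this thread: Beautiful Soup's Tag class treats an unknown attribute name as a search for a child tag of that name, and unwrap() and replace_with_children() only exist in the 4.x series. On 3.2.1, a_tag.i.unwrap would then be a lookup for an <unwrap> child, which finds nothing and returns None, and calling None raises the TypeError. The class below is a hypothetical, simplified stand-in (FakeTag is not part of any library, and its find() only looks at direct children) that imitates this lookup using only the standard library:

```python
class FakeTag:
    """Hypothetical stand-in for a Beautiful Soup 3 Tag (not the real class)."""

    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []

    def find(self, name):
        # Return the first direct child with the given tag name, else None.
        for child in self.children:
            if child.name == name:
                return child
        return None

    def __getattr__(self, attr):
        # Only called when normal attribute lookup fails.
        # Assumption: unknown names are treated as child-tag lookups.
        if attr.startswith("__"):
            raise AttributeError(attr)
        return self.find(attr)


a_tag = FakeTag("a", [FakeTag("i")])

print(type(a_tag.i.unwrap))      # the missing method resolves to None
print("unwrap" in dir(a_tag.i))  # and it is not a real attribute

try:
    a_tag.i.unwrap()
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)
```

If that assumption holds, the error is less a bug than a sign that the code was written for Beautiful Soup 4 but is running against Beautiful Soup 3.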

I had the same error message with soup.select(). The reason was an old version of the BeautifulSoup4 library. Somebody at ScraperWiki fixed it (see this conversation at the ScraperWiki Google Group).
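
Since the answers here point at the installed library version, it can help to check which Beautiful Soup is actually importable before calling unwrap(). The sketch below assumes the bs4 package (Beautiful Soup 4) may or may not be installed; the helper name unwrap_sample is made up for illustration. It reruns the documentation sample only when bs4 can be imported:

```python
try:
    import bs4
    from bs4 import BeautifulSoup
except ImportError:
    bs4 = None  # Beautiful Soup 4 is not installed in this environment.


def unwrap_sample():
    """Run the documentation sample; return the markup, or None without bs4."""
    if bs4 is None:
        return None
    markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
    # Naming the parser explicitly avoids depending on which parsers are installed.
    soup = BeautifulSoup(markup, "html.parser")
    a_tag = soup.a
    a_tag.i.unwrap()  # Replace the <i> tag with its contents.
    return str(a_tag)


if bs4 is None:
    print("bs4 is not importable; unwrap() is unavailable")
else:
    print("Beautiful Soup version:", bs4.__version__)
    print(unwrap_sample())
```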

Licensed under: CC-BY-SA with attribution