Question

I'm trying to strip out some fussy text from pages like this. I want to preserve the anchored links but lose the breaks and the a.intro links. I thought I could use something like unwrap() to strip off layers, but I'm getting an error: TypeError: 'NoneType' object is not callable

For kicks, I tried running the documentation sample code itself, since I couldn't see how my version differed.

from bs4 import BeautifulSoup  # the docs sample assumes this import

markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
soup = BeautifulSoup(markup)
a_tag = soup.a

a_tag.i.unwrap()
a_tag
# <a href="http://example.com/">I linked to example.com</a>

I'm getting the exact same error. What am I missing here? I'm working in ScraperWiki, fwiw.
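For reference, the kind of transformation I'm after looks roughly like this (just a sketch, assuming bs4 actually loads and guessing that the intro links are <a class="intro"> — I haven't been able to run it because of the error above):

from bs4 import BeautifulSoup

html = '<p>Intro<br/><a class="intro" href="#">skip me</a> keep <a href="http://example.com/">this link</a></p>'
soup = BeautifulSoup(html, 'html.parser')

for br in soup.find_all('br'):       # drop the line breaks entirely
    br.decompose()
for a in soup.select('a.intro'):     # strip the a.intro wrapper but keep its text
    a.unwrap()

print(soup)
# <p>Introskip me keep <a href="http://example.com/">this link</a></p>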

Solution

Seems to be a ScraperWiki issue; it works fine in the IPython console.

Other tips

I get this error too.

In [27]: type(a_tag.i.unwrap)
Out[27]: NoneType

In [28]: 'unwrap' in dir(a_tag.i)
Out[28]: False

FWIW, replace_with_children yields the same results:

In [29]: type(a_tag.i.replace_with_children)
Out[29]: NoneType

Looks like a bug to me.

In [13]: import BeautifulSoup as Bs

In [16]: Bs.__version__
Out[16]: '3.2.1'

That is BeautifulSoup 3, which has no unwrap(); in BS3 an unknown attribute lookup on a tag falls back to a child-tag search and returns None, which is exactly the NoneType you end up calling.

I had the same error message with soup.select(). The reason was an old version of the BeautifulSoup4 library. Somebody at ScraperWiki fixed it (see this conversation at the ScraperWiki Google Group).
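A quick way to check what you are actually getting (a minimal sketch; it only assumes that BeautifulSoup 4 is importable as bs4):

import bs4
from bs4 import BeautifulSoup

print(bs4.__version__)              # should be 4.x; unwrap() only exists in BS4

markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
soup = BeautifulSoup(markup, 'html.parser')
print(hasattr(soup.a.i, 'unwrap'))  # True on a current bs4
soup.a.i.unwrap()                   # replaces the <i> tag with its contents
print(soup.a)                       # <a href="http://example.com/">I linked to example.com</a>

If the version printed is 3.x, or the import fails, upgrading or installing beautifulsoup4 is the fix, which is what happened on ScraperWiki's side.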
