Question

I'm trying to strip out some fussy text from pages like this. I want to preserve the anchored links but lose the breaks and the a.intro. I thought I could use something like unwrap() to strip off layers, but I'm getting an error: TypeError: 'NoneType' object is not callable.

For kicks, I tried running the documentation sample code itself, since I couldn't see how my version differed.

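from bs4 import BeautifulSoup  # Beautiful Soup 4

markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
soup = BeautifulSoup(markup, "html.parser")
a_tag = soup.a

a_tag.i.unwrap()  # replace the <i> tag with its own contents
a_tag
# <a href="http://example.com/">I linked to example.com</a>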

I'm getting the exact same error. What am I missing here? I'm working in Scraperwiki, fwiw.


Solution

This seems to be a ScraperWiki issue. The sample works fine in an IPython console.

Other tips

I get this error too.

In [27]: type(a_tag.i.unwrap)
Out[27]: NoneType

In [28]: 'unwrap' in dir(a_tag.i)
Out[28]: False

FWIW, replace_with_children yields the same results:

In [29]: type(a_tag.i.replace_with_children)
Out[29]: NoneType

Looks like a bug to me.

In [13]: import BeautifulSoup as Bs

In [16]: Bs.__version__
Out[16]: '3.2.1'
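
The version shown above points to a likely explanation other than a bug: 3.2.1 is Beautiful Soup 3, which has no unwrap() method, and a tag's attribute lookup falls back to searching for a child tag with that name, returning None when there isn't one. Calling that None is what raises the TypeError. Here is a minimal sketch of that fallback using a hypothetical FakeTag class (a simplified stand-in for illustration, not Beautiful Soup's actual code):

```python
class FakeTag:
    """Hypothetical stand-in for a parsed tag; not Beautiful Soup's real class."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def find(self, name):
        # Return the first direct child with this tag name, or None.
        for child in self.children:
            if child.name == name:
                return child
        return None

    def __getattr__(self, name):
        # Python calls this only when normal attribute lookup fails,
        # so any unknown attribute is treated as a child-tag search.
        if name.startswith("__"):
            raise AttributeError(name)
        return self.find(name)


a_tag = FakeTag("a", [FakeTag("i")])

print(type(a_tag.i).__name__)   # the <i> child is found
print(a_tag.i.unwrap is None)   # no <unwrap> child and no such method

try:
    a_tag.i.unwrap()            # this is really None()
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

This also matches the dir() output above: 'unwrap' is not a real attribute, yet accessing it does not raise AttributeError.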

I had the same error message with soup.select(). The reason was an old version of the BeautifulSoup4 library. Somebody at ScraperWiki fixed it (see this conversation at the ScraperWiki Google Group).
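
If you want to confirm what the environment actually provides before blaming your own code, something like the following may help. It is only a sketch: describe_bs_install is a hypothetical helper name, and it assumes the package is importable either as bs4 (Beautiful Soup 4) or as BeautifulSoup (version 3).

```python
import importlib


def describe_bs_install():
    """Return (module_name, version, has_unwrap) for the first importable package."""
    for module_name in ("bs4", "BeautifulSoup"):
        try:
            module = importlib.import_module(module_name)
        except (ImportError, SyntaxError):
            # Not installed, or Python 2-only code that cannot be imported here.
            continue
        version = getattr(module, "__version__", "unknown")
        # Check the Tag class rather than a tag instance: on an instance,
        # an unknown attribute can quietly come back as None.
        tag_class = getattr(module, "Tag", None)
        has_unwrap = hasattr(tag_class, "unwrap")
        return module_name, version, has_unwrap
    return None, None, False


module_name, version, has_unwrap = describe_bs_install()
print(module_name, version, has_unwrap)
```

If has_unwrap comes back False, the installed library predates unwrap() and upgrading is the fix, not changing the scraper.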

License: CC-BY-SA with attribution
Not affiliated with StackOverflow