Use find_next_sibling() instead of next_sibling. Likewise, use find_previous_sibling() instead of previous_sibling.
Reason: next_sibling does not only return the next HTML tag; it returns the next "soup element" of any kind. Usually that is the whitespace text node between two tags, but it can also be other text or a comment. find_next_sibling(), on the other hand, returns the next HTML tag and skips whitespace and any other non-tag content between the tags.
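A minimal sketch of the difference, using a made-up two-paragraph snippet (the html string and variable names are illustrative, not taken from your code):

```python
from bs4 import BeautifulSoup, NavigableString, Tag

# Two <p> tags separated by a newline, as in pretty-printed HTML.
html = "<p>first</p>\n<p>second</p>"
soup = BeautifulSoup(html, "html.parser")
first = soup.find("p")

raw = first.next_sibling         # the whitespace text node between the tags
tag = first.find_next_sibling()  # the next Tag, whitespace skipped

print(repr(raw), isinstance(raw, Tag))
print(tag)

# The same applies in the other direction.
second = tag
raw_prev = second.previous_sibling
tag_prev = second.find_previous_sibling()
```

Here raw is a NavigableString holding only the newline, so an isinstance(ns, Tag) check fails on it, while tag is the second paragraph.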
I restructured your code a bit for this demonstration; I hope it is semantically the same.
Code with next_sibling, demonstrating the same behaviour that you described (works for data but not for data2):
from bs4 import BeautifulSoup, Tag

# no whitespace between the tags
data = "<p>method-removed-here</p><p>method-removed-here</p><p>method-removed-here</p>"
# same kind of markup, but with a newline after every tag
data2 = """<p>method-removed-here</p>
<p>method-removed-here</p>
<p>method-removed-here</p>
<p>method-removed-here</p>
<p>method-removed-here</p>
"""

soup = BeautifulSoup(data, 'html.parser')  # swap in data2 to reproduce the problem
string = 'method-removed-here'
for p in soup.find_all("p"):
    while True:
        # next_sibling may be a whitespace text node, not a Tag
        ns = p.next_sibling
        if isinstance(ns, Tag) and ns.name == 'p' and p.text == string:
            ns.decompose()
        else:
            break
print(soup)
Code with find_next_sibling(), which works for both data and data2:
soup = BeautifulSoup(data, 'html.parser')
string = 'method-removed-here'
for p in soup.find_all("p"):
    while True:
        ns = p.find_next_sibling()
        if isinstance(ns, Tag) and ns.name == 'p' and p.text == string:
            ns.decompose()
        else:
            break
print(soup)
The same behaviour (returning all soup elements, including unwanted whitespace) shows up in other parts of BeautifulSoup, see: BeautifulSoup .children or .content without whitespace between tags
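As a rough illustration of that related case, here is a sketch with a made-up div (the markup and variable names are assumptions for the example). .children yields the whitespace text nodes too, and filtering on Tag or using find_all(True, recursive=False) keeps only the direct child tags:

```python
from bs4 import BeautifulSoup, Tag

# A container whose children are separated by newlines.
html = "<div>\n<p>a</p>\n<p>b</p>\n</div>"
soup = BeautifulSoup(html, "html.parser")
div = soup.find("div")

# .children (like .contents) includes the whitespace text nodes.
all_children = list(div.children)

# Keep only real tags, either by filtering ...
tag_children = [child for child in div.children if isinstance(child, Tag)]
# ... or by asking find_all for direct child tags only.
direct_tags = div.find_all(True, recursive=False)

print(len(all_children), len(tag_children), len(direct_tags))
```

Which of the two filters to use is mostly a matter of taste; both leave the tree itself untouched.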