Here's one way to do it. The idea is to iterate over the main sections (the h2 tags) and, for each h2 tag, iterate over its siblings until the next h2 tag:
from bs4 import BeautifulSoup, Tag

data = """<h2>Main Section</h2>
<p>Bla bla bla</p>
<h3>Subsection</h3>
<p>Some more info</p>
<h3>Subsection 2</h3>
<p>Even more info!</p>
<h2>Main Section 2</h2>
<p>bla</p>
<h3>Subsection</h3>
<p>Some more info</p>
<h3>Subsection 2</h3>
<p>Even more info!</p>"""

# Name the parser explicitly; leaving it out makes bs4 guess and emit a warning.
soup = BeautifulSoup(data, 'html.parser')

for main_section in soup.find_all('h2'):
    for sibling in main_section.next_siblings:
        # Skip the whitespace/text nodes that sit between tags.
        if not isinstance(sibling, Tag):
            continue
        # Stop when the next main section starts.
        if sibling.name == 'h2':
            break
        print(sibling.text)
    print("-------")
prints:
Bla bla bla
Subsection
Some more info
Subsection 2
Even more info!
-------
bla
Subsection
Some more info
Subsection 2
Even more info!
-------
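
If you'd rather collect the text per section than print it, the same sibling loop can be wrapped in a function. This is only a rough sketch: `group_by_h2` is a made-up helper name (not part of BeautifulSoup), and it assumes the h2 tags and their content are siblings at the same level, as in the sample above, and that you're on Python 3.7+ so the dict keeps insertion order.

```python
from bs4 import BeautifulSoup, Tag


def group_by_h2(html):
    """Map each h2 heading's text to the text of the tags that follow it.

    Hypothetical helper for illustration; assumes a flat layout where
    every h2 and its content are siblings.
    """
    soup = BeautifulSoup(html, "html.parser")
    sections = {}
    for main_section in soup.find_all("h2"):
        texts = []
        for sibling in main_section.next_siblings:
            if not isinstance(sibling, Tag):
                continue  # skip whitespace/text nodes between tags
            if sibling.name == "h2":
                break  # reached the next main section
            texts.append(sibling.get_text(strip=True))
        sections[main_section.get_text(strip=True)] = texts
    return sections


html = "<h2>A</h2><p>one</p><h3>sub</h3><p>two</p><h2>B</h2><p>three</p>"
print(group_by_h2(html))
# {'A': ['one', 'sub', 'two'], 'B': ['three']}
```

Note that two h2 tags with the same text would overwrite each other in the dict; returning a list of (heading, texts) pairs avoids that if your headings can repeat.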
Hope that helps.