You can use Tag.find_next_sibling() here:
for header in soup.find_all('h6'):
    para = header.find_next_sibling('p')
The .find_next_sibling() call returns the first <p> tag that follows the header as a sibling, or None if there is no such tag.
Demo:
>>> for header in soup.find_all('h6'):
...     print(header.text)
...     para = header.find_next_sibling('p')
...     for strong_tag in para.find_all('strong'):
...         print(strong_tag.text, strong_tag.next_sibling)
...     print()
...
PHYSICAL DESCRIPTION
YOB: 1987
RACE: WHITE
GENDER: FEMALE
HEIGHT: 5'05''
WEIGHT: 118
EYE COLOR: GREEN
HAIR COLOR: BROWN
SCARS, MARKS, TATTOOS
This can pick up the wrong <p> tag when there is no paragraph between the current header and the next one; the search then continues past the next header and returns a paragraph that belongs to it:
<h6>Foo</h6>
<div>A div, not a p</div>
<h6>Bar</h6>
<p>This <i>is</i> a paragraph</p>
In this case, search for either a <p> or an <h6> tag:
for header in soup.find_all('h6'):
    next_sibling = header.find_next_sibling(['p', 'h6'])
    if next_sibling is None or next_sibling.name == 'h6':
        # no <p> tag between this header and the next (or the end), skip
        continue
The header.find_next_sibling(['p', 'h6']) call finds either the next paragraph or the next header, whichever comes first. It returns None if neither follows the header.
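
To see the difference between the two searches, here is a minimal self-contained sketch run against the sample markup above. It assumes the built-in html.parser backend, and the helper name paragraphs_by_header is made up for illustration; it is not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Sample markup from above: "Foo" has no paragraph of its own.
html = """
<h6>Foo</h6>
<div>A div, not a p</div>
<h6>Bar</h6>
<p>This <i>is</i> a paragraph</p>
"""

soup = BeautifulSoup(html, "html.parser")


def paragraphs_by_header(soup):
    """Map each <h6> text to the text of its own <p> (hypothetical helper).

    Headers with no <p> before the next <h6> (or the end) are skipped.
    """
    result = {}
    for header in soup.find_all("h6"):
        sibling = header.find_next_sibling(["p", "h6"])
        if sibling is None or sibling.name == "h6":
            continue
        result[header.get_text()] = sibling.get_text()
    return result


# Naive search: only looks for <p>, so "Foo" picks up the paragraph under "Bar".
naive = {
    h.get_text(): h.find_next_sibling("p").get_text()
    for h in soup.find_all("h6")
}

print(naive)
print(paragraphs_by_header(soup))
```

With the naive search, both Foo and Bar map to the same paragraph; with the combined search, only Bar gets an entry.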