You cannot tell `find_all()` to ignore nested `dl` elements; all you can do is skip matches that appear in the `.descendants` of an element you have already matched:
matches = []
for dl in soup.find_all('dl', attrs={'class': ['class', 'method', 'function', 'describe', 'attribute', 'data', 'classmethod', 'staticmethod']}):
    if any(dl in m.descendants for m in matches):
        # child of already found element
        continue
    matches.append(dl)
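As a self-contained illustration of the loop above, here is a minimal sketch that runs the same filter on a small, made-up HTML fragment. The sample markup, the `CLASSES` constant and the `outermost_dls` helper name are assumptions for the example, not part of the original code. It walks `dl.parents` and compares by identity, which should select the same elements as the `.descendants` check while sidestepping BeautifulSoup's content-based tag equality when two elements have identical markup.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: a class with a nested method, plus a function.
HTML = """
<dl class="class" id="outer">
  <dt>Foo</dt>
  <dd>
    <dl class="method" id="inner"><dt>Foo.bar</dt><dd>A method.</dd></dl>
  </dd>
</dl>
<dl class="function" id="top"><dt>baz</dt><dd>A function.</dd></dl>
"""

CLASSES = ['class', 'method', 'function', 'describe', 'attribute',
           'data', 'classmethod', 'staticmethod']


def outermost_dls(soup):
    """Return matching <dl> elements that are not nested in another match."""
    matches = []
    for dl in soup.find_all('dl', attrs={'class': CLASSES}):
        # find_all() yields elements in document order, so any matching
        # ancestor of dl is already in matches by the time we reach dl.
        if any(parent is m for parent in dl.parents for m in matches):
            continue
        matches.append(dl)
    return matches


soup = BeautifulSoup(HTML, 'html.parser')
result_ids = [dl['id'] for dl in outermost_dls(soup)]
print(result_ids)  # ['outer', 'top']
```

The nested `method` entry is skipped because its enclosing `class` entry was matched first.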
If you want nested elements and no parents, use:
matches = []
for dl in soup.find_all('dl', attrs={'class': ['class', 'method', 'function', 'describe', 'attribute', 'data', 'classmethod', 'staticmethod']}):
    # drop any earlier match that is a parent of this element
    matches = [m for m in matches if dl not in m.descendants]
    matches.append(dl)
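The inverse filter can be sketched the same way. The snippet below is an illustrative example on made-up markup (the HTML, `CLASSES` and the `innermost_dls` helper are hypothetical names introduced here); it keeps only matches that do not contain another match, again comparing ancestors by identity.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: a class with a nested method, plus a function.
HTML = """
<dl class="class" id="outer">
  <dt>Foo</dt>
  <dd>
    <dl class="method" id="inner"><dt>Foo.bar</dt><dd>A method.</dd></dl>
  </dd>
</dl>
<dl class="function" id="top"><dt>baz</dt><dd>A function.</dd></dl>
"""

CLASSES = ['class', 'method', 'function', 'describe', 'attribute',
           'data', 'classmethod', 'staticmethod']


def innermost_dls(soup):
    """Return matching <dl> elements that contain no other match."""
    matches = []
    for dl in soup.find_all('dl', attrs={'class': CLASSES}):
        ancestors = list(dl.parents)
        # Drop every earlier match that turns out to be an ancestor of dl.
        matches = [m for m in matches
                   if not any(m is a for a in ancestors)]
        matches.append(dl)
    return matches


soup = BeautifulSoup(HTML, 'html.parser')
result_ids = [dl['id'] for dl in innermost_dls(soup)]
print(result_ids)  # ['inner', 'top']
```

Here the outer `class` entry is discarded as soon as its nested `method` entry is found.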
If you want to pull the tree apart and remove the matched elements from it, use:
matches = soup.find_all('dl', attrs={'class': ['class', 'method', 'function', 'describe', 'attribute', 'data', 'classmethod', 'staticmethod']})
for element in matches:
    element.extract()  # remove from tree (and parent `dl` matches)
but you may want to adjust your text extraction instead.
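To make the effect of `extract()` concrete, the following sketch (again on hypothetical sample markup, with `CLASSES`, `remaining` and `own_text` as names invented for the example) extracts every match and then reads each element's text. Because a nested match is also extracted from its parent match, each element ends up holding only its own text.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: a class with a nested method, plus a function.
HTML = """
<dl class="class" id="outer">
  <dt>Foo</dt>
  <dd>
    <dl class="method" id="inner"><dt>Foo.bar</dt><dd>A method.</dd></dl>
  </dd>
</dl>
<dl class="function" id="top"><dt>baz</dt><dd>A function.</dd></dl>
"""

CLASSES = ['class', 'method', 'function', 'describe', 'attribute',
           'data', 'classmethod', 'staticmethod']

soup = BeautifulSoup(HTML, 'html.parser')
matches = soup.find_all('dl', attrs={'class': CLASSES})
for element in matches:
    # Detach the element from wherever it currently lives: the document
    # for top-level matches, an already extracted parent for nested ones.
    element.extract()

# Nothing is left in the original tree ...
remaining = soup.find_all('dl')
# ... and each extracted element now carries only its own text.
own_text = {dl['id']: dl.get_text(' ', strip=True) for dl in matches}
print(len(remaining))
print(own_text)
```

Note that this modifies `soup` in place, so it is only suitable when the original tree is no longer needed.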