Generally, when people ask how to "remove" something with bs4, they're really just asking how to leave it out of a find operation.
You want to exclude the extra blank cells (i.e. tags with tag.text == '') and those four "column header" tags. You can do the latter with CSS selectors, but the former needs an explicit filter. So it's easiest to do both at once, which is also more declarative in my opinion:
from bs4 import BeautifulSoup
soup = BeautifulSoup(that_long_html_you_gave)
blacklist = {'Device Type', 'IP Address', 'Device Name', 'Notes'}
table = soup.body # to match your variable name. I think.
table.find_all(lambda tag: tag.text and tag.text not in blacklist)
Out[45]:
[<td align="left" width="150">AudioCodes Gateway</td>,
<td align="left" width="115">172.31.31.2</td>,
<td align="left" width="215">FXO</td>,
<td align="left" width="150">IC Server</td>,
<td align="left" width="115">172.31.56.151</td>,
<td align="left" width="100">IND056GIC151</td>,
<td align="left" width="215">NAT'd IP = PENDING MPLS, Voice IP = 172.31.52.151</td>,
<td align="left" width="150">IC Server</td>,
<td align="left" width="115">172.31.56.152</td>,
<td align="left" width="100">IND056GIC152</td>,
<td align="left" width="215">NAT'd IP = PENDING MPLS, Voice IP = 172.31.52.152</td>,
<td align="left" width="150">Media Server</td>,
<td align="left" width="115">IND1106HMS07</td>,
<td align="left" width="100">IND1106HMS07</td>,
<td align="left" width="150">Media Server</td>,
<td align="left" width="115">IND1106HMS07</td>,
<td align="left" width="100">IND1106HMS07</td>]
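One caveat: find_all calls the function on every tag, so if your markup wraps the cells in tr or table tags, those wrappers have non-empty text too and would also match. Here is a rough, self-contained sketch of a stricter filter that only accepts td tags and ignores whitespace-only cells. The sample HTML and the name wanted are made up for illustration, and I'm assuming the built-in html.parser is fine for your input:

```python
from bs4 import BeautifulSoup

# Hypothetical sample; the real table from the question is much longer.
html = """
<table>
  <tr><td>Device Type</td><td>IP Address</td><td>Device Name</td><td>Notes</td></tr>
  <tr><td>IC Server</td><td>172.31.56.151</td><td>IND056GIC151</td><td></td></tr>
  <tr><td>Media Server</td><td>IND1106HMS07</td><td>IND1106HMS07</td><td> </td></tr>
</table>
"""

blacklist = {'Device Type', 'IP Address', 'Device Name', 'Notes'}

def wanted(tag):
    # Keep only <td> cells whose stripped text is non-empty and not a header.
    text = tag.get_text(strip=True)
    return tag.name == 'td' and bool(text) and text not in blacklist

soup = BeautifulSoup(html, 'html.parser')
cells = [td.get_text(strip=True) for td in soup.find_all(wanted)]
print(cells)
```

Checking tag.name keeps the tr and table wrappers out of the result, and get_text(strip=True) means a cell containing only a space is treated the same as an empty one.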