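hasattr() is the wrong test; it checks whether the object has a Python attribute named href (a.href), and BeautifulSoup dynamically turns attribute access into a search for child tags. Because that search returns None rather than raising AttributeError, hasattr() reports True for every tag, whether or not it has an href. HTML tag attributes are not translated into Python attributes.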
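To see why hasattr() misleads here, a minimal sketch follows. The markup and variable names are made up for illustration, and it assumes bs4 with the built-in html.parser:

```python
from bs4 import BeautifulSoup

# Sample markup invented for this illustration.
html = '<p><a name="top">anchor</a> <a href="/next">next</a></p>'
soup = BeautifulSoup(html, "html.parser")
anchor, link = soup.find_all("a")

# Attribute access is a child-tag search: anchor.href behaves like
# anchor.find("href"). There is no <href> child, so the result is None,
# yet hasattr() still reports True.
no_href_child = anchor.href
has_python_attr = hasattr(anchor, "href")

# The HTML attributes live in the attrs dictionary instead.
anchor_attrs = anchor.attrs
link_attrs = link.attrs
```

Only link_attrs contains an 'href' key here, which is the distinction the test needs to make.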
Use dictionary-style testing instead. Your loop visits every element, which can include the Doctype instance, so I use getattr() with a default to avoid breaking on objects that don't have an attrs dictionary:
if 'href' in getattr(link, 'attrs', {}):
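As a rough, self-contained version of that check (the sample document and the hrefs list are assumptions for illustration, parsed with the built-in html.parser), the loop might look like this:

```python
from bs4 import BeautifulSoup

# Sample document invented for this illustration.
html = '<!DOCTYPE html><a href="/one">one</a><a name="two">two</a>'
soup = BeautifulSoup(html, "html.parser")

hrefs = []
for node in soup:  # top-level nodes, including the doctype declaration
    # Fall back to an empty dict for nodes that have no attrs dictionary.
    if 'href' in getattr(node, 'attrs', {}):
        hrefs.append(node['href'])
```

The doctype and the a tag without an href are both skipped, and no exception is raised along the way.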
You can also instruct SoupStrainer to only match a tags with an href attribute by using href=True as a keyword argument filter (the expression not None just evaluates to True in any case):
for link in BeautifulSoup(test.text, parse_only=SoupStrainer('a', href=True)):
This still includes the HTML declaration (the doctype), of course; to get only the links, search for the a tags explicitly:
from bs4 import BeautifulSoup, SoupStrainer

# test is the response object from the question
soup = BeautifulSoup(test.text, parse_only=SoupStrainer('a', href=True))
for link in soup.find_all('a'):
    print(link)
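For a version that runs without the response object from the question, here is a small sketch. The HTML string, the only_links name, and the explicit html.parser choice are assumptions for illustration:

```python
from bs4 import BeautifulSoup, SoupStrainer

# Sample page invented for this illustration.
html = '''<!DOCTYPE html>
<html><body>
<a href="/one">one</a>
<a name="two">two</a>
<a href="/three">three</a>
</body></html>'''

# Parse only <a> tags that carry an href attribute.
only_links = SoupStrainer('a', href=True)
soup = BeautifulSoup(html, 'html.parser', parse_only=only_links)

# find_all('a') skips anything else left in the tree, such as the doctype.
hrefs = [link['href'] for link in soup.find_all('a')]
```

Since the strainer already guarantees an href on every matched tag, indexing with link['href'] needs no further guard.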