One option is to switch to lxml.html
and use XPath - and the result of that will already be a list... (and since an XPath expression is just a string - it's easier to formulate than playing around with class inheritance)
>>> tag = 'a'
>>> attr = 'href'
>>> xpq = '//{}/@{}'.format(tag, attr)
>>> a = '<a href="test-or-something">hello</a><a>No href here</a><a href="something-else">blah</a>'
>>> import lxml.html
>>> lxml.html.fromstring(a).xpath(xpq)
['test-or-something', 'something-else']
if you have to use stdlib - then you could do something similar with HTMLParser
from HTMLParser import HTMLParser
class ListTags(HTMLParser):
def __init__(self, tag, attr):
HTMLParser.__init__(self)
self.tag = tag
self.attr = attr
self.matches = []
def handle_starttag(self, tag, attrs):
if tag == self.tag:
ad = dict(attrs)
if self.attr in ad:
self.matches.append(ad[self.attr])
>>> lt = ListTags('a', 'href')
>>> lt.feed(a)
>>> lt.matches
['test-or-something', 'something-else']