A little late, but I have compared the main answers found online so you can choose what's best for you:
We can also remove comments with a regex:
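import re

soupstr = str(soup)
# "<!--" up to the nearest "-->"; re.DOTALL lets a match span multi-line comments
result = re.sub(r'<!--.*?-->', '', soupstr, flags=re.DOTALL)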
However, when the soup first has to be converted to a string with soupstr = str(soup), this regex method is roughly 3.5 times slower than the findAll(...) + isinstance(x, Comment) method given in other answers.
But it is roughly 6 times faster when you already have the HTML as a string and apply the regex before building the soup.
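For reference, the isinstance(x, Comment) approach from the other answers looks roughly like this. It is a minimal sketch assuming bs4 4.4 or later (for the `string=` argument; older versions use `text=`) and the built-in `html.parser`; the function name `remove_comments_bs4` is hypothetical.

```python
from bs4 import BeautifulSoup, Comment


def remove_comments_bs4(soup):
    """Remove every HTML comment from a BeautifulSoup tree in place and return it."""
    # Comment is a NavigableString subclass, so select string nodes by type.
    # find_all returns a list, so extracting inside the loop does not disturb the search.
    comments = soup.find_all(string=lambda text: isinstance(text, Comment))
    for comment in comments:
        comment.extract()  # detach the comment node from the tree
    return soup


html = "<html><body><!-- hidden --><p>Hello</p><!-- also hidden --></body></html>"
soup = remove_comments_bs4(BeautifulSoup(html, "html.parser"))
print(soup)  # <html><body><p>Hello</p></body></html>
```

Note that this works on an already parsed soup and returns a soup, whereas the regex works on text and returns a string.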
Benchmark results after running each function 1000 times:

bs4, isinstance(x, Comment) method:        0.01193189620971680 ms
convert soup to string, then apply regex:  0.04188799858093262 ms
apply regex before converting to soup:     0.00195980072021484 ms (WINNER!)
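The benchmark code itself is not shown, so the following is only a sketch of how such a comparison might be set up. It assumes bs4 is installed, uses the built-in `html.parser`, and interprets the three cases as: removing Comment nodes from an already parsed soup, serialising an already parsed soup and running the regex on it, and running the regex on the raw HTML string. Building each input (including parsing) happens outside the timed region. The sample HTML and the helper names (`average_seconds`, `make_soup`, `remove_comments_bs4`) are hypothetical, and absolute numbers depend on your document, parser and machine, so treat the output as a rough comparison rather than a reproduction of the figures above.

```python
import re
import time

from bs4 import BeautifulSoup, Comment

COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)

# Hypothetical sample document; use your own HTML for meaningful numbers.
HTML = "<html><body>" + "<!-- note --><p>text</p>" * 20 + "</body></html>"


def remove_comments_bs4(soup):
    for comment in soup.find_all(string=lambda text: isinstance(text, Comment)):
        comment.extract()
    return soup


def average_seconds(fn, make_input, number=1000):
    """Average wall time of fn(arg) over `number` calls, each on a fresh input.

    make_input() runs outside the timed region, so parsing is not counted.
    """
    total = 0.0
    for _ in range(number):
        arg = make_input()
        start = time.perf_counter()
        fn(arg)
        total += time.perf_counter() - start
    return total / number


def make_soup():
    return BeautifulSoup(HTML, "html.parser")


results = {
    "bs4, isinstance(x, Comment)": average_seconds(remove_comments_bs4, make_soup),
    "regex on str(soup)": average_seconds(lambda s: COMMENT_RE.sub("", str(s)), make_soup),
    "regex on raw HTML string": average_seconds(lambda h: COMMENT_RE.sub("", h), lambda: HTML),
}
for name, seconds in results.items():
    print(f"{name}: {seconds * 1000:.4f} ms per call")
```

A fresh soup is built for every call because extract() mutates the tree; reusing one soup would make every run after the first find no comments at all.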
So if you already have the raw HTML as a string, you can use the pure-regex approach instead of the isinstance method.
For people who want a quick result and don't want to read the full answer, here is a ready-to-run, copy-paste function:
import re

COMMENT_RE = re.compile(r'<!--.*?-->', re.DOTALL)  # "<!--" up to the nearest "-->", newlines included

def remove_comments_regexmethod(soup):
    # soup can be a str or a bs4.BeautifulSoup instance; pass a str for the highest speed
    if not isinstance(soup, str):
        soup = str(soup)  # serialise the soup back to HTML
    return COMMENT_RE.sub('', soup)  # returns a string, not a soup
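A quick usage sketch: strip comments from the raw HTML string first, then parse the result only if you need a soup. The pattern used here is `<!--.*?-->` with `re.DOTALL` (an assumption, not necessarily what the benchmark used): requiring the full `<!--` prefix keeps a `<!DOCTYPE html>` declaration from being treated as the start of a comment, whereas a bare `<!.*?->` could swallow everything from the doctype to the first comment's end on the same line, and `re.DOTALL` lets a match cross newlines so multi-line comments are removed too.

```python
import re

COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)


def remove_comments_regexmethod(soup):
    # Accepts a str or anything whose str() is HTML (e.g. a BeautifulSoup object).
    if not isinstance(soup, str):
        soup = str(soup)
    return COMMENT_RE.sub("", soup)


raw_html = (
    "<!DOCTYPE html><html><body>"
    "<!-- a comment\n   spanning two lines -->"
    "<p>kept</p>"
    "</body></html>"
)
cleaned = remove_comments_regexmethod(raw_html)
print(cleaned)  # <!DOCTYPE html><html><body><p>kept</p></body></html>
```

You can then pass cleaned to BeautifulSoup(cleaned, "html.parser") as usual.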