Question

Let's say I have a soup and I'd like to remove the style attribute from every paragraph. So I'd like to turn <p style='blah' id='bla' class=...> into <p id='bla' class=...> throughout the soup. But I don't want to touch, say, <img style='...'> tags. How would I do this?


Solution

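The idea is to iterate over all p tags with find_all('p') and delete the style attribute from each one that has it. Because only p tags are visited, the style attribute on other tags such as img is left untouched: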

from bs4 import BeautifulSoup


data = """
<body>
    <p style='blah' id='bla1'>paragraph1</p>
    <p style='blah' id='bla2'>paragraph2</p>
    <p style='blah' id='bla3'>paragraph3</p>
    <img style="awesome_image"/>
</body>"""


soup = BeautifulSoup(data, 'html.parser')
for p in soup.find_all('p'):
    if 'style' in p.attrs:
        del p.attrs['style']

print(soup.prettify())

prints:

<body>
 <p id="bla1">
  paragraph1
 </p>
 <p id="bla2">
  paragraph2
 </p>
 <p id="bla3">
  paragraph3
 </p>
 <img style="awesome_image"/>
</body>
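
A slightly shorter variant, offered as a sketch rather than as the only way to do it: passing style=True to find_all restricts the match to p tags that actually have a style attribute, so the explicit membership check is no longer needed. The sample markup and the names html and remaining are illustrative assumptions, not part of the original answer.

```python
from bs4 import BeautifulSoup

# Illustrative markup: one <p> with a style, one without, and a styled <img>.
html = """
<body>
    <p style='blah' id='bla1'>paragraph1</p>
    <p id='bla2'>paragraph2</p>
    <img style="awesome_image"/>
</body>"""

soup = BeautifulSoup(html, 'html.parser')

# style=True matches only <p> tags that carry a style attribute.
for p in soup.find_all('p', style=True):
    del p['style']

# Collect the names of tags that still have a style attribute.
remaining = [tag.name for tag in soup.find_all(style=True)]
print(remaining)  # ['img']
```

Here del p['style'] is equivalent to del p.attrs['style']; the img tag keeps its style because the loop never visits it.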
Licensed under: CC-BY-SA with attribution