Question

Let's say I have a soup and I'd like to remove the style attribute from every paragraph. So I'd like to turn <p style='blah' id='bla' class=...> into <p id='bla' class=...> throughout the entire soup. But I don't want to touch, say, <img style='...'> tags. How would I do this?


Solution

The idea is to iterate over all p tags using find_all('p') and remove the style attribute:

from bs4 import BeautifulSoup


data = """
<body>
    <p style='blah' id='bla1'>paragraph1</p>
    <p style='blah' id='bla2'>paragraph2</p>
    <p style='blah' id='bla3'>paragraph3</p>
    <img style="awesome_image"/>
</body>"""


soup = BeautifulSoup(data, 'html.parser')
# Only <p> tags are visited, so the <img> keeps its style attribute
for p in soup.find_all('p'):
    if 'style' in p.attrs:
        del p.attrs['style']

print(soup.prettify())

prints:

<body>
 <p id="bla1">
  paragraph1
 </p>
 <p id="bla2">
  paragraph2
 </p>
 <p id="bla3">
  paragraph3
 </p>
 <img style="awesome_image"/>
</body>
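A slightly more compact variant, offered as a sketch: find_all also accepts an attrs filter in which the value True matches any tag that has the attribute at all, so the loop only visits paragraphs that actually carry a style and no membership check is needed. The helper name strip_attribute is hypothetical, not part of bs4.

```python
from bs4 import BeautifulSoup


def strip_attribute(soup, tag_name, attr):
    """Hypothetical helper: remove `attr` from every `tag_name` tag in place.

    attrs={attr: True} makes find_all return only tags that have the
    attribute, so every tag in the loop is known to carry it.
    """
    for tag in soup.find_all(tag_name, attrs={attr: True}):
        del tag[attr]  # Tag supports item deletion for attributes
    return soup


data = """
<body>
    <p style='blah' id='bla1'>paragraph1</p>
    <p id='bla2'>paragraph2</p>
    <img style="awesome_image"/>
</body>"""

soup = BeautifulSoup(data, 'html.parser')
strip_attribute(soup, 'p', 'style')
print(soup.prettify())
```

The same helper can be pointed at other tag/attribute pairs, for example strip_attribute(soup, 'div', 'class'); since find_all also accepts a list of tag names, passing ['p', 'div'] as tag_name should strip the attribute from both kinds of tag in one pass.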
License: CC-BY-SA with attribution