Find multiple words with regex in Beautifulsoup4

https://stackoverflow.com/questions/23305064

09-07-2023

Question

I'm trying to extract these divs with BeautifulSoup 4, using a regular expression in the find_all() method:

<div class="prod roundedBox">
<div class="prod roundedBox last">

I've tried several patterns, but I can't find one that extracts these tags using only the two class names prod and roundedBox. I need to match on both words: if I use only one of them, I also get other, unwanted tags.

re.compile("prod.roundedBox")
re.compile("prod\sroundedBox.*")

are not working.

Any ideas?

Solution

You could simply use BeautifulSoup to find your results.

import bs4

html = '''
<div class="example">example</div>
<div class="prod roundedBox">foo</div>
<div class="prod roundedBox last">bar</div>
'''

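soup = bs4.BeautifulSoup(html, 'html.parser')
# calling the soup is shorthand for soup.find_all(); a list of classes
# matches tags that have ANY of them, not necessarily both
soup(attrs={'class': ['prod', 'roundedBox']})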
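
To see the difference between "any of these classes" and "both classes" in practice, here is a small sketch. It reuses the answer's markup plus one extra, made-up `<div class="prod">` so the two behaviors diverge, and `has_prod_and_rounded_box` is a hypothetical helper name. A plain function passed to find_all() is one way to require that both classes are present:

```python
import bs4

# The answer's markup plus a hypothetical div that has only the "prod" class
html = '''
<div class="example">example</div>
<div class="prod">only prod</div>
<div class="prod roundedBox">foo</div>
<div class="prod roundedBox last">bar</div>
'''

soup = bs4.BeautifulSoup(html, 'html.parser')

def has_prod_and_rounded_box(tag):
    # bs4 stores "class" as a list of separate class names
    classes = set(tag.get('class', []))
    return tag.name == 'div' and {'prod', 'roundedBox'} <= classes

# List filter: one matching class is enough, so the "prod"-only div is included
any_of = [d.get_text() for d in soup.find_all('div', attrs={'class': ['prod', 'roundedBox']})]

# Function filter: the tag must carry both classes (order and extra classes don't matter)
both = [d.get_text() for d in soup.find_all(has_prod_and_rounded_box)]

print(any_of)  # ['only prod', 'foo', 'bar']
print(both)    # ['foo', 'bar']
```

The set comparison ignores class order and tolerates extra classes such as `last`, which is usually what "has both classes" means.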

If you want to use a regular expression, here is an example:

import re
import bs4

# html is the string from the previous example
soup = bs4.BeautifulSoup(html, 'html.parser')
soup(attrs={'class' : re.compile(r'^prod')})

Output

[<div class="prod roundedBox">foo</div>, <div class="prod roundedBox last">bar</div>]
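
The `^prod` pattern only checks that a class value starts with `prod`; it does not require `roundedBox` at all. If you really want a single regex that demands both words, in either order and as whole class names, lookahead assertions are one way to write it. This is a sketch on plain class strings (`BOTH_CLASSES` and `has_both` are illustrative names, not part of BeautifulSoup); to use it with a tag you would first join its class list back into one string, e.g. `' '.join(tag.get('class', []))`:

```python
import re

# Each lookahead independently requires one whole, whitespace-delimited word,
# so the two class names may appear in either order, with extra classes around them.
BOTH_CLASSES = re.compile(
    r'(?=.*(?:^|\s)prod(?:\s|$))(?=.*(?:^|\s)roundedBox(?:\s|$))'
)

def has_both(class_string):
    # match() anchors at position 0; the lookaheads scan the rest of the string
    return BOTH_CLASSES.match(class_string) is not None

for value in ['prod roundedBox', 'prod roundedBox last',
              'roundedBox prod', 'prod', 'products roundedBox']:
    print(repr(value), has_both(value))
```

This prints True for the first three strings and False for `'prod'` and `'products roundedBox'`; the `(?:^|\s)` and `(?:\s|$)` guards are what stop `products` from counting as `prod`.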

Other tips

There's no need for a regex. This is what CSS selectors are for.

soup.select('div.prod.roundedBox')

You can chain whichever classes you need; the selector above matches any div that has both the prod and roundedBox classes, regardless of any extra classes. For example:

soup.select('div.prod.roundedBox')
# [<div class="prod roundedBox"></div>, <div class="prod roundedBox last"></div>]

soup.select('div.prod.roundedBox.last')
# [<div class="prod roundedBox last"></div>]
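
For comparison, passing both class names as one string to find_all() behaves differently: BeautifulSoup then compares against the class attribute's value as written, so a div with an extra class or with the classes in another order is not found. A small sketch, with sample markup assumed here and modeled on the question:

```python
import bs4

html = '''
<div class="prod roundedBox">foo</div>
<div class="prod roundedBox last">bar</div>
<div class="roundedBox prod">baz</div>
'''

soup = bs4.BeautifulSoup(html, 'html.parser')

# CSS selector: both classes required; order and extra classes don't matter
by_selector = [d.get_text() for d in soup.select('div.prod.roundedBox')]

# Single string: compared with the whole class value exactly as written
by_exact_string = [d.get_text() for d in soup.find_all('div', class_='prod roundedBox')]

print(by_selector)      # ['foo', 'bar', 'baz']
print(by_exact_string)  # ['foo']
```

That exact-string sensitivity is a good reason to prefer the selector when the requirement is simply "has both classes".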

License: CC-BY-SA with attribution

Not affiliated with StackOverflow