Question

I'm trying to extract these divs with beautifulsoup4, using a regular expression in the find_all() method:

<div class="prod roundedBox">
<div class="prod roundedBox last">

I've tried different patterns, but I can't find one that extracts the tags using only the words prod roundedBox. I want to match on both words; if I use just one of them, I pick up other unwanted tags.

re.compile("prod.roundedBox")
re.compile("prod\sroundedBox.*")

Neither of these works.

Any ideas?


Solution

You can do this with BeautifulSoup alone, without a regular expression:

import bs4

html = '''
<div class="example">example</div>
<div class="prod roundedBox">foo</div>
<div class="prod roundedBox last">bar</div>
'''

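soup = bs4.BeautifulSoup(html, 'html.parser')  # name the parser explicitly to avoid a warning
# Note: a list matches tags carrying ANY of the listed classes, not necessarily both.
soup(attrs={'class': ['prod', 'roundedBox']})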

If you want to use a regular expression, here is an example:

import re
import bs4

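soup = bs4.BeautifulSoup(html, 'html.parser')
# Matches any tag with a class token that starts with "prod" (this would also catch e.g. "product").
soup(attrs={'class': re.compile(r'^prod')})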

Output

[<div class="prod roundedBox">foo</div>, <div class="prod roundedBox last">bar</div>]
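If both class names must be present on the same tag, one option is to pass a function to find_all(). The following is only a sketch: the helper name has_both_classes and the extra "only prod" div are assumptions added for illustration, not part of the original answer.

```python
from bs4 import BeautifulSoup

# Sample markup; the "only prod" div is an added (hypothetical) test case.
html = '''
<div class="example">example</div>
<div class="prod">only prod</div>
<div class="prod roundedBox">foo</div>
<div class="prod roundedBox last">bar</div>
'''

soup = BeautifulSoup(html, 'html.parser')

def has_both_classes(tag):
    """Return True for <div> tags whose class list contains both names."""
    classes = tag.get('class') or []  # 'class' is parsed into a list of tokens
    return tag.name == 'div' and {'prod', 'roundedBox'} <= set(classes)

matches = soup.find_all(has_both_classes)
texts = [tag.get_text() for tag in matches]
print(texts)  # ['foo', 'bar']
```

The div with only the prod class is skipped, because the subset check requires both names regardless of their order or of any extra classes such as last.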

Other tips

No need for regex. This is what CSS selectors are for.

soup.select('div.prod.roundedBox')

You can chain as many classes as you like; the selector above grabs any div that has both the prod and roundedBox classes. See:

soup.select('div.prod.roundedBox')
Out[38]: [<div class="prod roundedBox"></div>, <div class="prod roundedBox last"></div>]

soup.select('div.prod.roundedBox.last')
Out[39]: [<div class="prod roundedBox last"></div>]
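For a version that can be run from scratch, here is a minimal sketch of the selector approach. The sample markup and variable names are assumptions for illustration, and the :not(.last) selector is an extra not mentioned above; it relies on the soupsieve package that recent bs4 releases use for select().

```python
from bs4 import BeautifulSoup

# Sample markup (assumed for illustration).
html = '''
<div class="example">example</div>
<div class="prod roundedBox">foo</div>
<div class="prod roundedBox last">bar</div>
'''

soup = BeautifulSoup(html, 'html.parser')

# Divs that carry both classes, in any order, with or without extra classes.
both = [d.get_text() for d in soup.select('div.prod.roundedBox')]

# Divs that additionally carry the "last" class.
last_only = [d.get_text() for d in soup.select('div.prod.roundedBox.last')]

# Divs with both classes but without "last".
not_last = [d.get_text() for d in soup.select('div.prod.roundedBox:not(.last)')]

print(both)       # ['foo', 'bar']
print(last_only)  # ['bar']
print(not_last)   # ['foo']
```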
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow