Regex with Python: can't find the correct regular expression

https://stackoverflow.com/questions/15967525

03-04-2022
Question

I'm trying to extract some text from a poorly designed web page for a project. After a lot of research and learning Python I've come close to making it work, but I can't find the right regular expression for it.

Here is what I have so far. From the source code of http://coj.uci.cu/24h/status.xhtml?username=Diego1149&abb=1006 I want to get the whole row of the first instance of an accepted problem, so I came up with this:

exprespatFinderTitle = re.compile('<table id="submission" class="volume">.*(<tr class=.*>.*<label class="AC">.*Accepted.*</label>.*</tr>).*</table>')

But what this does is clip everything up to the last <tr> of the table. Can someone help me figure this out?

I'm using Python 2.7 with BeautifulSoup and urllib.

Solution

Stick to BeautifulSoup alone; regular expressions are not the right tool for HTML parsing:

table = soup.find('table', id='submission')
accepted = table.tbody.find('label', class_='AC')
if accepted:
    row = accepted.parent.parent  # row with accepted column
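
For something you can run end to end, here is a minimal sketch of the same idea. It assumes the `bs4` package is installed. `SAMPLE_HTML` is a made-up, simplified stand-in for the real page (the actual markup may differ), and `first_accepted_row` is a hypothetical helper name. It uses `find_parent('tr')` instead of `.parent.parent`, so it does not depend on how many elements sit between the label and its row.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the status page markup (assumption:
# the real page has a similar table with one <label> per submission row).
SAMPLE_HTML = """
<table id="submission" class="volume">
  <tbody>
    <tr class="even"><td>101</td><td><label class="WA">Wrong Answer</label></td></tr>
    <tr class="odd"><td>102</td><td><label class="AC">Accepted</label></td></tr>
    <tr class="even"><td>103</td><td><label class="AC">Accepted</label></td></tr>
  </tbody>
</table>
"""


def first_accepted_row(html):
    """Return the first <tr> holding a label with class "AC", or None."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id="submission")
    if table is None:
        return None
    # find() returns the first match in document order.
    accepted = table.find("label", class_="AC")
    if accepted is None:
        return None
    # Walk up to the nearest enclosing row, however deeply the label is nested.
    return accepted.find_parent("tr")


row = first_accepted_row(SAMPLE_HTML)
cells = [td.get_text(strip=True) for td in row.find_all("td")]
print(cells)  # ['102', 'Accepted']
```

Because `find` stops at the first match, you get the first accepted row rather than the last one, which is what the regular expression was failing to do.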

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow