Why does my python regex code not work properly? [duplicate]

https://stackoverflow.com/questions/23551422

html
python
regex
html-parsing

18-07-2023

Question

I have the following HTML code:

<tr>
   <td>value:</td>
   <td>0</td>
</tr>

This snippet is part of a full HTML page. I want to extract the value in the second td tag.

This is my attempt:

pattern = re.compile('<td>value:</td>.*?<td>(.*?)</td>', re.S)
value = pattern.search(source_code).group(1)

source_code is the full webpage source code.

When I run this code, I get this message: AttributeError: 'NoneType' object has no attribute 'group'

Solution

Do not parse HTML with regex.
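
The AttributeError in the question arises because pattern.search returns None when nothing matches, and None has no group method. The sketch below is only an illustration: the pattern matches the sample snippet, but a small variation in the markup is enough to make it fail. The class attribute in the variant is an assumption made up for this example, not something taken from the real page, and extract_value is a hypothetical helper name.

```python
import re

pattern = re.compile(r'<td>value:</td>.*?<td>(.*?)</td>', re.S)

# The sample snippet from the question matches as expected.
sample = "<tr>\n   <td>value:</td>\n   <td>0</td>\n</tr>"
sample_match = pattern.search(sample)

# Hypothetical variation (assumed for illustration): an attribute on the
# first cell. The literal text '<td>value:</td>' no longer occurs.
variant = '<tr>\n   <td class="label">value:</td>\n   <td>0</td>\n</tr>'
variant_match = pattern.search(variant)  # None, so .group(1) would raise


def extract_value(html):
    """Return the captured text, or None when the pattern does not match."""
    match = pattern.search(html)
    return match.group(1) if match else None
```

Checking the match object before calling group avoids the exception, but it does not make the pattern any less sensitive to attributes, whitespace, or tag case in the real page.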

Instead, use a dedicated HTML parser such as BeautifulSoup:

>>> from bs4 import BeautifulSoup
>>> data = """<tr>
...    <td>value:</td>
...    <td>0</td>
... </tr>"""
>>> soup = BeautifulSoup(data, 'html.parser')
>>> soup.find('tr')('td')[1].text
'0'
>>> soup.find('td', string='value:').find_next_sibling('td').text
'0'
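
If installing a third-party package is not an option, the same idea can be sketched with the standard library's html.parser module. This is a swapped-in alternative, not what the answer above uses, and the names ValueCellParser and find_value are hypothetical. It returns the text of the first td that directly follows a td whose text equals the label, or None when no such cell exists.

```python
from html.parser import HTMLParser


class ValueCellParser(HTMLParser):
    """Find the text of the <td> that follows a <td> holding a given label."""

    def __init__(self, label):
        super().__init__()
        self.label = label
        self.in_td = False       # True while inside a <td> element
        self.cell_text = []      # text fragments of the current cell
        self.label_seen = False  # True if the previous cell held the label
        self.value = None        # stays None when nothing is found

    def handle_starttag(self, tag, attrs):
        if tag == 'td':
            self.in_td = True
            self.cell_text = []

    def handle_data(self, data):
        if self.in_td:
            self.cell_text.append(data)

    def handle_endtag(self, tag):
        if tag != 'td' or not self.in_td:
            return
        self.in_td = False
        text = ''.join(self.cell_text).strip()
        # Keep only the first cell that follows the label cell.
        if self.label_seen and self.value is None:
            self.value = text
        self.label_seen = (text == self.label)


def find_value(html, label='value:'):
    """Return the cell text after the label cell, or None if absent."""
    parser = ValueCellParser(label)
    parser.feed(html)
    parser.close()
    return parser.value
```

Because the parser works on tags rather than on literal text, attributes on the cells or different whitespace between them do not affect the result, and a missing cell yields None instead of an exception.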

License: CC-BY-SA with attribution

Not affiliated with StackOverflow