Regex - Combining an 'or' with a 'look-behind'

https://stackoverflow.com/questions/21506285

05-10-2022

Question

Sorry about the confusing title. I am trying to solve a simple regex problem, but cannot work out the solution.

I have an HTML snippet from a larger HTML document.

<td class="grade">100.0</td>
<td class="teacher">Mathias, Jordan</td>

Another regex separates the two and gives them those class names. It uses a positive look-ahead to check for a . or a , (period or comma) and assigns the class grade or teacher, respectively.

The problem comes later, when I want to check whether the content between these tags is blank.

For example: <td class="grade"></td>

I would like to use a positive look-behind to check that the class is either grade or teacher (grade|teacher). In addition, I would like to check that there is truly nothing between the > and the < (that is, the closing tag immediately follows the opening tag).

So far, this is what I have: (?<=.*(teacher|grade)*.+>?)[^.](?=</td>)

NOTE: This is in Python

Solution

Instead of pre-processing your HTML with regular expressions, let BeautifulSoup parse it and pass a regular expression to its search methods:

soup.find_all('td', text=re.compile(','))

This finds all <td> elements whose direct text contains a comma.
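The same approach can cover the empty-cell check from the question. What follows is only a minimal sketch, assuming the beautifulsoup4 package is installed and that the cells already carry the grade and teacher classes. The sample markup, the extra empty cells and the variable names are made up for illustration. It uses string=, the newer name for the text= argument, and treats whitespace-only cells as blank, which may or may not be what you want.

```python
import re
from bs4 import BeautifulSoup

# Sample markup modelled on the question; the empty cells are
# hypothetical and only there to exercise the check.
html = """
<table><tr>
<td class="grade">100.0</td>
<td class="teacher">Mathias, Jordan</td>
<td class="grade"></td>
<td class="teacher"> </td>
<td class="other"></td>
</tr></table>
"""

soup = BeautifulSoup(html, "html.parser")

# Cells whose direct text contains a comma.
with_comma = soup.find_all("td", string=re.compile(","))

# Cells with class grade or teacher whose text is empty
# (or whitespace-only, because of strip=True).
empty_cells = [
    td
    for td in soup.find_all("td", class_=["grade", "teacher"])
    if not td.get_text(strip=True)
]

print([td.get_text() for td in with_comma])
print([td["class"] for td in empty_cells])
```

Passing a list to class_ matches a cell carrying any of the listed classes, so it plays the role of the (grade|teacher) alternation without a look-behind.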

Licensed under: CC-BY-SA with attribution
