Question

Sorry about the confusing title. I am trying to solve what should be a simple regex problem, but I cannot work out the solution.

I have an HTML snippet from a larger HTML document.

  • <td class="grade">100.0</td>

  • <td class="teacher">Mathias, Jordan</td>

Another regex separates the two and gives them those class names: a positive look-ahead checks for a . or a , (period or comma), and the cell is assigned the class grade or teacher, respectively.
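The question does not show that earlier regex, so the following is only a hypothetical sketch of the look-ahead classification step. It assumes the cells start out as bare `<td>` tags; `classify_cells` is an illustrative name, not the asker's code.

```python
import re

def classify_cells(html: str) -> str:
    """Hypothetical sketch: tag bare <td> cells by what their text contains."""
    # A cell whose text (up to the next '<') contains a period becomes a grade.
    html = re.sub(r'<td>(?=[^<]*\.)', '<td class="grade">', html)
    # A remaining bare cell whose text contains a comma becomes a teacher.
    html = re.sub(r'<td>(?=[^<]*,)', '<td class="teacher">', html)
    return html

print(classify_cells('<td>100.0</td><td>Mathias, Jordan</td>'))
# <td class="grade">100.0</td><td class="teacher">Mathias, Jordan</td>
```

Because the look-ahead is zero-width, only the opening tag is replaced and the cell text is untouched. An empty cell has nothing to look ahead at, so it stays a bare `<td></td>`; a name such as "Mathias, J." would be classed as a grade because the period rule runs first.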


The problem comes later, when I want to check whether the content between these tags is blank.

  • e.g.: <td class="grade"></td>

I would like to use a positive look-behind to check whether the class is either grade or teacher (grade|teacher). I would also like to check that there is truly nothing between the > and the < where the opening and closing tags meet.

So far, this is what I have: (?<=.*(teacher|grade)*.+>?)[^.](?=</td>)

NOTE: This is in Python
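One reason the attempt above cannot work in Python: the standard `re` module only accepts fixed-width look-behind, and `.*`, `*` and `.+` make the width variable, so compiling the pattern raises `re.error`. The sketch below confirms that and shows a look-behind variant that does compile. Even `(?<=grade|teacher)` would be rejected because its two branches differ in length, so it uses two separate fixed-width look-behinds. `EMPTY_GAP` and `is_blank_cell` are illustrative names, and the pattern assumes the exact `<td class="...">` spelling from the snippet.

```python
import re

# The pattern from the question.
ATTEMPT = r'(?<=.*(teacher|grade)*.+>?)[^.](?=</td>)'

# Python's re rejects variable-width look-behind when compiling.
try:
    re.compile(ATTEMPT)
    attempt_compiles = True
except re.error:
    attempt_compiles = False

# Zero-width match at the point where an opening grade/teacher tag is
# immediately followed by </td>, i.e. nothing at all between > and <.
EMPTY_GAP = re.compile(
    r'(?:(?<=<td class="grade">)|(?<=<td class="teacher">))(?=</td>)'
)

def is_blank_cell(cell: str) -> bool:
    """Return True if the grade/teacher cell has no content."""
    return EMPTY_GAP.search(cell) is not None

print(attempt_compiles)                                    # False
print(is_blank_cell('<td class="grade"></td>'))            # True
print(is_blank_cell('<td class="teacher">Mathias, Jordan</td>'))  # False
```

A plainer regex alternative is to match the whole empty cell, `<td class="(?:grade|teacher)"></td>`, which avoids look-behind entirely.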


Solution

Instead of pre-processing your HTML with regular expressions, let BeautifulSoup parse it and use regular expressions only to search the text:

soup.find_all('td', text=re.compile(','))

finds all <td> elements whose direct text (the tag's own string) contains a comma.
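A small sketch of this approach applied to the question's cells, assuming `beautifulsoup4` is installed and using the built-in `"html.parser"`. It uses `string=`, which is the current name of the `text=` argument (renamed in Beautiful Soup 4.4.0). The sample table is hypothetical, built from the question's snippet plus one empty grade cell; empty cells are detected by checking their text rather than with a look-behind.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical table modeled on the question's snippet, plus an empty cell.
html = (
    '<table><tr>'
    '<td class="grade">100.0</td>'
    '<td class="teacher">Mathias, Jordan</td>'
    '<td class="grade"></td>'
    '</tr></table>'
)
soup = BeautifulSoup(html, "html.parser")

# The answer's search: <td> tags whose own string contains a comma.
teacher_cells = soup.find_all("td", string=re.compile(","))

# Empty grade/teacher cells: a list for class_ matches either class name,
# and get_text(strip=True) is '' when there is nothing between the tags.
empty_cells = [
    td for td in soup.find_all("td", class_=["grade", "teacher"])
    if not td.get_text(strip=True)
]

print([td.get_text() for td in teacher_cells])  # ['Mathias, Jordan']
print([str(td) for td in empty_cells])          # ['<td class="grade"></td>']
```

Using `strip=True` also treats whitespace-only cells as empty; drop it if such cells should count as filled.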

Licensed under: CC-BY-SA with attribution