How to ignore children classes when extracting text from a class using Selenium in Python?

https://stackoverflow.com/questions/18097088

23-06-2022

Question

I am trying to extract the text of an element, selected by class, from an HTML page using Selenium in Python. However, my code also extracts the text of its child element.

Below is the code I am using:

monthlyprice = browser.find_element_by_class_name('tila-container').text

HTML Snippet:

<div class="tila-container tila-term header7a">
+ $8
<sup class="super-decimal-price">25</sup>
x 24/mo. If you cancel wireless service, remaining balance on phone becomes due. 0% APR O.A.C for well-qualified buyers. Qual’g service req’d.
</div>

The HTML pasted above is the part causing the problem. I want to extract only the text "+ $8" that sits directly under the tila-container class, but my code also returns the text of its child class super-decimal-price. I also don't want the text that follows this child, starting from "x 24/mo."

Folks, please help me resolve this.

Solution

It's difficult. As far as WebDriver is concerned, the text before and after the child <sup> is equally validly part of the div's text content, and it doesn't have methods that return only bits and pieces of that text content.

What I'd try is:

Use a method to get the full inner HTML of the div.
Use string manipulation to divide it up into three parts: before the <sup>, the <sup> itself, and after the <sup>.

The first part is reasonably straightforward; see "Get HTML Source of WebElement in Selenium WebDriver using Python" for how to get the HTML source of a single element.
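
A minimal sketch of that first step, assuming the element has already been located as in the question. get_attribute('innerHTML') is a regular WebElement method; the helper name inner_html is made up for illustration, and the browser lines are left as comments so the snippet does not need a running browser.

```python
def inner_html(element):
    """Return the markup inside a Selenium WebElement, child tags included."""
    # get_attribute('innerHTML') asks the browser for the element's inner markup,
    # so the child tags are still visible as tags rather than flattened text.
    return element.get_attribute('innerHTML')


# With a live browser (not run here):
# element = <the WebElement located as in the question>
# html = inner_html(element)
```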

The second part is not too difficult either; it should be easy enough with Python's string functions. It will get complicated, however, if the format of the inner text is more variable (i.e. not just text, then <sup>, then text each time).
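
Here is a rough sketch of the string-manipulation step. It assumes the inner HTML looks like the snippet in the question (text, one <sup> child, then more text); the function name split_around_sup and the shortened sample string are only for illustration, and a regular expression stands in for plain string functions.

```python
import re

# Inner HTML of the div from the question (trailing text shortened).
INNER_HTML = (
    '\n+ $8\n'
    '<sup class="super-decimal-price">25</sup>\n'
    'x 24/mo. If you cancel wireless service, remaining balance on phone becomes due.\n'
)


def split_around_sup(html):
    """Split inner HTML into (text before <sup>, text inside it, text after it)."""
    match = re.search(r'<sup\b[^>]*>(.*?)</sup>', html, flags=re.DOTALL)
    if match is None:
        # No <sup> child found: treat everything as the leading text.
        return html.strip(), '', ''
    before = html[:match.start()].strip()
    inside = match.group(1).strip()
    after = html[match.end():].strip()
    return before, inside, after


before, inside, after = split_around_sup(INNER_HTML)
print(before)  # + $8
print(inside)  # 25
```

Only the first <sup> is considered here, so markup with several children or nested tags would need something sturdier, such as an HTML parser.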

Good luck!

Licensed under: CC-BY-SA with attribution

Not affiliated with Stack Overflow