The following results are based on the HTML in the question, slightly modified to include a <br> tag in the first paragraph:
<html><body>
<p><strong>Para<br>graph-a.</strong></p>
<div>
<p>paragraph-b.</p><p>paragraph-c.</p>
</div>
</body></html>
The Python 2.7.6 code I'm using is as follows:
from selenium import webdriver
browser = webdriver.Firefox()
# Forward slashes avoid "\t" being interpreted as a tab character.
browser.get("file:///C:/testing/test.html")
element = browser.find_element_by_xpath("/html/body")
print element.text
browser.close()
The simple XPath /html/body selects the whole body, and element.text returns its visible text without any of the tags:
Para
graph-a.
paragraph-b.
paragraph-c.
I can drill down to the contents of the first paragraph using /html/body/p/strong:
Para
graph-a.
Can you tell what I think the problem is yet? The tags disappear in the sense that <strong> is not part of the output, but the <br> tag is translated into a newline. Let's add a few lines of code to the Python script, just before the browser is closed:
from selenium import webdriver
browser = webdriver.Firefox()
# Forward slashes avoid "\t" being interpreted as a tab character.
browser.get("file:///C:/testing/test.html")
element = browser.find_element_by_xpath("/html/body/p/strong")
# Store the rendered text once so the comparisons below can use it.
text = element.text
print text
print text == "Paragraph-a."
print text == "Para<br>graph-a."
print text == "Para\ngraph-a."
browser.close()
This script outputs the following:
Para
graph-a.
False
False
True
The conclusion is that while most HTML tags can be ignored when reading element.text, a <br> tag shows up as a newline, so we need to be careful when comparing against the text of elements that include line breaks.
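
If the line breaks are not relevant to the comparison, one option is to normalize the text before comparing it. The following is only a minimal sketch: the helper names collapse_whitespace and remove_line_breaks are my own (they are not part of Selenium), and the string "Para\ngraph-a." stands in for the element.text value obtained above, so the snippet runs without a browser. The single-argument print() calls behave the same in Python 2.7 and Python 3.

```python
def collapse_whitespace(text):
    """Replace every run of whitespace (newlines included) with one space."""
    return " ".join(text.split())


def remove_line_breaks(text):
    """Drop carriage returns and newlines, joining the pieces together."""
    return text.replace("\r", "").replace("\n", "")


# Stand-in for element.text from the /html/body/p/strong lookup.
rendered = "Para\ngraph-a."

print(collapse_whitespace(rendered) == "Para graph-a.")
print(remove_line_breaks(rendered) == "Paragraph-a.")
```

Which helper fits depends on what the <br> means in the page: collapsing to a space treats it as a word separator, while removing it treats the two halves as one word, as in this contrived example.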