Selenium testing on website content based on the unformatted content as the expected value

Question 1

The following results are based on the HTML in the question, slightly modified to include a <br> tag in the first paragraph.

<html><body>
<p><strong>Para<br>graph-a.</strong></p>
<div>
<p>paragraph-b.</p><p>paragraph-c.</p>
</div>
</body></html>

The Python 2.7.6 code I'm using is as follows:

from selenium import webdriver
browser = webdriver.Firefox()
browser.get("file:///C:\\testing\\test.html")
element = browser.find_element_by_xpath("/html/body")
print element.text
browser.close()

The simple XPath /html/body selects the body element, and its text property returns the content without any of the tags.

Para
graph-a.
paragraph-b.
paragraph-c.

I can drill down to the contents of the first paragraph using /html/body/p/strong.

Para
graph-a.

Can you see the problem yet? The <strong> tag disappears from the output, but the <br> tag is translated into a newline. Let's add a few comparisons to the Python script, just before the browser is closed:

from selenium import webdriver
browser = webdriver.Firefox()
browser.get("file:///C:\\testing\\test.html")
element = browser.find_element_by_xpath("/html/body/p/strong")
# Keep the rendered text in a variable so it can be compared below.
text = element.text
print text
print text == "Paragraph-a."
print text == "Para<br>graph-a."
print text == "Para\ngraph-a."
browser.close()

This script outputs the following:

Para
graph-a.
False
False
True

The conclusion is that while we can ignore most HTML tags, we need to be careful when comparing against elements that include line breaks: the expected value needs a newline wherever the element contains a <br>.
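
If the expected value starts out as an HTML fragment, one way to apply that conclusion is to convert the fragment before comparing. The following is only a rough sketch: expected_text is a hypothetical helper of my own, not part of Selenium. It covers just the behaviour observed above (a <br> becomes a newline, other tags vanish) and makes no attempt to handle block-level elements, HTML entities, or whitespace collapsing. It uses print() with parentheses so that it runs under both Python 2.7 and Python 3.

```python
import re


def expected_text(html_fragment):
    """Approximate the text Selenium reports for a small HTML fragment.

    Hypothetical helper: it only mirrors the two behaviours seen above.
    """
    # Turn <br>, <br/> and <br /> into newlines, matching the observed output.
    text = re.sub(r"<br\s*/?>", "\n", html_fragment, flags=re.IGNORECASE)
    # Drop every remaining tag, keeping only the text between the tags.
    return re.sub(r"<[^>]+>", "", text)


# The value that element.text returned for /html/body/p/strong above.
rendered = "Para\ngraph-a."
print(expected_text("<strong>Para<br>graph-a.</strong>") == rendered)
```

In a real test, rendered would be element.text from the browser rather than a hard-coded string.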

Question 2

Please try the script below:

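// Collect every <p> once and loop over the list. A selector such as
// p:nth-of-type(i) counts siblings under the same parent, so it cannot
// index paragraphs that are nested inside different elements.
List<WebElement> paragraphs = driver.findElements(By.tagName("p"));

for (WebElement paragraph : paragraphs) {
    System.out.print(paragraph.getText() + "\t");
}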