This XPath works for me:
using System;
using HtmlAgilityPack;

var html = @"<div class=""target"">
<p>Example Header</p>: This is the text I want!<br>
</div>";
var doc = new HtmlDocument();
doc.LoadHtml(html);
var result = doc.DocumentNode.SelectSingleNode("/div[@class='target']/text()[(normalize-space())]").OuterHtml;
Console.WriteLine(result);
/text()
selects all text nodes that are direct children of the <div>
[(normalize-space())]
excludes all text nodes that contain only whitespace (two newline-only text nodes are excluded from this HTML sample: one before <p> and the other after <br>)
Result:
: This is the text I want!
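To check the predicate without HtmlAgilityPack, here is a minimal sketch that runs the same XPath through .NET's built-in XPath 1.0 engine. It assumes the sample is rewritten as well-formed XML (<br/> instead of <br>). It also loads the document with XmlSpace.Preserve so the whitespace-only text nodes stay in the tree, which is the situation the normalize-space() filter is meant to handle:

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;

// Assumption: <br> is written as <br/> so System.Xml can parse the sample as XML.
const string xml = "<div class=\"target\">\n<p>Example Header</p>: This is the text I want!<br/>\n</div>";

// XmlSpace.Preserve keeps the whitespace-only text nodes (the two newlines) in the tree.
XPathDocument doc;
using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
{
    doc = new XPathDocument(reader, XmlSpace.Preserve);
}

// normalize-space() yields "" for a whitespace-only node, and "" converts to false,
// so the predicate keeps only the text node that has real content.
XPathNavigator? match = doc.CreateNavigator()
    .SelectSingleNode("/div[@class='target']/text()[(normalize-space())]");
string result = match?.Value ?? "";
Console.WriteLine(result);
```

The parentheses around normalize-space() are optional. [normalize-space()] is an equivalent predicate.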
UPDATE I:
Every element must have a parent, like the <div> in the example above. If it is the root node you're talking about, the same approach still works. The key is to use /text() in the XPath to get the text node:
var html = @"<p>Example Header</p>: This is the text I want!<br>";
var doc = new HtmlDocument();
doc.LoadHtml(html);
var result = doc.DocumentNode.SelectSingleNode("/text()[(normalize-space())]").OuterHtml;
Console.WriteLine(result);
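The sketch below shows the same idea with .NET's built-in System.Xml instead of HtmlAgilityPack. This substitution is an assumption of the sketch, which is why <br> is written as <br/>. The markup has no wrapping element, so it is loaded into an XmlDocumentFragment. The path is then evaluated relative to that fragment, which plays the role of the root here:

```csharp
using System;
using System.Xml;

// Assumption: System.Xml stands in for HtmlAgilityPack, so <br> is written as <br/>.
var doc = new XmlDocument();

// A document fragment may hold several top-level nodes, like the UPDATE I sample.
XmlDocumentFragment fragment = doc.CreateDocumentFragment();
fragment.InnerXml = "<p>Example Header</p>: This is the text I want!<br/>";

// Direct text children of the fragment that contain more than whitespace.
XmlNode? node = fragment.SelectSingleNode("text()[(normalize-space())]");
string result = node?.Value ?? "";
Console.WriteLine(result);
```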
UPDATE II:
OK, so you want to select the text node that comes after the <p> element and before the <br> element. You can use this XPath then:
var result =
doc.DocumentNode
.SelectSingleNode("/text()[following-sibling::br and preceding-sibling::p]")
.OuterHtml;
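The condition following-sibling::br and preceding-sibling::p only requires some <br> somewhere after the text node and some <p> somewhere before it. Neither element has to be adjacent. The sketch below contrasts that with a stricter variant that checks the nearest element on each side. It uses System.Xml in place of HtmlAgilityPack, and the extra "Other text" node is a hypothetical addition, not part of the question:

```csharp
using System;
using System.Xml;

var doc = new XmlDocument();
XmlDocumentFragment fragment = doc.CreateDocumentFragment();
// Hypothetical sample: a second text node also sits after <p> and before a later <br/>.
fragment.InnerXml = "<p>Example Header</p>: This is the text I want!<br/>Other text<span>x</span><br/>";

// Loose test from the answer: any <p> earlier and any <br> later among the siblings.
XmlNodeList loose = fragment.SelectNodes("text()[following-sibling::br and preceding-sibling::p]")!;

// Stricter variant: preceding-sibling is a reverse axis, so *[1] is the nearest element
// before the text node, and following-sibling::*[1] is the nearest element after it.
XmlNodeList strict = fragment.SelectNodes(
    "text()[preceding-sibling::*[1][self::p] and following-sibling::*[1][self::br]]")!;

foreach (XmlNode n in loose) Console.WriteLine($"loose:  {n.Value}");
foreach (XmlNode n in strict) Console.WriteLine($"strict: {n.Value}");
```

With the question's exact markup, both expressions select the same node. The difference only shows up when more text and elements follow.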