I will limit my expressions to the HTML snippets you provided, so I cut off the first few axis steps.
First, this query should not return any results, because the paragraph elements have no child nodes other than text nodes, and text nodes have no children of their own:
//h:div[contains(@class,'text')]/h:p/node()/text()
To access all text nodes that are direct children of the paragraphs, you should use something like:
//h:div[contains(@class,'text')]/h:p/text()
Joining strings depends heavily on the XPath version you're able to use. If RapidMiner provides XPath 2.0 (it probably does not), you're in luck and can use string-join(...), which joins all strings into a single one. In XPath 2.0 the separator argument is mandatory, so pass an empty string if you don't want one:
string-join(//h:div[contains(@class,'text')]/h:p/text(), '')
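If you're stuck with XPath 1.0, you can only do this for a fixed number of strings, by enumerating all of them. Keep in mind that concat() converts each argument to the string value of its first node, so only the first text node of each paragraph is used. I added the newlines for readability; remove them if you want to: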
concat(
//h:div[contains(@class,'text')]/h:p[1]/text(),
//h:div[contains(@class,'text')]/h:p[2]/text(),
//h:div[contains(@class,'text')]/h:p[3]/text(),
//h:div[contains(@class,'text')]/h:p[4]/text()
)
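If the number of paragraphs is not fixed and only XPath 1.0 is available, another option is to select the paragraphs and join their text outside of XPath. The following is only a rough sketch of that idea in Python, assuming you can post-process the document in a scripting language at all; the sample markup, the `h` prefix binding, and the helper names `direct_text` and `join_paragraph_text` are made up for illustration and are not part of RapidMiner.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the real page structure is an assumption.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <div class="main text">
      <p>First. </p>
      <p>Second. </p>
      <p>Third.</p>
    </div>
    <div class="sidebar"><p>Ignored.</p></div>
  </body>
</html>"""

# Bind the prefix "h" to the XHTML namespace, as in the expressions above.
NS = {"h": "http://www.w3.org/1999/xhtml"}


def direct_text(elem):
    """Concatenate the text nodes that are direct children of elem."""
    parts = [elem.text or ""]
    # ElementTree stores text that follows a child element as that child's tail.
    parts.extend(child.tail or "" for child in elem)
    return "".join(parts)


def join_paragraph_text(xml_source, separator=""):
    """Join the direct text of every h:p below a div whose class contains 'text'."""
    root = ET.fromstring(xml_source)
    texts = []
    for div in root.iterfind(".//h:div", NS):
        # ElementTree's XPath subset has no contains(), so filter in Python.
        if "text" in div.get("class", ""):
            texts.extend(direct_text(p) for p in div.iterfind("h:p", NS))
    return separator.join(texts)


result = join_paragraph_text(SAMPLE)
print(result)
```

Unlike the concat(...) expression, this handles any number of paragraphs, at the cost of moving the joining step out of the XPath expression itself.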