Question

Hello, I have some HTML files that look like this:

<div class="text">
   <p></p>
   <p>text in p2</p>
   <p></p>
   <p>text in p4</p>
</div>

and others that look like this:

<div class="text">    
   <p>text in p1</p>
   <p></p>
   <p>text in p3</p>
   <p></p>
</div>

My query (in RapidMiner) is:

//h:div[contains(@class,'inside')]/h:div[contains(@class,'text')]/h:p/node()/text()

but it returns only the first <p>.

How can I join the text of all <p> elements into a single string?

Thank you


Solution

I will limit my expressions to the HTML snippets you provided, so I have cut off the first few axis steps.

First, the following query should not return any result: node() already selects the text nodes inside each paragraph, and text nodes have no children for the trailing text() step to match.

//h:div[contains(@class,'text')]/h:p/node()/text()

To access all text nodes, you should use something like

//h:div[contains(@class,'text')]/h:p/text()

How to join the strings depends heavily on the XPath version you're able to use. If RapidMiner provides XPath 2.0 (it probably does not), you're lucky and can use string-join(...), which joins a sequence of strings into a single one. In XPath 2.0 the separator argument is required, so pass an empty string to concatenate without a separator:

string-join(//h:div[contains(@class,'text')]/h:p/text(), '')

If you're stuck with XPath 1.0, you can only do this for a fixed number of strings, by enumerating all of them in concat(...). An empty paragraph simply contributes an empty string. I added the newlines for readability; remove them if you want to:

concat(
  //h:div[contains(@class,'text')]/h:p[1]/text(),
  //h:div[contains(@class,'text')]/h:p[2]/text(),
  //h:div[contains(@class,'text')]/h:p[3]/text(),
  //h:div[contains(@class,'text')]/h:p[4]/text()
)
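
If the result can be post-processed outside XPath, the fixed-length limitation of concat(...) goes away. The following is only a minimal sketch in Python using the standard library's xml.etree.ElementTree, not something RapidMiner itself provides. It assumes the snippet is well-formed XML without a namespace (so the h: prefix is dropped), and join_paragraphs is a hypothetical helper name chosen for illustration.

```python
import xml.etree.ElementTree as ET

# First sample from the question; assumed well-formed and namespace-free.
SNIPPET = """<div class="text">
   <p></p>
   <p>text in p2</p>
   <p></p>
   <p>text in p4</p>
</div>"""


def join_paragraphs(markup, separator=""):
    """Join the text of every <p> directly under a div whose class contains 'text'."""
    root = ET.fromstring(markup)
    # iter() also visits the root element itself, so a top-level div is matched.
    divs = [d for d in root.iter("div") if "text" in d.get("class", "")]
    # p.text is None for an empty <p>; skip those, as text() would select nothing there.
    parts = [p.text for d in divs for p in d.findall("p") if p.text]
    return separator.join(parts)


print(join_paragraphs(SNIPPET))
print(join_paragraphs(SNIPPET, " "))
```

With the sample above, the first call yields the two paragraph texts run together, and the second separates them with a single space. The number of paragraphs does not need to be known in advance.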
Licensed under: CC-BY-SA with attribution