Question

I have a string of HTML from which I want to strip all the HTML tags. The problem is that the plain text of the nodes gets squished together, and I need some whitespace between the text of each node.

Nokogiri::HTML("<p>Hello</p><p>There</p>").text
Gives  => HelloThere
I want => Hello There

Can I tell Nokogiri to behave like this somehow?


Solution

You can select every text node with XPath and join them with a space:

doc = Nokogiri::HTML("<p>Hello</p><p>There</p>")
doc.xpath('//text()').to_a.join(" ")

OTHER TIPS

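# Selects only leaf elements (those with no child elements) and joins their text.
# Text that sits next to a child element, as in <p>Hi <b>you</b></p>, is skipped.
Nokogiri::HTML("<p>Hello</p><p>There</p>").xpath("//*[not(child::*)]").map(&:text).join(' ')
# => "Hello There"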

EDIT: I tried to do it on my own but ended up using a solution that looks a lot like Uri Agassi's :)

irb(main):040:0> Nokogiri::HTML("<p>Hello</p><p>There</p>").xpath("//text()").map(&:text).join(" ")
=> "Hello There"
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow