Question

I have a string of HTML from which I want to strip all the HTML tags. The problem is that the plain text of adjacent nodes gets squished together, so I need to add some whitespace between the nodes.

Nokogiri::HTML("<p>Hello</p><p>There</p>").text
Gives  => HelloThere
I want => Hello There

Can I tell Nokogiri to behave like this somehow?


Solution

You can select every text node and join them with a space:

doc = Nokogiri::HTML("<p>Hello</p><p>There</p>")
doc.xpath('//text()').to_a.join(" ")

Other tips

Nokogiri::HTML("<p>Hello</p><p>There</p>").xpath("//*[not(child::*)]").map(&:text).join(' ')
# => "Hello There"

EDIT: I tried to do it on my own but ended up using a solution that looks a bit like Uri Agassi's :)

irb(main):040:0> Nokogiri::HTML("<p>Hello</p><p>There</p>").xpath("//text()").map(&:text).join(" ")
=> "Hello There"
Licensed under: CC-BY-SA with attribution