I need to trim the empty elements that come before the first and after the last tag with text content. I want to control the content displayed to the client without breaking the visual layout.

<p> <br> </p>   ~> remove
<p> <br> </p>   ~> remove
<p> Text </p>
<p> <br> </p>   ~> preserve only this one of the empty tags
<p> Text </p>
<p> Text </p>
<p> <br> </p>   ~> remove
<p> <br> </p>   ~> remove
<p> <br> </p>   ~> remove

I'm using Sanitize, which can be passed a transformer. The documentation shows an example snippet that removes all empty elements.

To remove empty elements before any regular element, I thought I could assign a variable to control when it stops removing the empty tags:

should_remove_empty = true
lambda {|env|
  node = env[:node]
  return unless node.elem?

  unless node.children.any?{|c| c.text? && c.content.strip.length > 0 || !c.text? }
    node.unlink if should_remove_empty
  else
    should_remove_empty = false
  end
}

But now, to remove the trailing empty elements, I would have to iterate in reverse order, and Sanitize doesn't give me that ability.

Does anyone know how to do this, or has anyone already implemented it?


Solution

I'm using https://github.com/rgrove/sanitize

From the README:

Sanitize is a whitelist-based HTML sanitizer. Given a list of acceptable elements and attributes, Sanitize will remove all unacceptable HTML from a string.

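That won't work for you, because whether an empty element is acceptable depends on its position: an empty <p> between two text paragraphs has to stay, while the same element at the start or end has to go.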
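
The trimming itself does not depend on any HTML library: find the first and the last item that carries text and keep everything in between. A minimal sketch on plain strings, where trim_blank_edges is a hypothetical helper name used only for illustration:

```ruby
# Hypothetical helper: drop blank entries from both ends of a list,
# keeping blank entries that sit between non-blank ones.
def trim_blank_edges(items)
  first = items.index { |s| !s.strip.empty? }
  return [] if first.nil? # nothing but blank entries

  last = items.rindex { |s| !s.strip.empty? }
  items[first..last]
end

lines = ["", " ", "Text", "", "Text", "Text", " ", ""]
p trim_blank_edges(lines)
# prints ["Text", "", "Text", "Text"]
```

Using rindex for the trailing edge avoids having to walk the list in reverse by hand.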

require 'nokogiri'

doc = Nokogiri::HTML(<<END_OF_HTML) 
<body>
<p> <br> </p>
<p> <br> </p> 
<p> Text </p>
<p> <br> </p> 
<p> Text </p>
<p> Text </p>
<p> <br> </p>  
<p> <br> </p> 
<p> <br> </p>
</body>
END_OF_HTML

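ps = doc.xpath '/html/body/p'

# Indexes of the first and last paragraphs that contain visible text
# (nil until one is found).
first_text = nil
last_text = nil

ps.each_with_index do |para, i|
  # Node#text is nil-safe here, unlike at_xpath('child::text()'),
  # which returns nil for a <p> with no text child.
  next if para.text.strip.empty?  # whitespace-only paragraph

  first_text ||= i
  last_text = i
end

# Keep everything from the first to the last text paragraph, inclusive.
puts ps.slice(first_text..last_text) if first_text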

--output:--
<p> Text </p>
<p> <br></p>
<p> Text </p>
<p> Text </p>
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow