How do I use Hpricot to search the inner_text of all elements?

https://stackoverflow.com/questions/18858592

28-06-2022

Question

I would like to use Hpricot to scan the inner_text of all elements, and know what element is currently being scanned. However, each approach I have taken leads to a recursion. Is there a built-in function to do this with Hpricot (or Nokogiri)? The code below just scans one level down:

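require 'hpricot'

@t = []
doc = Hpricot(open("some html doc"))
# Only the direct children of <html> are inspected, so text nested deeper is missed.
(doc/"html").each do |e|
  e.children.each do |child|
    if child.is_a?(Hpricot::Text)
      @t << child.to_s.strip
    end
  end
end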

Solution

Although I'm not sure exactly why you want to collect all text nodes (perhaps there is a more efficient solution), this should get you started:

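require 'nokogiri'
doc = Nokogiri::HTML(open('doc'))

# traverse yields every node under <body>, text nodes included.
doc.at_css("body").traverse do |node|
  puts "***#{node.name}"
  # For an element, node.text is the concatenated text of all its descendants.
  puts node.text
end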

It uses Nokogiri's traverse method, which recursively visits every node under your starting node (and the starting node itself), yielding children before their parent.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow