Method to parse HTML document in Ruby?

https://stackoverflow.com/questions/2554909

23-09-2019
|

Question

like DOMDocument class in PHP, is there any class in RUBY (i.e the core RUBY), to parse and get node elements value from a HTML Document.

Solution

There is no built-in HTML parser (yet), but some very good ones are available, in particular Nokogiri.

Meta-answer: For common needs like these, I'd recommend checking out the Ruby Toolbox site. You'll notice that Nokogiri is the top recommendation for HTML parsers

OTHER TIPS

You should check out hpricot. It's exceedingly good. It's not 'core' ruby, but it's a commonly used gem.

You can also try Oga by Yorick Peterse.

It is an XML/HTML parser written in Ruby that does not require system libraries such as libxml. You can find it here. https://github.com/YorickPeterse/oga

Ruby Cheerio - A jQuery style HTML parser in ruby. A most simplified version of Nokogiri for crawlers. This is the ruby version of most popular NodeJS package cheerio.

Follow the link for a simple crawler example.

gem install ruby-cheerio

require 'ruby-cheerio'

jQuery = RubyCheerio.new("<html><body><h1 class='one'>h1_1</h1><h1>h1_2</h1></body></html>")

jQuery.find('h1').each do |head_one|
    p head_one.text
end

# getting attribute values like jQuery.
p jQuery.find('h1.one')[0].prop('h1','class')

# function chaining similar to jQuery.
p jQuery.find('body').find('h1').first.text

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow