Getting relevant content from a URL with Mechanize

Question

This will return that image as a Nokogiri::XML::Element

def get_article_image_tag
  @page.at(".article-featured-image > img")
end
#=> #<Nokogiri::XML::Element:0x19ac280 name="img" attributes=[#<Nokogiri::XML::Attr:0x19ac238 name="width" value="786">, #<Nokogiri::XML::Attr:0x19ac22c name="height" value="305">, #<Nokogiri::XML::Attr:0x19ac220 name="src" value="http://cdn0.tnwcdn.com/wp-content/blogs.dir/1/files/2014/03/187265573-786x305.jpg">, #<Nokogiri::XML::Attr:0x19ac214 name="class" value="attachment-featured_post wp-post-image">, #<Nokogiri::XML::Attr:0x19ac208 name="alt" value="SWEDEN-FACEBOOK-DATA-CENTER-SERVERS">, #<Nokogiri::XML::Attr:0x19ac1fc name="title" value="Facebook launches an improved version of the News Feed redesign teased last year">]>

This will return the source url

def get_article_image_src
  @page.at(".article-featured-image > img").attributes["src"].value
end
#=> "http://cdn0.tnwcdn.com/wp-content/blogs.dir/1/files/2014/03/187265573-786x305.jpg"

To get the article text

def get_article_text
  @page.at("div.article").text
end

This returns the article text with no formatting: plain text plus non-visible characters such as \n and \t. It also seems to pick up any HTML/JavaScript source embedded inside the selected element.
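If the stray \n and \t characters get in the way, one option is to collapse whitespace runs after extraction. This is only a sketch: squish_text is a made-up helper name, the sample string is invented, and it does nothing about embedded script source.

```ruby
# Hypothetical helper (not part of Mechanize or Nokogiri):
# collapse every run of whitespace into a single space and trim the ends.
def squish_text(raw)
  raw.gsub(/[[:space:]]+/, " ").strip
end

# Invented sample standing in for the result of @page.at("div.article").text
sample = "\n\t Facebook launches \n\n an improved   News Feed\t\n"
cleaned = squish_text(sample)
puts cleaned
```

You would call it as squish_text(get_article_text) in the class above.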

Also, to make this dynamic, you could alter your call like this:

def perform(type)
  # String interpolation already calls to_s, so symbols and strings both work
  send("get_#{type}")
end

It can then be called with any of "content", "title", "article_image_tag", "article_image_src", or any other get_xxx method you define.
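If type ever comes from user input, it may be worth restricting which names can be dispatched. The sketch below is one way to do that, not a requirement: the ArticleScraper class name, the ALLOWED_TYPES list, and the stubbed get_title are all assumptions for illustration.

```ruby
# Hypothetical class name; in your code this is whatever class defines perform.
class ArticleScraper
  # Assumed whitelist of extractor names that callers may request.
  ALLOWED_TYPES = %w[title content article_image_tag article_image_src].freeze

  def perform(type)
    name = type.to_s
    raise ArgumentError, "unknown type: #{name}" unless ALLOWED_TYPES.include?(name)

    # public_send refuses to call private methods, unlike send.
    public_send("get_#{name}")
  end

  # Stub standing in for a real extractor that would query @page.
  def get_title
    "stub title"
  end
end

scraper = ArticleScraper.new
puts scraper.perform(:title)
```

Anything outside the list raises an ArgumentError instead of reaching send.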

Edit: to show your user all the images, this will work in a Rails view:

<% @page.images.each do |image| %>
  <%= image_tag(image.url) %>
<% end %>

This iterates over all the images and renders each one in an image tag on your page. It may need tinkering depending on whether the URLs are relative or absolute.
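If you do end up with relative URLs, Ruby's standard URI.join can resolve them against the page address. A minimal sketch, assuming you have the page URL at hand; absolute_image_url is a made-up helper name and the example.com addresses are placeholders.

```ruby
require "uri"

# Hypothetical helper: resolve an image src against the URL of the page it came from.
# Absolute srcs pass through unchanged.
def absolute_image_url(page_url, src)
  URI.join(page_url, src).to_s
end

page_url = "http://example.com/articles/2014/post/"    # placeholder page URL

root_relative = absolute_image_url(page_url, "/images/a.jpg")
doc_relative  = absolute_image_url(page_url, "thumbs/b.jpg")
already_full  = absolute_image_url(page_url, "http://cdn.example.com/c.jpg")

puts root_relative, doc_relative, already_full
```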

Honestly, unless you need Mechanize to set cookies or something similar, I would take a look at Nokogiri. I'm not 100% sure how to do this with Mechanize, but with Nokogiri you could determine the "relevance" of a picture by its overall size, like so:

require 'nokogiri'
require 'open-uri'

# URI.open is required on Ruby 3.0+, where Kernel#open no longer accepts URLs
doc = Nokogiri::HTML(URI.open("http://thenextweb.com/facebook/2014/03/06/facebook-launches-improved-version-major-news-feed-redesign-teased-last-year/#!yJ6uM"))
# image["height"] is nil when the attribute is missing; nil.to_i is 0, so such images rank last
largest_image = doc.search("img").max_by { |image| image["height"].to_i * image["width"].to_i }