Question

I'm using Rails 3 to scrape a website, and doing a query like so:

agent = Mechanize.new
doc = agent.get(url)

I'm then doing

doc.search("//div")

This returns a list of all divs on the page. I'd like to select the div that has the largest font size. Is there any way to use Mechanize, Nokogiri, or any other Ruby gem to find the computed font-size of a div, and from there, choose the one with the largest font size?

Thanks


Solution

You can't do this with Mechanize or Nokogiri, because they only parse the static HTML; they never apply stylesheets or run scripts. Font size is rarely set in the HTML itself anymore; it is generally defined in CSS or set programmatically using JavaScript.

The only reliable approach is to load the page in something that executes JavaScript and call JavaScript's getComputedStyle method, which returns the font size that has actually been applied to an element (via either CSS or JS). So you need a way to inject JS into your pages and get a result back. This may be possible using watir-webdriver, because Selenium has hooks to do this. See the very end of this page for instructions on how to inject JS and return a result back to the caller in Selenium. Another option is PhantomJS, which is a headless browser with a JS API.
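As a rough illustration of the Selenium route, here is a minimal sketch, not a tested recipe. It assumes the selenium-webdriver gem and a local Firefox with its driver are installed. The names FONT_SIZE_JS, largest_font_entry and largest_div are made up for this example. The injected script asks the browser for each div's computed font size, and Ruby then picks the largest.

```ruby
# JavaScript executed inside the page: returns [index, computed font-size]
# for every div, e.g. [[0, "16px"], [1, "32px"]].
FONT_SIZE_JS = <<~JS
  return Array.prototype.map.call(document.querySelectorAll('div'), function (el, i) {
    return [i, window.getComputedStyle(el).fontSize];
  });
JS

# Pure helper (hypothetical name): pick the [index, size] pair with the
# largest size. Computed font sizes come back as strings such as "16px",
# and String#to_f reads the leading number.
def largest_font_entry(entries)
  entries.max_by { |_index, size| size.to_f }
end

# Hypothetical wrapper: opens the page in a real browser, runs the script,
# and returns the text and font size of the div with the largest font.
def largest_div(url)
  require 'selenium-webdriver' # assumption: gem installed, browser driver on PATH
  driver = Selenium::WebDriver.for(:firefox)
  begin
    driver.get(url)
    entries = driver.execute_script(FONT_SIZE_JS)
    index, size = largest_font_entry(entries)
    return nil if index.nil? # page has no divs

    # find_elements returns divs in document order, matching querySelectorAll.
    div = driver.find_elements(css: 'div')[index]
    { text: div.text, font_size: size }
  ensure
    driver.quit
  end
end
```

Calling largest_div(url) with the same url as in the question would start a browser, so it is much slower than Mechanize. Note also that a wrapper div often inherits or sets a large font size without containing the text you care about, so you may want to restrict the selector to elements that directly contain text.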

Licensed under: CC-BY-SA with attribution