Question

I have a Rake task set up, and it works almost the way I want it to.

I'm scraping information from a site and want to get all of the player ratings into an array, ordered by how they appear in the HTML. I have player_ratings and want to do exactly what I did with the player_names variable.

I only want the fourth <td> within a <tr> in the specified part of the doc because that corresponds to the ratings. If I use Nokogiri's text, I only get the first player rating when I really want an array of all of them.

task :update => :environment do
  require "nokogiri"
  require "open-uri"

  team_ids = [7689, 7679, 7676, 7680]
  player_names = []

  for team_id in team_ids do
    url = URI.encode("http://modules.ussquash.com/ssm/pages/leagues/Team_Information.asp?id=#{team_id}")
    doc = Nokogiri::HTML(open(url))
    player_names = doc.css('.table.table-bordered.table-striped.table-condensed')[1].css('tr td a').map(&:content)
    player_ratings = doc.css('.table.table-bordered.table-striped.table-condensed')[1].css('tr td')[3]
    puts player_ratings
    player_names.map{|player| puts player}
  end
end

Any advice on how to do this?


Solution

I think switching to an XPath query might help. Here is the XPath:

nodes = doc.xpath "//table[@class='table table-bordered table-striped table-condensed'][2]//tr/td[4]"

data = nodes.map {|node| node.text }

Iterating the nodes with node.text gives me

4.682200 
5.439000 
5.568400 
5.133700 
4.480800 
4.368700 
2.768100 
3.814300 
5.103400 
4.567000 
5.103900 
3.804400 
3.737100 
4.742400 

OTHER TIPS

I'd recommend using Wombat (https://github.com/felipecsl/wombat), where you can specify that you want a list of the elements matched by your CSS selector, and it will do all the hard work for you.

It's not well known, but Nokogiri implements some of jQuery's extensions for searching with CSS selectors. In your case, the :eq(n) selector will be useful:

require 'nokogiri'

doc = Nokogiri::XML(<<EOT)
<html>
<body>
  <table>
    <tr>
      <td>1</td>
      <td>2</td>
      <td>3</td>
      <td>4</td>
    </tr>
  </table>
</body>
</html>
EOT

doc.at('td:eq(4)').text # => "4"
Licensed under: CC-BY-SA with attribution