Domanda

I'm making a small Rails application to parse an HTML playlist from a local public radio station and display the currently playing song.

I've created a class to model the songs on the playlist that looks like this:

require 'open-uri'

class Song

  attr_accessor :artist, :title, :album, :playtime

  def initialize(attributes = {})
    attributes.each do |name, value|
      send("#{name}=", value)
    end
  end

  def self.latest(how_many)
    html = Nokogiri::HTML(open(Rails.configuration.on_air_url))
    rows = html.css('#table_tracklist tbody tr')
    rows.take(how_many).map do |row|
      parse_song(row)
    end
  end

  private

  def self.parse_song(row)
    artist = row.css('.artist').text
    playtime = row.css('.time span').text
    title = row.css('.song').text
    album = row.css('.album').text
    Song.new({ artist: artist, playtime: playtime, title: title, album: album })
  end

end

I have a couple of questions about this:

  1. I'm not using any of the ActiveRecord or ActiveModel features. Does this still belong as a class in my models directory or should I refactor it to a class in lib? I was planning on having a controller whose only purpose will be to return the latest songs played via JSON to the client. Is there a better way?
  2. I'm happy with the Song::latest method, but I feel like there should be a more elegant way to do Song::parse_song. I'm thinking of changing my model's attributes to match the names of the CSS classes the playlist uses and using an array of attribute names I want to grab, but since there is the special case of the "time" field (where I want to grab the text of the span) it seems like it would be more clear this way. Could you offer some suggestions?

I think removing my initialize method and doing something like this would be better. Thoughts? [note: incorporates The Tin Man's answer below.]

  def self.parse_song(row)
    song = Song.new
    song.artist = row.at_css('.artist').text
    song.playtime = row.at_css('.time span').text
    song.title = row.at_css('.song').text
    song.album = row.at_css('.album').text
    song
  end
È stato utile?

Soluzione

You don't understand what css does:

artist = row.css('.artist').text
playtime = row.css('.time span').text
title = row.css('.song').text
album = row.css('.album').text

Which should be:

artist = row.at('.artist').text
playtime = row.at('.time span').text
title = row.at('.song').text
album = row.at('.album').text

css, like search and xpath returns a NodeSet. A NodeSet is like an array of Nodes. Even if you know there is only one matching element in the document, css will still return a set. If there are multiple hits for a particular selector, you're going to receive all nodes that match.

When you use text on a NodeSet you're going to get a concatenated string of all the text in the nodes, which is most likely not what you want:

require 'nokogiri'

doc = Nokogiri::HTML(<<EOT)
<html>
  <body>
    <p>foo</p>
    <p>bar</p>
  </body>
</html>
EOT

doc.css('p').text # => "foobar"

Also, Nokogiri is very forgiving/understanding when it comes to the code we use to talk to it. We don't have to use css, or xpath, we can use search and let Nokogiri figure out whether the selector is CSS or XPath:

require 'nokogiri'

doc = Nokogiri::HTML(<<EOT)
<html>
  <body>
    <p>foo</p>
    <p>bar</p>
  </body>
</html>
EOT

doc.css('p').size # => 2
doc.search('p').size # => 2

The same is true for at, at_css and at_xpath:

require 'nokogiri'

doc = Nokogiri::HTML(<<EOT)
<html>
  <body>
    <p>foo</p>
    <p>bar</p>
  </body>
</html>
EOT

doc.at_css('p').text # => "foo"
doc.at('p').text # => "foo"

I recommend being lazy and using search and at for 99.9% of the times you write code searching for nodes, then use the CSS/XPath variant for those oh-so-few times when you have to give Nokogiri a hint what the selector is.

Autorizzato sotto: CC-BY-SA insieme a attribuzione
Non affiliato a StackOverflow
scroll top