Question

I'm parsing an RSS feed that has a <link> element containing a URL, like so: <link>http://www.google.com/</link>. However, when I try to get the URL using node.css('link').text, it returns an empty string. Is there another attribute I should be accessing?

I'm using nokogiri/ruby.

Example:

require 'nokogiri'
require 'open-uri'

doc = Nokogiri::HTML(URI.open('http://www.kffl.com/printRSS.php/NFL-ARI'))
doc.css('item').each do |item|
  puts item.css('link').text
  puts item.css('link').first.text
end

Solution

You are parsing as HTML, but the source is XML. In HTML, link is a void (empty) element, so Nokogiri parses it as something like <link></link>http://example.com ..., where the URL is a text node outside the link element. When you then query the parsed document, the link elements are empty.

To fix it you should parse as XML:

require 'nokogiri'
require 'open-uri'
doc = Nokogiri::XML(URI.open('http://www.kffl.com/printRSS.php/NFL-ARI'))
# ...

Other suggestions

Try getting the text of the "first" item returned by that selector:

node.css('link').first.text # => "http://www.google.com/" 

I don't know why Nokogiri doesn't recognize links here, but as always in such cases, XPath comes to the rescue:

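require 'nokogiri'
require 'open-uri'

doc = Nokogiri::HTML(URI.open('http://www.kffl.com/printRSS.php/NFL-ARI'))
doc.css('item').each do |item|
  # Relative path: the first text node after this item's empty <link>
  puts item.xpath('link/following-sibling::text()[1]').text.strip
end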

You can use the text? method to check whether a node is a text node, and the next method to get the node that follows it (here, the text node holding the URL).

require 'nokogiri'
require 'open-uri'

doc = Nokogiri::HTML(URI.open('http://www.kffl.com/printRSS.php/NFL-ARI'))

doc.css('item')[0].css('link').first.text?
# => false

doc.css('item')[0].css('link').first.next.text?
# => true

doc.css('item')[0].css('link').first.next.text

# => "http://www.kffl.com/gnews.php?id=901900-cardinals-tyrann-mathieu-expected-to-start-camp-on-pup\n            "

I don't know why it behaves this way.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow