Question

I'm parsing an RSS feed that has a <link> element containing a URL, like so: <link>http://www.google.com/</link>. However, when I try to get the URL using node.css('link').text, it returns an empty string. Is there another attribute I should be accessing?

I'm using Nokogiri with Ruby.

Example:

doc = Nokogiri::HTML(open('http://www.kffl.com/printRSS.php/NFL-ARI'))
doc.css('item').each do |item|
  puts item.css('link').text
  puts item.css('link').first.text
end

Solution

You are parsing as HTML, but the source is XML. In HTML, link is an empty (void) element that cannot have content, so Nokogiri parses it as something like <link></link>http://example.com ..., where the URL is a text node outside the link element. When you then query the parsed document, the link elements are empty.

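To fix it, parse the feed as XML:

# Nokogiri::XML keeps the text inside <link>.
# On Ruby 3.0+, call URI.open (after require 'open-uri') instead of open.
doc = Nokogiri::XML(open('http://www.kffl.com/printRSS.php/NFL-ARI'))
# ...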

Other tips

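Try getting the text of the first node returned by that selector (this returns the URL only when the feed has been parsed as XML; with the HTML parser the link element itself is empty):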

node.css('link').first.text # => "http://www.google.com/" 

I don't know why Nokogiri doesn't recognize the links here, but as always in such cases, XPath comes to the rescue:

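doc = Nokogiri::HTML(open('http://www.kffl.com/printRSS.php/NFL-ARI'))
doc.css('item').each do |item|
  # A relative path keeps the query inside the current item ("//item" would
  # match every item in the document). The URL is the text node that
  # immediately follows the empty <link> element.
  puts item.xpath('link/following-sibling::text()[1]').text.strip
end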

You can use the text? method to check whether a node is a text node, and the next method to move to the following sibling node, which here holds the URL.

doc = Nokogiri::HTML(open('http://www.kffl.com/printRSS.php/NFL-ARI'))

doc.css('item')[0].css('link').first.text?
# => false

doc.css('item')[0].css('link').first.next.text?
# => true

doc.css('item')[0].css('link').first.next.text

# => "http://www.kffl.com/gnews.php?id=901900-cardinals-tyrann-mathieu-expected-to-start-camp-on-pup\n            "

I don't know why the URL ends up outside the link element.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow