Question

I'm parsing an RSS feed that has a <link> element containing a URL, like so: <link>http://www.google.com/</link>. However, when I try to get the URL using node.css('link').text, it returns an empty string. Is there another attribute I should be accessing?

I'm using nokogiri/ruby.

Example:

doc = Nokogiri::HTML(open('http://www.kffl.com/printRSS.php/NFL-ARI'))
doc.css('item').each do |item|
  puts item.css('link').text
  puts item.css('link').first.text
end

Solution

You are parsing as HTML, but the source is XML. In HTML the link element is empty (it cannot have content), so Nokogiri parses it as something like <link></link>http://example.com ..., where the URL is a text node outside the link element. When you then query the parsed document, the link elements are empty.

To fix it you should parse as XML:

doc = Nokogiri::XML(open('http://www.kffl.com/printRSS.php/NFL-ARI'))
  # ...

Other tips

Try getting the text of the "first" item returned by that selector:

node.css('link').first.text # => "http://www.google.com/" 

I don't know why Nokogiri doesn't recognize links here, but as always in such cases XPath comes to the rescue:

doc = Nokogiri::HTML(open('http://www.kffl.com/printRSS.php/NFL-ARI'))
doc.css('item').each do |item|
  # Relative to this item: take the text node that follows the empty <link>.
  puts item.xpath(".//link/following-sibling::text()[1]").text.strip
end

You can use the text? method to check whether a node is a text node, and the next method to get the following sibling, which here is the text node holding the URL.

doc = Nokogiri::HTML(open('http://www.kffl.com/printRSS.php/NFL-ARI'))

doc.css('item')[0].css('link').first.text?
# => false

doc.css('item')[0].css('link').first.next.text?
# => true

doc.css('item')[0].css('link').first.next.text

# => "http://www.kffl.com/gnews.php?id=901900-cardinals-tyrann-mathieu-expected-to-start-camp-on-pup\n            "


Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow