문제

I am using the following ruby script from this dashing widget that retrieves an RSS feed and parses it and sends that parsed title and description to a widget.

require 'net/http'
require 'uri'
require 'nokogiri'
require 'htmlentities'

news_feeds = {
  "seattle-times" => "http://seattletimes.com/rss/home.xml",
}

Decoder = HTMLEntities.new

class News
  def initialize(widget_id, feed)
    @widget_id = widget_id
    # pick apart feed into domain and path
    uri = URI.parse(feed)
    @path = uri.path
    @http = Net::HTTP.new(uri.host)
  end

  def widget_id()
    @widget_id
  end

  def latest_headlines()
    response = @http.request(Net::HTTP::Get.new(@path))
    doc = Nokogiri::XML(response.body)
    news_headlines = [];
    doc.xpath('//channel/item').each do |news_item|
      title = clean_html( news_item.xpath('title').text )
      summary = clean_html( news_item.xpath('description').text )
      news_headlines.push({ title: title, description: summary })
    end
    news_headlines
  end

  def clean_html( html )
    html = html.gsub(/<\/?[^>]*>/, "")
    html = Decoder.decode( html )
    return html
  end

end

@News = []
news_feeds.each do |widget_id, feed|
  begin
    @News.push(News.new(widget_id, feed))
  rescue Exception => e
    puts e.to_s
  end
end

SCHEDULER.every '60m', :first_in => 0 do |job|
  @News.each do |news|
    headlines = news.latest_headlines()
    send_event(news.widget_id, { :headlines => headlines })
  end
end

The example rss feed works correctly because the URL is for an xml file. However I want to use this for a different rss feed that does not provide an actual xml file. This rss feed I want is at http://www.ttc.ca/RSS/Service_Alerts/index.rss This doesn't seem to display anything on the widget. Instead of using "http://www.ttc.ca/RSS/Service_Alerts/index.rss", I also tried "http://www.ttc.ca/RSS/Service_Alerts/index.rss?format=xml" and "view-source:http://www.ttc.ca/RSS/Service_Alerts/index.rss" but with no luck. Does anyone know how I can get the actual xml data related to this rss feed so that I can use it with this ruby script?

도움이 되었습니까?

해결책

You're right, that link does not provide regular XML, so that script won't work in parsing it since it's written specifically to parse the example XML. The rss feed you're trying to parse is providing RDF XML and you can use the Rubygem: RDFXML to parse it.

Something like:

require 'nokogiri'
require 'rdf/rdfxml'

rss_feed = 'http://www.ttc.ca/RSS/Service_Alerts/index.rss'

RDF::RDFXML::Reader.open(rss_feed) do |reader|
  # use reader to iterate over elements within the document
end

From here you can try learning how to use RDFXML to extract the content you want. I'd begin by inspecting the reader object for methods I could use:

puts reader.methods.sort - Object.methods

That will print out the reader's own methods, look for one you might be able to use for your purposes, such as reader.each_entry

To further dig down you can inspect what each entry looks like:

reader.each_entry do |entry|
  puts "----here's an entry----" 
  puts entry.inspect
end

or see what methods you can call on the entry:

reader.each_entry do |entry|
  puts "----here's an entry's methods----" 
  puts entry.methods.sort - Object.methods
  break
end

I was able to crudely find some titles and descriptions using this hack job:

RDF::RDFXML::Reader.open('http://www.ttc.ca/RSS/Service_Alerts/index.rss') do |reader|
  reader.each_object do |object|
    puts object.to_s if object.is_a? RDF::Literal
  end
end

# returns:

# TTC Service Alerts
# http://www.ttc.ca/Service_Advisories/index.jsp

#      TTC Service Alerts.

# TTC.ca
# http://www.ttc.ca
# http://www.ttc.ca/images/ttc-main-logo.gif
# Service Advisory
# http://www.ttc.ca/Service_Advisories/all_service_alerts.jsp#Service+Advisory

# 196 York University Rocket route diverting northbound via Sentinel, Finch due to a collision that has closed the York U Bus way.
# - Affecting: Bus Routes: 196 York University Rocket
# 2013-12-17T13:49:03.800-05:00
# Service Advisory (2)
# http://www.ttc.ca/Service_Advisories/all_service_alerts.jsp#Service+Advisory+(2)

# 107B Keele North route diverting northbound via Keele, Lepage due to a collision that has closed the York U Bus way.
# - Affecting: Bus Routes: 107 Keele North
# 2013-12-17T13:51:08.347-05:00

But I couldn't quickly find a way to know which one was a title, and which a description :/

Finally, if you still can't find how to extract what you want, start a new question with this info.

Good luck!

라이센스 : CC-BY-SA ~와 함께 속성
제휴하지 않습니다 StackOverflow
scroll top