Question

I am thinking of writing a daemon to loop through feeds and then add them into the database as ActiveRecord objects.

Firstly, one problem I am facing is that I cannot reliably retrieve the author/user of a story using the feed-normalizer gem. It appears that sometimes it does not recognize the author tag (I don't know if anyone else has faced this problem).

Secondly, I haven't seen anyone convert RSS feeds back into database entries. I need to do this as each entry will have associations with other ActiveRecord objects. I can't find any gems to do this specifically, but could I somehow hack something like acts_as_feed to do that?


Solution

Don't use SimpleRSS. It won't decode HTML entities for you, and it occasionally ignores the structure of the feed.

I've found it easiest to parse the feed as XML with XMLSimple, but you can use any XML parser.
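As a rough sketch of the "any XML parser" route, the snippet below uses REXML instead of XMLSimple and turns each RSS 2.0 item into a plain hash. The method name parse_rss_items is made up for illustration, and the code assumes a well-formed RSS 2.0 document; Atom feeds use a different structure and would need their own path. Keeping the namespace prefix in the key can help with the author problem from the question, since many feeds put the author in dc:creator rather than author.

```ruby
require 'rexml/document'

# Hypothetical helper: parse an RSS 2.0 string into an array of hashes,
# one hash per <item>, keyed by tag name (including any namespace prefix).
def parse_rss_items(xml)
  doc = REXML::Document.new(xml)
  items = []
  doc.elements.each('rss/channel/item') do |item|
    entry = {}
    item.elements.each do |child|
      # expanded_name keeps the prefix, e.g. "dc:creator";
      # text returns the first text node with entities decoded.
      entry[child.expanded_name] = child.text.to_s.strip
    end
    items << entry
  end
  items
end
```

Calling parse_rss_items(feed_body) on a downloaded feed gives hashes you can inspect with keys such as 'title', 'link', 'author' or 'dc:creator', depending on what the feed actually provides.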

OTHER TIPS

SimpleRSS exposes a very simple API and works pretty well on most feeds. I recommend not looking at the implementation as its "parser" is a bunch of regexes (which is so wrong on so many levels), but it works well.

Daemons is a good gem for running it in the background.

If you are using ActiveRecord, you should follow the instructions for using AR outside of Rails and then define the model classes inline. This will cut down on bloat a bit.

RSS feeds are pretty inconsistent. This is the fall-through we use, where i is the hash for one parsed item:

  date = i[:pubDate] || i[:published] || i[:updated]
  body = i[:description] || i[:content] || i[:summary] || ""
  url = i[:guid] || i[:link]
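The fall-through above can be wrapped in a small helper so the daemon normalizes every item the same way. This is only a sketch: first_present and normalize_item are hypothetical names, and the item is assumed to be a Hash keyed by symbols, as in the snippet above. One deliberate difference from plain ||: blank strings are treated as missing, so an empty description falls through to content or summary.

```ruby
# Return the first candidate that is neither nil nor blank.
def first_present(*candidates)
  candidates.find { |value| !value.nil? && !value.to_s.strip.empty? }
end

# Hypothetical helper: reduce one parsed feed item to the fields we store.
# The key names mirror the fall-through shown above.
def normalize_item(item)
  {
    date: first_present(item[:pubDate], item[:published], item[:updated]),
    body: first_present(item[:description], item[:content], item[:summary]) || "",
    url:  first_present(item[:guid], item[:link])
  }
end
```

The resulting hash could then be passed to whatever ActiveRecord model holds the stories, after parsing the date string into a Time.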

Also, from experience, rescue everything you can, and remember that on older Rubies (1.8) timeouts are not caught by a bare rescue, because Timeout::Error is not a StandardError there; on 1.9 and later it is. It sucks to have to constantly bounce RSS daemons that choke on bad data.
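One way to keep a single bad feed from killing the daemon is to run each feed inside a wrapper like the one below. It is a minimal sketch, not a definitive implementation: safely, its label argument, and the 10-second default limit are all assumptions made for illustration. Timeout::Error is rescued explicitly so the wrapper also behaves on Ruby 1.8, where that class does not descend from StandardError.

```ruby
require 'timeout'

# Hypothetical wrapper: run the block for one feed and return its result,
# or nil if it times out or raises. Errors are logged to stderr.
def safely(label, limit = 10)
  Timeout.timeout(limit) { yield }
rescue Timeout::Error => e
  warn "#{label}: timed out (#{e.message})"
  nil
rescue StandardError => e
  warn "#{label}: #{e.class}: #{e.message}"
  nil
end
```

In the polling loop this might look like safely(feed_url) { fetch_and_store(feed_url) }, where fetch_and_store stands in for your own download-and-save code; a nil result just means "skip this feed for now and try again on the next pass".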

The best approach is to use a Rails Engine connected to a Feed API like Superfeedr's. Polling RSS feeds implies that you'll need to run your own asynchronous workers and/or a queue system, which can be fairly complex to build and maintain over time. You'll also have to handle hundreds of formats and inconsistencies. Here's a blog post that shows how to consume RSS feeds in a Rails application.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow