Question

Slashdot's RSS feed is http://rss.slashdot.org/Slashdot/slashdot. If I download the XML file directly, I only get a few of the posts from today. However, if I subscribe to the feed in Google Reader, and keep scrolling down in their "infinite scroll" interface, it seems like I can get an arbitrary number of Slashdot posts from the past - maybe I can get every Slashdot post ever?

  1. How does Google Reader retrieve an unlimited number of posts from an RSS feed?
  2. How can I do the same?

Solution

Google Reader keeps a single shared copy of each feed for all its users, so it has been tracking and storing Slashdot articles, for example, since long before any new subscriber starts reading.

To do the same, you would have to poll the RSS feeds you want at regular intervals and store any unique articles you find locally.
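As a rough illustration of that poll-and-store approach, here is a minimal Python sketch using only the standard library. It assumes the feed is RSS (1.0 or 2.0) whose entries are `item` elements with `title`, `link`, and optionally `guid` children; the SQLite table layout and the function names `parse_items`, `store_new_items`, and `poll_once` are made up for this example, not anything Google Reader is known to use.

```python
import sqlite3
import urllib.request
import xml.etree.ElementTree as ET

FEED_URL = "http://rss.slashdot.org/Slashdot/slashdot"  # feed from the question


def _local(tag):
    """Strip an XML namespace prefix such as '{http://...}item' -> 'item'."""
    return tag.rsplit("}", 1)[-1]


def parse_items(xml_data):
    """Return a list of (guid, title, link) tuples, one per <item> element."""
    root = ET.fromstring(xml_data)
    items = []
    for el in root.iter():
        if _local(el.tag) != "item":
            continue
        fields = {_local(child.tag): (child.text or "").strip() for child in el}
        link = fields.get("link", "")
        guid = fields.get("guid") or link  # assumption: fall back to the link as the key
        if guid:
            items.append((guid, fields.get("title", ""), link))
    return items


def store_new_items(conn, items):
    """Insert items not seen before; return how many were actually new."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS articles ("
        "guid TEXT PRIMARY KEY, title TEXT, link TEXT, "
        "fetched_at TEXT DEFAULT CURRENT_TIMESTAMP)"
    )
    before = conn.total_changes
    # The PRIMARY KEY on guid plus INSERT OR IGNORE skips duplicates.
    conn.executemany(
        "INSERT OR IGNORE INTO articles (guid, title, link) VALUES (?, ?, ?)",
        items,
    )
    conn.commit()
    return conn.total_changes - before


def poll_once(conn, url=FEED_URL):
    """Download the feed once and archive any unseen articles."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return store_new_items(conn, parse_items(resp.read()))


# Usage (performs a network request), e.g. run from cron or a sleep loop:
#   conn = sqlite3.connect("feed_archive.db")
#   print(poll_once(conn), "new articles stored")
```

Because duplicates are ignored at insert time, the polling interval only needs to be short enough that no article drops off the feed between two polls.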

OTHER TIPS

I just discovered that if you're authenticated you can do something like:

http://www.google.com/reader/atom/feed/http://rss.slashdot.org/Slashdot/slashdot?n=100

to get an arbitrary number of results from a feed.

Google has been indexing the web for years and stores everything it comes across. So the moment you add a "subscribe to this" link to your page, the Google crawler will start indexing that page and storing it.

For RSS they also have the benefit of having multiple people subscribing to the same feed.

So for your application, I suggest saving every downloaded item locally, so that new subscribers can go back to the point in time when the first user subscribed to that feed. It won't give you unlimited history, but over time it will give you a much larger archive than just the 20 latest items.

I built an RSS archival service that does what you're talking about (https://app.pub.center). All of the RSS is free to use via REST. If you want push notifications, you have to switch to a paid plan.

PubCenter polls its catalog of RSS feeds daily and caches the articles. You can then get these articles back in chronological order. For example:

Page 1 of The Atlantic https://pub.center/feed/02702624d8a4c825dde21af94e9169773454e0c3/articles?limit=10&page=1

Page 2 of The Atlantic https://pub.center/feed/02702624d8a4c825dde21af94e9169773454e0c3/articles?limit=10&page=2
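The two URLs above differ only in the `page` query parameter, so walking the archive comes down to incrementing it. A small Python sketch, assuming the URL pattern holds for other pages as well; `page_url` is a hypothetical helper, and the response format is not shown here because the answer does not describe it.

```python
from urllib.parse import urlencode

# URL pattern inferred from the two example links; treat it as an assumption.
BASE = "https://pub.center/feed/{feed_id}/articles"
ATLANTIC_FEED_ID = "02702624d8a4c825dde21af94e9169773454e0c3"


def page_url(feed_id, page, limit=10):
    """Build the URL for one page of archived articles."""
    query = urlencode({"limit": limit, "page": page})
    return BASE.format(feed_id=feed_id) + "?" + query


# URLs for the first three pages of The Atlantic:
urls = [page_url(ATLANTIC_FEED_ID, p) for p in range(1, 4)]
```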

Licensed under: CC-BY-SA with attribution