Question

I want to build an RSS feed crawler for my website, but I'm not quite sure how to begin. How can my crawler identify an RSS feed? Is there anything I can crawl for that every RSS feed has? I don't need any code, just some help understanding what I have to build.

Thanks in advance!

Greetings

Xatenev


Solution

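I think this would be possible if your crawler follows every link, opens each page at least once, and looks for the text <rss version="2.0">. From what I understand, every RSS 2.0 feed should contain this line as its root element; feeds in other formats, such as Atom or RSS 1.0, use a different root element and would need their own check.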

<?xml version="1.0" encoding="UTF-8" ?>
<rss version="2.0">
<channel>
 <title>RSS Title</title>
 <description>This is an example of an RSS feed</description>
 <link>http://www.someexamplerssdomain.com/main.html</link>
 <lastBuildDate>Mon, 06 Sep 2010 00:01:00 +0000</lastBuildDate>
 <pubDate>Sun, 06 Sep 2009 16:20:00 +0000</pubDate>
 <ttl>1800</ttl>

 <item>
  <title>Example entry</title>
  <description>Here is some text containing an interesting description.</description>
  <link>http://www.wikipedia.org/</link>
  <guid isPermaLink="false">unique string per item</guid>
  <pubDate>Sun, 06 Sep 2009 16:20:00 +0000</pubDate>
 </item>

</channel>
</rss>
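
You said you don't need code, but in case it helps to see the idea concretely, here is a minimal sketch of the check in Python (my choice of language, not something from your question). Instead of searching the raw text, it parses the page as XML and looks at the root element, so an ordinary HTML page that merely mentions <rss version="2.0"> is not mistaken for a feed. The function name rss_version is made up for this example, and the sketch assumes you already have the downloaded page body as bytes.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def rss_version(body: bytes) -> Optional[str]:
    """Return the RSS version if the root element is <rss>, otherwise None.

    Hypothetical helper for illustration; `body` is the raw response body.
    """
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        # Not well-formed XML (many HTML pages end up here), so not a feed.
        return None
    if root.tag != "rss":
        # Well-formed XML, but some other document type (e.g. Atom or XHTML).
        return None
    # The version attribute is "2.0" for RSS 2.0 feeds; empty string if missing.
    return root.get("version", "")


if __name__ == "__main__":
    sample = (
        b'<?xml version="1.0" encoding="UTF-8" ?>'
        b'<rss version="2.0"><channel><title>RSS Title</title></channel></rss>'
    )
    print(rss_version(sample))                      # prints: 2.0
    print(rss_version(b"<html><body></body></html>"))  # prints: None
```

Your crawler would call something like this on every page it downloads and keep the URLs where the result is not None. Atom feeds are deliberately not detected here, since the sketch only covers the <rss> root element described above.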

If you're going to use PHP, I have had very positive experiences with SimpleXML, which is built into PHP.

P.S. Xatenev you're welcome ;)

Licensed under: CC-BY-SA with attribution