Question

I need an open-source RSS crawler and feed reader (or two separate tools) for a project, written in Java if possible. I've seen many different tools; do you know which one is the best?

Thanks in advance.

Solution

If you want a complete search engine, look at Apache Nutch.

If you just want to understand the principles of web crawling, read the fairly simple introduction in "Programming Collective Intelligence" and the more advanced one in "Introduction to Information Retrieval".
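To give a rough idea of the core loop those introductions describe, here is a minimal sketch of a breadth-first crawl: a frontier queue of URLs to visit plus a set of URLs already seen. It is an illustration only, not how Nutch or any of the tools mentioned here work. The class name CrawlSketch, the in-memory map standing in for HTTP fetches, and the naive href regex are all assumptions made to keep the example self-contained; a real crawler would fetch over the network, resolve relative links, use a proper HTML parser, and respect robots.txt.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CrawlSketch {

    // Naive link extractor (assumption): only matches double-quoted href attributes.
    private static final Pattern HREF = Pattern.compile("href=\"([^\"]+)\"");

    /**
     * Breadth-first crawl starting at seed. The pages map (URL -> HTML) is a
     * hypothetical stand-in for real HTTP fetches. Returns the URLs in the
     * order they were visited, stopping after maxPages pages.
     */
    static List<String> crawl(String seed, Map<String, String> pages, int maxPages) {
        Deque<String> frontier = new ArrayDeque<>(); // URLs waiting to be fetched
        Set<String> seen = new HashSet<>();          // URLs already queued, to avoid loops
        List<String> visited = new ArrayList<>();

        frontier.add(seed);
        seen.add(seed);

        while (!frontier.isEmpty() && visited.size() < maxPages) {
            String url = frontier.poll();
            String html = pages.get(url);
            if (html == null) {
                continue; // treat a missing page like a failed fetch
            }
            visited.add(url);

            // Queue every link that has not been seen before.
            Matcher matcher = HREF.matcher(html);
            while (matcher.find()) {
                String link = matcher.group(1);
                if (seen.add(link)) {
                    frontier.add(link);
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        Map<String, String> pages = Map.of(
                "http://example.com/a", "<a href=\"http://example.com/b\">b</a>",
                "http://example.com/b", "<a href=\"http://example.com/a\">back</a>");
        System.out.println(crawl("http://example.com/a", pages, 10));
    }
}
```

The seen set is what keeps the crawl from looping when pages link back to each other, and maxPages is a simple budget so the crawl always terminates.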

If you need to parse RSS and Atom feeds, use Rome.
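To show roughly what a feed parser has to do, here is a small JDK-only sketch that pulls the title and link out of each item in an RSS 2.0 document. This is deliberately not Rome's API: it uses the standard javax.xml DOM parser so it runs without extra dependencies, and it handles only plain RSS 2.0 items. The class name RssSketch and the "title | link" output format are assumptions for the example. A library like Rome is still the better choice in practice, since it also covers Atom and the older RSS variants.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class RssSketch {

    /** Returns one "title | link" string per item element in an RSS 2.0 document. */
    static List<String> itemSummaries(String rssXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Reject DOCTYPE declarations so untrusted feeds cannot trigger XXE.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(rssXml)));

            List<String> summaries = new ArrayList<>();
            NodeList items = doc.getElementsByTagName("item");
            for (int i = 0; i < items.getLength(); i++) {
                Element item = (Element) items.item(i);
                summaries.add(childText(item, "title") + " | " + childText(item, "link"));
            }
            return summaries;
        } catch (Exception e) {
            // Malformed XML, parser configuration problems, etc.
            throw new IllegalArgumentException("Could not parse feed", e);
        }
    }

    /** Text of the first descendant element with the given tag, or "" if absent. */
    private static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        String sample = "<rss version=\"2.0\"><channel><title>Demo feed</title>"
                + "<item><title>First post</title><link>http://example.com/1</link></item>"
                + "</channel></rss>";
        itemSummaries(sample).forEach(System.out::println);
    }
}
```

In a real project you would read the XML from the feed's URL rather than from a string, and this is where a dedicated feed library saves the most work.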

Also look at a scraper, for example Web-Harvest.

Licensed under: CC-BY-SA with attribution