Question

What would be the best (and shortest) way to start building a web scraping tool that is flexible enough to work with almost all types of websites and can store those websites in a database for later retrieval?

I want to build something similar to Google Search, which caches websites on its own servers before running a search over them.

This is one component of my research project.

Please let me know if there is already an open source project that would make my task easier.

I would prefer to build this in Java.


Solution

Something like Heritrix, for example? It is the Internet Archive's open source web crawler, and it is written in Java.
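For a sense of what such a crawler does at its core, here is a minimal sketch in plain Java. It is not Heritrix's API. The class `MiniCrawler`, its methods, and the `example.test` URLs are all hypothetical names made up for illustration. The sketch assumes the fetcher is passed in as a function and that an in-memory map stands in for the database. In a real tool the fetcher could wrap `java.net.http.HttpClient` and the map could be replaced by a JDBC-backed table. Link extraction uses a naive regex where a real crawler would use an HTML parser, and robots.txt handling and politeness delays are left out.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of Heritrix or any library.
final class MiniCrawler {
    // Naive matcher for double-quoted href attributes (assumption: good enough for a demo).
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*\"([^\"]+)\"", Pattern.CASE_INSENSITIVE);

    // Returns absolute http/https links found in html, resolved against baseUrl.
    static List<String> extractLinks(String baseUrl, String html) {
        List<String> links = new ArrayList<>();
        URI base = URI.create(baseUrl);
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            try {
                URI resolved = base.resolve(m.group(1).trim());
                String scheme = resolved.getScheme();
                if ("http".equals(scheme) || "https".equals(scheme)) {
                    links.add(resolved.toString());
                }
            } catch (IllegalArgumentException e) {
                // Malformed link: skip it and keep going.
            }
        }
        return links;
    }

    // Breadth-first crawl from seed; stores at most maxPages pages as url -> html.
    // The fetcher returns the page body, or null if the page could not be fetched.
    static Map<String, String> crawl(String seed, int maxPages,
                                     Function<String, String> fetcher) {
        Map<String, String> store = new LinkedHashMap<>(); // stand-in for a database
        Deque<String> frontier = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        frontier.add(seed);
        seen.add(seed);
        while (!frontier.isEmpty() && store.size() < maxPages) {
            String url = frontier.poll();
            String html = fetcher.apply(url);
            if (html == null) {
                continue;
            }
            store.put(url, html);
            for (String link : extractLinks(url, html)) {
                if (seen.add(link)) { // enqueue each URL only once
                    frontier.add(link);
                }
            }
        }
        return store;
    }

    public static void main(String[] args) {
        // A tiny fake "web" so the demo runs without network access.
        Map<String, String> fakeWeb = Map.of(
                "http://example.test/", "<a href=\"/a\">A</a> <a href=\"http://example.test/b\">B</a>",
                "http://example.test/a", "<a href=\"/\">home</a>",
                "http://example.test/b", "no links here");
        Map<String, String> store = crawl("http://example.test/", 10, fakeWeb::get);
        System.out.println(store.keySet());
    }
}
```

Passing the fetcher in as a function keeps the crawl loop testable offline. For anything beyond a prototype, an existing crawler such as Heritrix is likely the shorter route.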

License: CC BY-SA with attribution
Not affiliated with StackOverflow