Question

What would be the best (and shortest) way to start building a web scraping tool that is flexible enough to work with almost all types of websites and can store those websites in a database for later retrieval?

I want to build something similar to Google Search, which caches websites on its own servers before running a search over them.

This is one component of my research project.

Please let me know if there is already an open source project that would make my task easier.

I would prefer to build this in Java.

Answer

Something like Heritrix, for example?
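To give a feel for what a crawler of this kind does at its core, here is a minimal sketch in plain Java. It is not Heritrix's API and not how Heritrix is implemented. The class name `MiniCrawler`, its methods, and the `example` URLs are all made up for illustration. The fetcher is passed in as a function so that the sketch runs without network access, and the "database" is just an in-memory map. The link extraction uses a naive regular expression, which a real tool should replace with a proper HTML parser.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration class; not part of Heritrix or any other library.
class MiniCrawler {
    // Naive pattern for absolute http(s) links in href attributes.
    // Fragments (#...) are cut off so the same page is not stored twice.
    private static final Pattern HREF = Pattern.compile(
            "href\\s*=\\s*[\"'](https?://[^\"'#\\s]+)", Pattern.CASE_INSENSITIVE);

    // Returns every absolute link found in the given HTML, in document order.
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    // Breadth-first crawl from a seed URL, storing at most maxPages pages.
    // The fetcher maps a URL to its HTML, or returns null if the page is unavailable.
    static Map<String, String> crawl(String seed, int maxPages,
                                     Function<String, String> fetcher) {
        Map<String, String> store = new LinkedHashMap<>(); // stand-in for a database
        Set<String> seen = new HashSet<>();                // URLs already queued
        Deque<String> frontier = new ArrayDeque<>();       // URLs waiting to be fetched
        frontier.add(seed);
        seen.add(seed);
        while (!frontier.isEmpty() && store.size() < maxPages) {
            String url = frontier.poll();
            String html = fetcher.apply(url);
            if (html == null) {
                continue; // skip pages that could not be fetched
            }
            store.put(url, html);
            for (String link : extractLinks(html)) {
                if (seen.add(link)) { // add() is false for URLs seen before
                    frontier.add(link);
                }
            }
        }
        return store;
    }

    public static void main(String[] args) {
        // A tiny fake "web" so the example runs offline.
        Map<String, String> fakeWeb = Map.of(
                "http://a.example/", "<a href=\"http://b.example/\">b</a> <a href='http://c.example/'>c</a>",
                "http://b.example/", "<a href=\"http://a.example/\">back</a>",
                "http://c.example/", "no links here");
        Map<String, String> cache = crawl("http://a.example/", 10, fakeWeb::get);
        System.out.println(cache.keySet());
    }
}
```

For real use, the fetcher could be backed by `java.net.http.HttpClient` (Java 11+) and the map replaced by a database table written through JDBC. Politeness rules such as robots.txt handling and rate limiting are left out here, and they are a large part of what an existing crawler like Heritrix saves you from writing.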

License: CC-BY-SA with attribution
Not affiliated with StackOverflow