How to start building a Java-based web-scraping tool

https://stackoverflow.com/questions/11363527

java
web-scraping
information-extraction

19-06-2021

Question

What would be the best (and quickest) way to start building a web-scraping tool that is flexible enough to work with almost any type of website and can store those websites in a database for later retrieval?
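A minimal sketch of the core loop such a tool needs, assuming Java 11+ and only the JDK: a breadth-first crawl from a seed URL that stays on the seed's host, extracts links with a naive regex, and keeps each fetched page in an in-memory map standing in for the database. The class `MiniCrawler`, its `crawl` and `httpFetcher` methods, and the page limit are hypothetical; a real tool would add robots.txt handling, rate limiting, a proper HTML parser and JDBC persistence.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Minimal breadth-first crawler sketch (hypothetical class, not an existing library). */
class MiniCrawler {
    // Naive extractor for double-quoted href values; a real tool should use an HTML parser.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*\"([^\"]+)\"", Pattern.CASE_INSENSITIVE);

    // Pluggable fetch step: returns the page body, or empty if the fetch failed.
    private final Function<URI, Optional<String>> fetcher;

    MiniCrawler(Function<URI, Optional<String>> fetcher) {
        this.fetcher = fetcher;
    }

    /** Crawls breadth-first from seed, staying on the seed's host, until maxPages pages are stored. */
    Map<URI, String> crawl(URI seed, int maxPages) {
        Map<URI, String> store = new LinkedHashMap<>(); // stand-in for a database table
        Deque<URI> frontier = new ArrayDeque<>();
        Set<URI> seen = new HashSet<>();
        frontier.add(seed);
        seen.add(seed);
        while (!frontier.isEmpty() && store.size() < maxPages) {
            URI url = frontier.poll();
            Optional<String> html = fetcher.apply(url);
            if (html.isEmpty()) continue; // fetch failed: skip this URL
            store.put(url, html.get());
            Matcher m = HREF.matcher(html.get());
            while (m.find()) {
                String href = m.group(1).split("#", 2)[0].trim(); // drop any #fragment
                if (href.isEmpty()) continue;
                URI next;
                try {
                    next = url.resolve(href).normalize(); // make relative links absolute
                } catch (IllegalArgumentException e) {
                    continue; // malformed link
                }
                // Follow only same-host links that have not been queued before.
                if (seed.getHost() != null && seed.getHost().equalsIgnoreCase(next.getHost())
                        && seen.add(next)) {
                    frontier.add(next);
                }
            }
        }
        return store;
    }

    /** Real fetcher on the JDK 11+ HTTP client; returns empty on errors or non-200 responses. */
    static Function<URI, Optional<String>> httpFetcher() {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .connectTimeout(Duration.ofSeconds(10))
                .build();
        return uri -> {
            try {
                HttpRequest req = HttpRequest.newBuilder(uri).GET().build();
                HttpResponse<String> resp = client.send(req, HttpResponse.BodyHandlers.ofString());
                return resp.statusCode() == 200 ? Optional.of(resp.body()) : Optional.empty();
            } catch (IOException | IllegalArgumentException e) {
                return Optional.empty(); // network error or unsupported URI scheme
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return Optional.empty();
            }
        };
    }
}
```

For example, `new MiniCrawler(MiniCrawler.httpFetcher()).crawl(URI.create("https://example.com/"), 50)` would fetch up to 50 pages from one site. Keeping the fetcher pluggable lets the crawl logic be tested against canned HTML without network access.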

I want to build something similar to Google Search, which caches all the websites on its servers before running a search.
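The "cache first, then search" idea can be sketched as an inverted index built over the stored pages. This is a minimal illustration assuming pages are already cached as HTML strings; `PageIndex`, `add` and `search` are hypothetical names, tag stripping is a naive regex, and it is not a description of how Google Search actually works.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Minimal in-memory inverted index over cached pages: term -> set of page URLs. */
class PageIndex {
    private static final String NON_WORD = "[^\\p{L}\\p{N}]+";
    private final Map<String, Set<String>> postings = new HashMap<>();

    /** Indexes one cached page: strips tags naively, lowercases, splits into terms. */
    public void add(String url, String html) {
        String text = html.replaceAll("<[^>]*>", " ").toLowerCase(Locale.ROOT);
        for (String term : text.split(NON_WORD)) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, k -> new HashSet<>()).add(url);
            }
        }
    }

    /** Returns URLs of pages containing every query term (boolean AND), sorted for stable output. */
    public Set<String> search(String query) {
        Set<String> result = null;
        for (String term : query.toLowerCase(Locale.ROOT).split(NON_WORD)) {
            if (term.isEmpty()) continue;
            Set<String> hits = postings.getOrDefault(term, Collections.emptySet());
            if (result == null) {
                result = new TreeSet<>(hits); // first term seeds the result
            } else {
                result.retainAll(hits); // later terms narrow it (intersection)
            }
        }
        return result == null ? Collections.emptySet() : result;
    }
}
```

Each query term looks up its set of URLs and the sets are intersected, so a query like `java web` returns only pages containing both words. Because the lookup touches only the index, no website has to be fetched at query time.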

This is one component of my research project.

Please let me know if there is already an open-source project that would make my task easier.

I would prefer to build this in Java.

Solution

Something like Heritrix, for example?

License: CC BY-SA, with attribution

Not affiliated with StackOverflow