How to start building a Java-based web-scraping tool
19-06-2021
Question
What would be the best (and quickest) way to start building a web-scraping tool that is flexible enough to work with almost any type of website and can store the pages it fetches in a database for later retrieval?
I want to build something similar to Google Search, which caches websites on its own servers before any search is run against them.
This is one component of my research project.
Please let me know if there is already an open-source project that would make this task easier.
I would prefer to build this in Java.
Solution
Something like Heritrix, for example? Heritrix is the Internet Archive's open-source web crawler, and it is written in Java.
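To show what a crawler like Heritrix does at its core, here is a minimal sketch in Java (11 or later, for `java.net.http`). The class `MiniCrawler` and all of its methods are hypothetical and are not part of Heritrix. It crawls breadth-first from a seed URL, keeps each fetched page in an in-memory map that stands in for the database, and builds a small inverted index so cached pages can be searched by word. Extracting links with a regular expression is a simplifying assumption; a real tool would use an HTML parser, resolve relative links, respect robots.txt and rate-limit its requests.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.*;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical minimal crawler: fetch, cache, index, search. */
public class MiniCrawler {
    // Naive extractor for absolute http(s) links; relative links are ignored.
    private static final Pattern HREF =
            Pattern.compile("href=\"(https?://[^\"#]+)\"", Pattern.CASE_INSENSITIVE);

    private static final HttpClient CLIENT = HttpClient.newBuilder()
            .followRedirects(HttpClient.Redirect.NORMAL)
            .build();

    // In-memory stand-in for the database: URL -> raw HTML.
    private final Map<String, String> cache = new LinkedHashMap<>();
    // Inverted index: lower-cased word -> URLs whose text contains it.
    private final Map<String, Set<String>> index = new HashMap<>();
    // Returns the page body for a URL, or null if the fetch failed.
    private final Function<String, String> fetcher;

    public MiniCrawler(Function<String, String> fetcher) {
        this.fetcher = fetcher;
    }

    /** Breadth-first crawl from seed, caching at most maxPages pages. */
    public void crawl(String seed, int maxPages) {
        Deque<String> frontier = new ArrayDeque<>(List.of(seed));
        Set<String> seen = new HashSet<>(List.of(seed));
        while (!frontier.isEmpty() && cache.size() < maxPages) {
            String url = frontier.poll();
            String html = fetcher.apply(url);
            if (html == null) continue; // fetch failed: skip this URL
            cache.put(url, html);
            indexPage(url, html);
            Matcher m = HREF.matcher(html);
            while (m.find()) {
                String link = m.group(1);
                if (seen.add(link)) frontier.add(link); // enqueue each URL once
            }
        }
    }

    // Strips tags crudely and records every ASCII alphanumeric word of the page.
    private void indexPage(String url, String html) {
        String text = html.replaceAll("<[^>]*>", " ").toLowerCase(Locale.ROOT);
        for (String word : text.split("[^a-z0-9]+")) {
            if (!word.isEmpty()) {
                index.computeIfAbsent(word, k -> new TreeSet<>()).add(url);
            }
        }
    }

    /** URLs of cached pages containing the given word (case-insensitive). */
    public Set<String> search(String word) {
        return index.getOrDefault(word.toLowerCase(Locale.ROOT), Set.of());
    }

    public Map<String, String> cachedPages() {
        return Collections.unmodifiableMap(cache);
    }

    /** Network fetcher: returns the body on HTTP 200, otherwise null. */
    public static String httpFetch(String url) {
        try {
            HttpRequest req = HttpRequest.newBuilder(URI.create(url)).GET().build();
            HttpResponse<String> resp = CLIENT.send(req, HttpResponse.BodyHandlers.ofString());
            return resp.statusCode() == 200 ? resp.body() : null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return null;
        } catch (Exception e) {
            return null; // I/O error or malformed URL
        }
    }

    public static void main(String[] args) {
        String seed = args.length > 0 ? args[0] : "https://example.com/";
        MiniCrawler crawler = new MiniCrawler(MiniCrawler::httpFetch);
        crawler.crawl(seed, 10);
        System.out.println("Cached: " + crawler.cachedPages().keySet());
        System.out.println("Pages containing 'example': " + crawler.search("example"));
    }
}
```

The fetcher is passed in as a function so the crawl and indexing logic can be exercised against canned HTML without touching the network. Replacing the two maps with JDBC calls would be a natural next step toward the persistent storage the question asks for.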
Licensed under: CC-BY-SA with attribution