Program to scrape a webpage into an index
13-06-2021
Question
I've been looking for a program that creates a search index from static webpages. I'm not looking for something like Solr or Elasticsearch, because both assume I will be building the index interactively. I need something that can go to a URL and create a search index from the pages it pulls. It can store the index in whatever form is necessary (database, XML, etc.). I just don't need a program that is heavily involved with backend database access and code, as this search will be very light and mostly for internal purposes, on a site that does not use any of those.
Thanks for any tips that may get me started or answers that will solve my problem!
Solution
Investigate Apache Nutch. Nutch can crawl and index a site starting from a URL, and what it indexes is very configurable.
Once crawling and indexing finish, the resulting index is searchable. No programming is involved.
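For a sense of what "go to a URL and build a search index" involves, here is a minimal sketch in Python using only the standard library. It is not Nutch and does not reflect how Nutch works internally. It only illustrates the crawl-then-index idea on a small static site. The names `PageParser`, `fetch`, `crawl_and_index`, and `search` are hypothetical, and the in-memory inverted index (word to set of URLs) is an assumption made for illustration.

```python
import re
from collections import defaultdict, deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class PageParser(HTMLParser):
    """Collects visible text and outgoing links from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.text_parts = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links and drop any #fragment.
                url, _ = urldefrag(urljoin(self.base_url, href))
                self.links.append(url)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.text_parts.append(data)


def fetch(url):
    """Download a page and return its body as text (needs network access)."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl_and_index(start_url, max_pages=50, fetch=fetch):
    """Breadth-first crawl of a single host; returns {word: set of URLs}."""
    host = urlparse(start_url).netloc
    index = defaultdict(set)
    queue, seen = deque([start_url]), {start_url}
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # skip pages that fail to load
        fetched += 1
        parser = PageParser(url)
        parser.feed(html)
        parser.close()
        text = " ".join(parser.text_parts).lower()
        for word in re.findall(r"[a-z0-9]+", text):
            index[word].add(url)
        for link in parser.links:
            # Stay on the starting host and visit each URL once.
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    return set.intersection(*(index.get(w, set()) for w in words))
```

Calling `crawl_and_index("https://intranet.example/")` (a placeholder URL) and then `search(index, "some words")` would return the set of matching page URLs. A real setup would also need to respect robots.txt, persist the index, and rank results, which is where a tool like Nutch saves work.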