Question

Has anyone had any luck writing custom indexers for Nutch to index the crawl results with Elasticsearch? Or do you know of any that already exist?

Solution

I haven't done it myself, but it is definitely doable. You would need to start from the Solr indexer code (src/java/org/apache/nutch/indexer/solr) and adapt it to Elasticsearch. It would be a nice contribution to Nutch, by the way.
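To make the idea of adapting the indexer a bit more concrete, here is a minimal sketch of the step that changes most: serializing documents into the newline-delimited body that the Elasticsearch `_bulk` endpoint expects. This is not Nutch code. The function name `to_bulk_payload` and the field names `url`, `title` and `content` are assumptions chosen for illustration, and older Elasticsearch versions also expect a `_type` in the action line.

```python
import json


def to_bulk_payload(docs, index_name):
    """Serialize documents into a newline-delimited _bulk request body.

    Each document becomes two lines: an action line naming the target
    index and document id, then the document source itself.
    Assumption: every document has a "url" field that serves as its id.
    """
    lines = []
    for doc in docs:
        action = {"index": {"_index": index_name, "_id": doc["url"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    # The bulk body must end with a trailing newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical crawl result, only for demonstration.
    crawled = [{"url": "http://example.com/", "title": "Example", "content": "Hello"}]
    print(to_bulk_payload(crawled, "nutch"), end="")
```

The resulting string could then be sent in a POST to the cluster's `_bulk` endpoint with whatever HTTP client you prefer; that part is left out here on purpose.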

OTHER TIPS

I wrote an Elasticsearch plugin that mocks the Solr API. Using this plugin and the standard Nutch Solr indexer, you can easily send crawled data into Elasticsearch. The plugin and an example of how to use it with Nutch can be found on GitHub:

https://github.com/mattweber/elasticsearch-mocksolrplugin
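As a rough illustration of what mocking the Solr API involves (this is not taken from the plugin, whose internals are not shown here), the sketch below parses a Solr XML `<add>` update message into plain dictionaries that could then be indexed into Elasticsearch. The function name `solr_add_to_docs` is hypothetical, and it assumes the client sends XML updates rather than another wire format.

```python
import xml.etree.ElementTree as ET


def solr_add_to_docs(xml_text):
    """Convert a Solr XML <add> message into a list of dicts.

    Repeated field names (multi-valued Solr fields) are collected
    into a list; single-valued fields stay plain strings.
    """
    root = ET.fromstring(xml_text)
    docs = []
    for doc_el in root.iter("doc"):
        doc = {}
        for field in doc_el.findall("field"):
            name = field.get("name")
            value = field.text or ""
            if name not in doc:
                doc[name] = value
            elif isinstance(doc[name], list):
                doc[name].append(value)
            else:
                # Second occurrence: promote the value to a list.
                doc[name] = [doc[name], value]
        docs.append(doc)
    return docs


if __name__ == "__main__":
    # Hypothetical update message, only for demonstration.
    message = (
        '<add><doc>'
        '<field name="id">http://example.com/</field>'
        '<field name="title">Example</field>'
        '</doc></add>'
    )
    print(solr_add_to_docs(message))
```

A real mock would also have to handle deletes, commits and the response format the Solr client expects, so treat this only as the document-translation step.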

I know that Nutch will be adding pluggable backends, and I am glad to see it. In the meantime, I needed to integrate Elasticsearch with Nutch 1.3, so I wrote an indexer based on the Solr indexer code (src/java/org/apache/nutch/indexer/solr). The code is posted here:

https://github.com/ctjmorgan/nutch-elasticsearch-indexer

Time has passed, and Nutch is now well integrated with Elasticsearch. Here is a nice tutorial.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow