Вопрос

Has anyone had any luck writing custom indexers for nutch to index the crawl results with elasticsearch? Or do you know of any that already exist?

Это было полезно?

Решение

Haven't done it but this is definitely doable but would require to piggyback the SOLR code (src/java/org/apache/nutch/indexer/solr) and adapt it to ElasticSearch. Would be a nice contrib to Nutch BTW

Другие советы

I wrote an ElasticSearch plugin that mocks the Solr api. Using this plugin and the standard Nutch Solr indexer you can easily send crawled data into ElasticSearch. Plugin and an example of how to use it with Nutch can be found on GitHub:

https://github.com/mattweber/elasticsearch-mocksolrplugin

I know that Nutch will be adding pluggable backends and glad to see it. I had a need to integrate elasticsearch with Nutch 1.3. Code is posted here. Piggybacked off the (src/java/org/apache/nutch/indexer/solr) code.

https://github.com/ctjmorgan/nutch-elasticsearch-indexer

Time goes by and now Nucth is already integrated well with ElasticSearch. Here is a nice tutorial.

Лицензировано под: CC-BY-SA с атрибуция
Не связан с StackOverflow
scroll top