Question

I have successfully crawled a site using Nutch 1.2. Now I want to integrate it with Solr 3.1. The problem is that an error occurs when I issue the following command:

$ bin/nutch solrindex localhost:8080/solr/ crawl/crawldb crawl/linkdb crawl/segments/*

My Nutch logs are below.

Please help me solve this issue.

Bad Request

request: //localhost:8080/solr/update?wt=javabin&version=2
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:436)
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:245)
    at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
    at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:49)
    at org.apache.nutch.indexer.solr.SolrWriter.close(SolrWriter.java:75)
    at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:48)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
2013-07-08 17:38:47,577 ERROR solr.SolrIndexer - java.io.IOException: Job failed!

Solution

You'll need to add the following Apache Commons library to the classpath: commons-httpclient.jar. Put it in the same folder as the other JARs used by your Nutch installation.

You can find the current version of HttpClient here: http://hc.apache.org/httpcomponents-client-ga/

Please note that your Nutch version may use an older version of HttpClient, and the current version may not be backward compatible with it. In that case, download the older version of HttpClient and include that one in your libs instead.
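
Before downloading anything, it can help to check whether a commons-httpclient JAR is already present and which version it is. Here is a minimal sketch in Python; the function name find_httpclient_jars and the default folder name "lib" are assumptions for illustration, so point it at whatever folder your Nutch installation actually loads its JARs from.

```python
import re
import sys
from pathlib import Path

# Matches "commons-httpclient.jar" and versioned names such as
# "commons-httpclient-3.1.jar"; the version part is optional.
_JAR_NAME = re.compile(r"^commons-httpclient(?:-(?P<version>\d[\w.\-]*))?\.jar$")


def find_httpclient_jars(lib_dir):
    """Return (file name, version or None) for each commons-httpclient JAR in lib_dir."""
    found = []
    for jar in sorted(Path(lib_dir).glob("*.jar")):
        match = _JAR_NAME.match(jar.name)
        if match:
            found.append((jar.name, match.group("version")))
    return found


if __name__ == "__main__":
    # "lib" is an assumed default; pass your real JAR folder as the first argument.
    lib_dir = sys.argv[1] if len(sys.argv) > 1 else "lib"
    jars = find_httpclient_jars(lib_dir)
    if not jars:
        print("No commons-httpclient JAR found in", lib_dir)
    for name, version in jars:
        print(name, "version:", version or "unknown")
```

If more than one entry is listed, two versions of the library may be competing on the classpath, which is worth resolving before retrying the solrindex command.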

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow