Question

I am trying to configure Nutch for multi-threaded crawling.

However, I am facing an issue: I cannot get the crawl to run with multiple threads. I have modified nutch-site.xml to use 25 threads, but I still see only 1 thread running.

<property>
  <name>fetcher.threads.fetch</name>
  <value>25</value>
  <description>The number of FetcherThreads the fetcher should use.
    This also determines the maximum number of requests that are
    made at once (each FetcherThread handles one connection).</description>
</property>

<property>
  <name>fetcher.threads.per.host</name>
  <value>25</value>
  <description>This number is the maximum number of threads that
    should be allowed to access a host at one time.</description>
</property>

The fetcher log always shows activeThreads=25, spinWaiting=24, fetchQueues.totalSize=some value.

What does this mean? Can you please explain what the issue is and how I can solve it?

I would highly appreciate your help.

Thanks, Sumit


Solution

I think your issue is related to a known bug with the new Nutch fetcher. See NUTCH-721.

You can try using OldFetcher (if you have Nutch 1.0) to see if that solves your problem.
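
As for what the status line means: as far as I understand the Nutch 1.0 fetcher, activeThreads counts the fetcher threads that are alive, and spinWaiting counts those that are idle because no fetch item is currently available to them. The difference is the number of threads actually fetching. The snippet below is only a rough sketch of that arithmetic. The helper name busy_threads is made up for illustration, and the queue size of 500 is a placeholder, since the question only says "some value".

```python
import re

def busy_threads(status_line: str) -> int:
    """Return activeThreads minus spinWaiting from a fetcher status line.

    Hypothetical helper for illustration; it is not part of Nutch.
    """
    # Collect key=value pairs such as activeThreads=25 or fetchQueues.totalSize=500
    fields = dict(re.findall(r"(\w+(?:\.\w+)*)=(\d+)", status_line))
    return int(fields["activeThreads"]) - int(fields["spinWaiting"])

# The queue size here is a placeholder value, not taken from a real log.
line = "activeThreads=25, spinWaiting=24, fetchQueues.totalSize=500"
print(busy_threads(line))  # prints 1
```

So with activeThreads=25 and spinWaiting=24, all 25 threads were started, but only one of them is doing work at any given moment, which matches what you are seeing.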

-- Ken

Licensed under: CC-BY-SA with attribution