Question

I have two machines. One runs HBase 0.92.2 in pseudo-distributed mode, and the other runs the Nutch 2.x crawler. How can I configure them so that the HBase 0.92.2 machine acts as back-end storage and the Nutch 2.x machine acts as the crawler?


Solution

I finally got it working, and it was easy to do. I am sharing my experience here; maybe it can help someone.

1- On the HBase machine, edit hbase-site.xml to configure pseudo-distributed mode.
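As a rough illustration of step 1, hbase-site.xml might look something like the fragment below. This is only a sketch under assumptions: the hostname `master` is taken from the /etc/hosts entry used in step 2, the HDFS port 9000 is a guess that has to match your own NameNode address, and your setup may need other properties.

```xml
<configuration>
  <!-- Where HBase stores its data; host and port are assumptions,
       use the address of your own HDFS NameNode. -->
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://master:9000/hbase</value>
  </property>
  <!-- Run the HBase daemons as separate processes rather than standalone. -->
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <!-- Hostname that clients (here, the Nutch machine) use to find ZooKeeper.
       It should be a name the remote machine can resolve, not localhost. -->
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>master</value>
  </property>
</configuration>
```

Because the same file is later copied to the Nutch machine, using a resolvable hostname here instead of localhost is what lets the crawler locate the HBase machine.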

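2- MOST IMPORTANT THING: on the HBase machine, replace the localhost IP in /etc/hosts with the machine's real network IP, like this:

10.11.22.189 master localhost

Here 10.11.22.189 is the HBase machine's IP. (Note: if you don't change the localhost IP on the HBase machine, the remote Nutch crawler won't be able to connect to it, most likely because HBase then advertises a loopback address that is only reachable from the HBase machine itself.)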
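To check the /etc/hosts change from step 2, a small script can confirm that the hostname no longer maps to a loopback address. The sketch below is not part of HBase or Nutch: `hosts_lookup` and `not_loopback` are hypothetical helper names, and `master` is the hostname from the example line above.

```shell
# Hypothetical helper: print every address a hosts file maps to a name.
# $1 = path to a hosts file, $2 = hostname to look up
hosts_lookup() {
  awk -v name="$2" '
    { sub(/#.*/, "") }
    { for (i = 2; i <= NF; i++) if ($i == name) print $1 }
  ' "$1"
}

# Hypothetical helper: succeed only if the name is present in the file
# and none of its addresses is a loopback address (127.x.x.x or ::1).
not_loopback() {
  addrs=$(hosts_lookup "$1" "$2")
  [ -n "$addrs" ] && ! printf '%s\n' "$addrs" | grep -Eq '^(127\.|::1$)'
}

# Usage on the HBase machine; "master" is the hostname from the example.
if not_loopback /etc/hosts master; then
  echo "master maps to a non-loopback address"
else
  echo "master is missing or still maps to a loopback address"
fi
```

The check only reads the hosts file; it does not prove that the Nutch machine can actually reach the HBase machine over the network.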

4- Copy or symlink the HBase machine's hbase-site.xml into $NUTCH_HOME/conf on the Nutch machine.
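The copy in step 4 can be sketched as a small shell function. This assumes hbase-site.xml has already been transferred to the Nutch machine by whatever means you prefer; `install_hbase_site` and the paths in the usage comment are hypothetical.

```shell
# Hypothetical helper: place a local copy of hbase-site.xml into the
# conf directory of a Nutch installation.
# $1 = path to the local copy of hbase-site.xml, $2 = Nutch home directory
install_hbase_site() {
  src=$1
  conf_dir=$2/conf
  # Refuse to continue if the source file is not there.
  [ -f "$src" ] || { echo "missing: $src" >&2; return 1; }
  mkdir -p "$conf_dir"
  cp "$src" "$conf_dir/hbase-site.xml"
}

# Example usage (paths are assumptions; adjust to your machines):
#   install_hbase_site /tmp/hbase-site.xml "$NUTCH_HOME"
```

A copy is used here rather than a symlink because the two programs live on different machines; a symlink only makes sense if the HBase configuration directory is visible on the Nutch machine's filesystem.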

5- Start your crawler and check that it writes to HBase.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow