Question

I'm using Nutch to crawl through a certain site (namely this one). I followed this tutorial and it worked just fine, but when I tried to inject other url for Nutch to crawl I received

$ bin/nutch inject urls
InjectorJob: starting at 2014-02-04 18:26:18
InjectorJob: Injecting urlDir: urls
InjectorJob: org.apache.gora.util.GoraException: java.lang.RuntimeException: org.apache.hadoop.hbase.ZooKeeperConnectionException: HBase is able to connect to ZooKeeper but the connection closes immediately. This could be a sign that the server has too many connections (30 is the default). Consider inspecting your ZK server logs for that error and then make sure you are reusing HBaseConfiguration as often as you can. See HTable's javadoc for more information.
    at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:167)
    at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:135)
    at org.apache.nutch.storage.StorageUtils.createWebStore(StorageUtils.java:75)
    at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:221)
    at org.apache.nutch.crawl.InjectorJob.inject(InjectorJob.java:251)
    at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:273)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.nutch.crawl.InjectorJob.main(InjectorJob.java:282)
Caused by: java.lang.RuntimeException: org.apache.hadoop.hbase.ZooKeeperConnectionException: HBase is able to connect to ZooKeeper but the connection closes immediately. This could be a sign that the server has too many connections (30 is the default). Consider inspecting your ZK server logs for that error and then make sure you are reusing HBaseConfiguration as often as you can. See HTable's javadoc for more information.
    at org.apache.gora.hbase.store.HBaseStore.initialize(HBaseStore.java:127)
    at org.apache.gora.store.DataStoreFactory.initializeDataStore(DataStoreFactory.java:102)
    at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:161)
    ... 7 more
Caused by: org.apache.hadoop.hbase.ZooKeeperConnectionException: HBase is able to connect to ZooKeeper but the connection closes immediately. This could be a sign that the server has too many connections (30 is the default). Consider inspecting your ZK server logs for that error and then make sure you are reusing HBaseConfiguration as often as you can. See HTable's javadoc for more information.
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:155)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getZooKeeperWatcher(HConnectionManager.java:1002)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.setupZookeeperTrackers(HConnectionManager.java:304)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.<init>(HConnectionManager.java:295)
    at org.apache.hadoop.hbase.client.HConnectionManager.getConnection(HConnectionManager.java:157)
    at org.apache.hadoop.hbase.client.HBaseAdmin.<init>(HBaseAdmin.java:90)
    at org.apache.gora.hbase.store.HBaseStore.initialize(HBaseStore.java:109)
    ... 9 more
Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
    at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:809)
    at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:837)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.createAndFailSilent(ZKUtil.java:903)
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:133)
    ... 15 more

Now I've tried restarting the machine, I've tried changing /etc/hosts as here, but it didn't work.

I'm using apache-nutch-2.2.1 and hbase-0.90.4.

Was it helpful?

Solution

I've restarted the tutorial from scratch on a clean snapshot of the virtual machine and I had no problem.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top