Question

I use Curator 1.2.4 and keep getting a ConnectionLossException when I monitor a znode for changes to its children.

I implemented a watcher like this:

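import java.util.List;

import org.apache.zookeeper.WatchedEvent;

import com.netflix.curator.framework.CuratorFramework;
import com.netflix.curator.framework.api.CuratorWatcher;

public class CuratorChildWatcherImpl implements CuratorWatcher {

    private final CuratorFramework client;

    public CuratorChildWatcherImpl(CuratorFramework client) {
        this.client = client;
    }

    @Override
    public void process(WatchedEvent event) throws Exception {
        // ZooKeeper watches fire only once, so re-register this watcher while fetching the children.
        List<String> children = client.getChildren().usingWatcher(this).forPath(event.getPath());
        // Do other stuff with the children znode.
    }
}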

With connectionTimeout set to 10 seconds, the code throws a ConnectionLossException every 11 seconds. The interval between exceptions seems to be connectionTimeout plus 1 second. Why?

I checked the source code and found that GetChildrenBuilderImpl calls CuratorZookeeperClient's blockUntilConnectedOrTimedOut method, which checks the connection state every 1 second.
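
To illustrate the kind of loop I mean, here is a simplified model. It is my own sketch, not Curator's actual source: the class and method names (PollingWaitSketch, simulatedWaitMs) are hypothetical, and time is simulated rather than measured. It only shows that a wait which re-checks the connection in fixed steps can end later than the configured timeout, because it can only stop on a step boundary. Whether this accounts for the extra second I see is an assumption on my part.

```java
import java.util.function.BooleanSupplier;

// Hypothetical model of a "check every pollIntervalMs until timeout" wait loop.
class PollingWaitSketch {

    /**
     * Simulates waiting for a connection by re-checking every pollIntervalMs
     * until isConnected reports true or the timeout budget is used up.
     * Returns the simulated time spent waiting, in milliseconds.
     */
    public static long simulatedWaitMs(long timeoutMs, long pollIntervalMs, BooleanSupplier isConnected) {
        long remainingMs = timeoutMs;
        long waitedMs = 0;
        while (!isConnected.getAsBoolean() && remainingMs > 0) {
            // A real loop would block here (for example on a latch) for up to pollIntervalMs.
            waitedMs += pollIntervalMs;
            remainingMs -= pollIntervalMs;
        }
        return waitedMs;
    }

    public static void main(String[] args) {
        BooleanSupplier neverConnected = () -> false;
        // Timeout is a multiple of the step: the wait ends exactly at the timeout.
        System.out.println(simulatedWaitMs(10_000, 1_000, neverConnected)); // prints 10000
        // Timeout is not a multiple of the step: the wait overshoots to the next step.
        System.out.println(simulatedWaitMs(10_500, 1_000, neverConnected)); // prints 11000
    }
}
```

In this model the overshoot is always less than one polling step, so a 1-second step can add at most about 1 second on top of the configured timeout.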

2013-04-17 17:22:08 [ERROR]-[com.netflix.curator.ConnectionState.getZooKeeper(ConnectionState.java:97)] Connection timed out for connection string (...) and timeout (10000) / elapsed (10317913)
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
    at com.netflix.curator.ConnectionState.getZooKeeper(ConnectionState.java:94)
    at com.netflix.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:107)
    at com.netflix.curator.framework.imps.CuratorFrameworkImpl.getZooKeeper(CuratorFrameworkImpl.java:413)
    at com.netflix.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:213)
    at com.netflix.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:202)
    at com.netflix.curator.RetryLoop.callWithRetry(RetryLoop.java:106)
    at com.netflix.curator.framework.imps.GetChildrenBuilderImpl.pathInForeground(GetChildrenBuilderImpl.java:198)
    at com.netflix.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:190)
    at com.netflix.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:37)
    at com.netflix.curator.framework.imps.NamespaceWatcher.process(NamespaceWatcher.java:56)
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:521)

Solution

This was a known bug in the Curator/ZooKeeper interaction, tracked as CURATOR-24 ("The current method of managing hung ZK handles needs improvement"). It was fixed in version 2.0.1-incubating.

Licensed under: CC-BY-SA with attribution