Question

I am using Hadoop to copy files between HDFS clusters located on distant hosts. My problem is that the network between these hosts has high latency (> 1 s), and Hadoop sometimes throws a java.net.NoRouteToHostException: No route to host.

I think this problem is caused by the latency. The host is reachable with ping, but with a noticeable delay. Here is an example of a ping: at first it could not reach the target host, but then the replies started coming back.

WorkGroup4-0:~# ping WorkGroup1-4
ping: unknown host WorkGroup1-4
WorkGroup4-0:~# ping WorkGroup1-1
PING WorkGroup1-1 (172.16.100.2) 56(84) bytes of data.
From WorkGroup4-0 (172.16.100.13) icmp_seq=1 Destination Host Unreachable
From WorkGroup4-0 (172.16.100.13) icmp_seq=2 Destination Host Unreachable
From WorkGroup4-0 (172.16.100.13) icmp_seq=3 Destination Host Unreachable
From WorkGroup4-0 (172.16.100.13) icmp_seq=4 Destination Host Unreachable
From WorkGroup4-0 (172.16.100.13) icmp_seq=5 Destination Host Unreachable
From WorkGroup4-0 (172.16.100.13) icmp_seq=6 Destination Host Unreachable
From WorkGroup4-0 (172.16.100.13) icmp_seq=7 Destination Host Unreachable
From WorkGroup4-0 (172.16.100.13) icmp_seq=8 Destination Host Unreachable
From WorkGroup4-0 (172.16.100.13) icmp_seq=9 Destination Host Unreachable
64 bytes from WorkGroup1-1 (172.16.100.2): icmp_req=12 ttl=64 time=1036 ms
64 bytes from WorkGroup1-1 (172.16.100.2): icmp_req=15 ttl=64 time=996 ms
^C
--- WorkGroup1-1 ping statistics ---
24 packets transmitted, 2 received, +9 errors, 91% packet loss, time 23134ms
rtt min/avg/max/mdev = 996.201/1016.462/1036.724/20.286 ms, pipe 3

Is there a way to configure the JVM for networks with high latency so that the time to try to connect to a remote host is longer?

No correct solution

Other suggestions

What a mess... But OK, here is a short list of things to test (a sketch of how to set them follows the list):

  • dfs.client.failover.connection.retries.on.timeouts (default 0): try a value between 2 and 5
  • dfs.client.failover.connection.retries (default 0): try a value between 2 and 5
  • dfs.client.failover.max.attempts (default 15): try something above 15 but below 50
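
A minimal sketch of how these could be applied, assuming the copy is driven from a Java client: the values, NameNode host names, ports, and paths below are placeholders, and the same keys can just as well go into hdfs-site.xml on the client machine or be passed with -D property=value to a tool such as distcp.

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class HighLatencyCopy {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Retry settings from the list above; the exact values are guesses to tune.
        conf.setInt("dfs.client.failover.connection.retries.on.timeouts", 3);
        conf.setInt("dfs.client.failover.connection.retries", 3);
        conf.setInt("dfs.client.failover.max.attempts", 30);

        // Placeholder NameNode URIs and paths; replace with your own clusters and files.
        FileSystem srcFs = FileSystem.get(new URI("hdfs://WorkGroup1-1:8020"), conf);
        FileSystem dstFs = FileSystem.get(new URI("hdfs://WorkGroup4-0:8020"), conf);

        FileUtil.copy(srcFs, new Path("/data/input"),
                      dstFs, new Path("/data/output"),
                      false /* do not delete the source */, conf);
    }
}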

If there is latency inside your Hadoop cluster too, consider the Rack Awareness feature and assign a unique rack ID to each node; this tells Hadoop that all your nodes are distant from one another (see the sketch below).
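
As a rough sketch of that idea, and not something from the original answer: a custom DNSToSwitchMapping can put every node in its own rack. The class name and the /rack- prefix below are made up; the more common route is a topology script pointed to by net.topology.script.file.name, while a class like this would be wired in through net.topology.node.switch.mapping.impl in core-site.xml.

import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.net.DNSToSwitchMapping;

// Hypothetical mapping: every host gets its own rack, so Hadoop treats
// all nodes as distant from each other.
public class OneRackPerHostMapping implements DNSToSwitchMapping {

    public List<String> resolve(List<String> names) {
        List<String> racks = new ArrayList<String>(names.size());
        for (String name : names) {
            // Derive a distinct rack id from the host name or IP.
            racks.add("/rack-" + name);
        }
        return racks;
    }

    // Nothing is cached here, so the reload hooks are no-ops.
    public void reloadCachedMappings() { }

    public void reloadCachedMappings(List<String> names) { }
}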

More info here: http://hadoop.apache.org/docs/r2.2.0/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow