Question

I have two Hadoop clusters, both running the same Hadoop version. The same user, "testuser" (an example name), exists in both clusters, so a keytab for testuser is present in both.

Namenode#1 (source cluster): hdfs://nn1:8020
Namenode#2 (dest cluster): hdfs://nn2:8020

I want to copy some files from one cluster to the other using hadoop distcp. For example, the source cluster has a file at "/user/testuser/temp/file-r-0000", and the destination directory on the destination cluster is "/user/testuser/dest/". I want to copy file-r-0000 from the source cluster into the destination cluster's "dest" directory.

I have tried these so far:

hadoop distcp hdfs://nn1:8020/user/testuser/temp/file-r-0000 hdfs://nn2:8020/user/testuser/dest

hadoop distcp hftp://nn1:8020/user/testuser/temp/file-r-0000 hdfs://nn2:8020/user/testuser/dest

I believe I do not need "hftp://", since both clusters run the same Hadoop version. I also ran both commands from each of the two clusters, but every attempt fails with security-related exceptions.
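A minimal sketch of the plain RPC invocation, assuming a hypothetical realm `REALM1` and a hypothetical keytab path `/etc/security/keytabs/testuser.keytab` (neither appears in the question). It obtains a ticket with `kinit -kt` and then runs `hadoop distcp` with `hdfs://` on both sides, which is the usual choice when both clusters run the same Hadoop version. The `run` helper only prints the commands unless `DRY_RUN=0` is exported. Also note that HFTP talks HTTP to the NameNode web port (50070 by default), not to the RPC port 8020, so pointing `hftp://` at port 8020 is a likely cause of the `Unexpected end of file from server` error in the log below.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Cluster endpoints and paths from the question.
SRC_NN="hdfs://nn1:8020"
DST_NN="hdfs://nn2:8020"
SRC_PATH="/user/testuser/temp/file-r-0000"
DST_PATH="/user/testuser/dest"

# Hypothetical: the realm and keytab location are assumptions; override as needed.
PRINCIPAL="${PRINCIPAL:-testuser@REALM1}"
KEYTAB="${KEYTAB:-/etc/security/keytabs/testuser.keytab}"

# Print each command instead of executing it unless DRY_RUN=0 is exported.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# 1. Obtain a fresh Kerberos TGT for testuser from the keytab.
run kinit -kt "$KEYTAB" "$PRINCIPAL"

# 2. Same Hadoop version on both sides, so use hdfs:// (RPC) for source and target.
#    An hftp:// source would use the NameNode HTTP port instead, e.g. hftp://nn1:50070/...
run hadoop distcp "${SRC_NN}${SRC_PATH}" "${DST_NN}${DST_PATH}"
```

Even with a valid ticket, this is expected to fail between clusters in different Kerberos realms until cross-realm trust is configured.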

Running the hftp command from the destination cluster:

14/02/26 00:04:45 ERROR security.UserGroupInformation: PriviledgedActionException as:testuser@realm cause:java.net.SocketException: Unexpected end of file from server
14/02/26 00:04:45 ERROR security.UserGroupInformation: PriviledgedActionException as:testuser@realm cause:java.net.SocketException: Unexpected end of file from server
14/02/26 00:04:45 INFO fs.FileSystem: Couldn't get a delegation token from nn1ipaddress:8020

Running from the source cluster:

14/02/26 00:05:43 ERROR security.UserGroupInformation: PriviledgedActionException as:testuser@realm1 cause:java.io.IOException: Couldn't setup connection for testuser@realm1 to nn/realm2
With failures, global counters are inaccurate; consider running with -i
Copy failed: java.io.IOException: Call to nn1ipaddress failed on local exception: java.io.IOException: Couldn't setup connection for testuser@realm1 to nn/realm2


Caused by: java.io.IOException: Couldn't setup connection for testuser@realm1 to nn/realm2
    at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:560)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
    at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:513)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:616)
    at org.apache.hadoop.ipc.Client$Connection.access$2100(Client.java:203)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1254)
    at org.apache.hadoop.ipc.Client.call(Client.java:1098)
    ... 26 more

The output also reports that the host address is not present in the Kerberos database (I don't have the exact log message for that).
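The "not present in the Kerberos database" symptom can usually be reproduced without distcp by requesting a service ticket for the remote NameNode with MIT Kerberos `kvno`. A sketch, assuming the hypothetical principals `testuser@REALM1` and `nn/nn2.example.com@REALM2` (the question only shows them anonymized as "testuser@realm1" and "nn/realm2"); `cross_realm_tgt` is a hypothetical helper that prints the cross-realm principal the client's KDC would need. The `run` helper only prints commands unless `DRY_RUN=0` is exported.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical principals; substitute the real user and NameNode principals.
CLIENT_PRINCIPAL="${CLIENT_PRINCIPAL:-testuser@REALM1}"
REMOTE_NN_PRINCIPAL="${REMOTE_NN_PRINCIPAL:-nn/nn2.example.com@REALM2}"

# Print each command instead of executing it unless DRY_RUN=0 is exported.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Hypothetical helper: a client in realm A reaching a service in realm B needs the
# cross-realm TGT krbtgt/B@A (the realm is the part after the last '@').
cross_realm_tgt() {
  echo "krbtgt/${2##*@}@${1##*@}"
}

# 1. Show the current tickets; the TGT should be krbtgt/REALM1@REALM1.
run klist
# 2. Request a service ticket for the remote NameNode. Without a cross-realm
#    trust this typically fails with "Server not found in Kerberos database".
run kvno "$REMOTE_NN_PRINCIPAL"
echo "principal required in both KDCs: $(cross_realm_tgt "$CLIENT_PRINCIPAL" "$REMOTE_NN_PRINCIPAL")"
```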

So, do I need to configure Kerberos differently in order to use distcp between the two clusters? Or am I missing something here?

Any information will be highly appreciated. Thanks in advance.


Solution

Cross-realm Kerberos authentication is required to use distcp between two secured clusters in different realms. It was not configured on these two clusters. After setting up cross-realm authentication correctly, distcp worked.
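A minimal sketch of the trust setup on MIT Kerberos KDCs, assuming hypothetical realm names `REALM1` (source, nn1) and `REALM2` (destination, nn2). The cross-realm principal `krbtgt/REALM2@REALM1` and, for a two-way trust, `krbtgt/REALM1@REALM2` must exist in both KDCs with identical keys, so the same commands are run on each KDC with the same password entered each time. The `run` helper only prints commands unless `DRY_RUN=0` is exported.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical realm names; replace with the real realms of the two clusters.
REALM1="${REALM1:-REALM1}"   # source cluster (nn1)
REALM2="${REALM2:-REALM2}"   # destination cluster (nn2)

# Print each command instead of executing it unless DRY_RUN=0 is exported.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Cross-realm TGS principals, one per trust direction.
trust_principals() {
  echo "krbtgt/${REALM2}@${REALM1}"   # REALM1 users -> REALM2 services
  echo "krbtgt/${REALM1}@${REALM2}"   # REALM2 users -> REALM1 services (two-way trust)
}

# Run on BOTH KDCs. kadmin.local prompts for each principal's password; enter the
# same one on both KDCs so the keys match (encryption types should match too).
for p in $(trust_principals); do
  run kadmin.local -q "addprinc $p"
done
```

Beyond the KDC principals, each cluster's `krb5.conf` usually needs a `[realms]` entry for the other realm and matching `[domain_realm]` mappings, and the destination cluster may need a `hadoop.security.auth_to_local` rule such as `RULE:[1:$1@$0](.*@REALM1)s/@.*//` so that `testuser@REALM1` maps to the local user `testuser`.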

Licensed under: CC-BY-SA with attribution