Need help in finding the location of the replicated file on my hdfs cluster

https://stackoverflow.com/questions/17110396

31-05-2022

Question

I am using WebHDFS to store a file in my HDFS cluster. In the configuration files I have set the replication factor (dfs.replication) to 2. With the WebHDFS API, the first PUT request returns the location of the DataNode to write the file to, and a second PUT request to that address uploads the actual file. Because the replication factor is 2, the file is then replicated to another DataNode. We know where one of the two copies is, but is it possible to find out which DataNode holds the second copy? Thanks in advance.
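The two-step upload described in the question can be sketched with the standard library as below. This is a minimal sketch, assuming pseudo authentication via the user.name parameter and a NameNode HTTP port of 50070 (Hadoop 2.x) or 9870 (Hadoop 3.x). The helper names create_url_path and webhdfs_create are hypothetical. http.client is used instead of urllib so that the 307 redirect from the NameNode is not followed automatically.

```python
import http.client
from urllib.parse import quote, urlencode, urlsplit


def create_url_path(hdfs_path: str, user: str, replication: int) -> str:
    """Path and query string for step 1 of the WebHDFS CREATE operation."""
    query = urlencode({"op": "CREATE", "user.name": user,
                       "replication": replication, "overwrite": "true"})
    return f"/webhdfs/v1{quote(hdfs_path)}?{query}"


def webhdfs_create(namenode: str, port: int, hdfs_path: str, data: bytes,
                   user: str, replication: int = 2) -> str:
    """Upload data in two PUTs; return the DataNode address from the redirect."""
    # Step 1: ask the NameNode where to write. No data is sent; the NameNode
    # answers 307 Temporary Redirect with the DataNode URL in "Location".
    conn = http.client.HTTPConnection(namenode, port)
    conn.request("PUT", create_url_path(hdfs_path, user, replication))
    resp = conn.getresponse()
    resp.read()
    conn.close()
    location = resp.getheader("Location")
    if resp.status != 307 or location is None:
        raise RuntimeError(f"expected 307 redirect, got {resp.status}")

    # Step 2: send the file bytes to the DataNode URL, keeping its query string.
    parts = urlsplit(location)
    target = parts.path + (f"?{parts.query}" if parts.query else "")
    conn = http.client.HTTPConnection(parts.hostname, parts.port)
    conn.request("PUT", target, body=data)
    resp = conn.getresponse()
    resp.read()
    conn.close()
    if resp.status != 201:
        raise RuntimeError(f"expected 201 Created, got {resp.status}")

    # This is the DataNode's HTTP (WebHDFS) address, not its data-transfer port;
    # HDFS places the remaining replicas itself, so they are not visible here.
    return parts.netloc
```

Only the first DataNode appears in this exchange, which is why a separate lookup is needed to find the other replica.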

Answer

First of all, files in HDFS are not stored as a whole. They are split into blocks, and each block is replicated across the cluster. So the real question is how to find the location of the second replica of each block, not of the file.
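To make the block view concrete, here is a small worked calculation. It assumes the default dfs.blocksize of 128 MB (Hadoop 2.x and later; Hadoop 1.x used 64 MB) and the replication factor of 2 from the question. The function name block_layout is illustrative only.

```python
DEFAULT_BLOCK_SIZE = 128 * 1024 * 1024  # assumed dfs.blocksize (Hadoop 2.x/3.x default)


def block_layout(file_size: int, block_size: int = DEFAULT_BLOCK_SIZE,
                 replication: int = 2) -> tuple[int, int]:
    """Return (blocks in the file, block replicas stored across the cluster)."""
    # Ceiling division: a partial last block still counts as one block,
    # and an empty file has no blocks at all.
    blocks = -(-file_size // block_size)
    return blocks, blocks * replication


print(block_layout(300 * 1024 * 1024))  # → (3, 6)
```

A 300 MB file therefore becomes 3 blocks, and each block has its own 2 replicas, so the two replicas of block 1 need not sit on the same DataNodes as those of block 2. The last block holds only the remaining 44 MB rather than a full 128 MB.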

Point your web browser at namenode_machine:50070, the HDFS web UI (50070 is the default NameNode HTTP port in Hadoop 1.x and 2.x; Hadoop 3.x uses 9870). Click "Browse the filesystem" and navigate to the file in question. Clicking the file opens a new page; scroll down to:

Total number of blocks: 1
-4906713039323389639:       127.0.0.1:50010

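This lists every block of the file together with the DataNode address (host:port) where each replica is stored. With a replication factor of 2, each block should be listed with two addresses.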
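The same information can be fetched without the web UI. From the command line, hdfs fsck <path> -files -blocks -locations reports each block and its DataNodes. Over WebHDFS, Hadoop 3.x documents a GETFILEBLOCKLOCATIONS operation that returns a BlockLocations JSON object. Older clusters like the one in this answer may not support it, so check your version's WebHDFS documentation. The sketch below assumes that operation and pseudo authentication via user.name. The function names are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote, urlencode


def parse_block_locations(payload: dict) -> list[tuple[int, list[str]]]:
    """Turn a BlockLocations JSON body into (offset, [DataNode host:port, ...]) pairs."""
    blocks = payload["BlockLocations"]["BlockLocation"]
    # "names" holds one host:port entry per replica of the block.
    return [(b["offset"], list(b["names"])) for b in blocks]


def get_block_locations(namenode: str, port: int, hdfs_path: str, user: str):
    """Query the NameNode for the replica locations of every block in a file."""
    query = urlencode({"op": "GETFILEBLOCKLOCATIONS", "user.name": user})
    url = f"http://{namenode}:{port}/webhdfs/v1{quote(hdfs_path)}?{query}"
    with urllib.request.urlopen(url) as resp:
        return parse_block_locations(json.load(resp))
```

With replication 2, each returned entry should carry two addresses, one of which is the DataNode you uploaded to.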

HTH

License: CC-BY-SA with attribution
