Question

Assuming default Hadoop settings, if I am writing a file of 128 MB into HDFS, there would be 2 blocks that the client needs to write. So my questions around this are:

  1. Will the file be available for someone to read as soon as the first block is written to HDFS, or does the reader have to wait for the second block to be written as well?

Second scenario, with a 64 MB file size:

  2. Can someone read the block that is currently being written to HDFS, or does the reader have to wait for the write to complete?
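The block count in the question assumes the pre-Hadoop-2.x default block size of 64 MB. Purely as an illustration (the function and numbers here are hypothetical, not part of any Hadoop API), the arithmetic can be sketched as:

```python
import math

def num_blocks(file_size_mb: int, block_size_mb: int = 64) -> int:
    """Number of HDFS blocks a file occupies; the last block may be partial."""
    return max(1, math.ceil(file_size_mb / block_size_mb))

# A 128 MB file with a 64 MB block size spans exactly 2 blocks.
print(num_blocks(128))  # 2
# A 64 MB file fits in a single block.
print(num_blocks(64))   # 1
```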


Solution

HDFS thinks in terms of blocks.

So, if your file is made of 2 blocks and the first block has been written, you can read that block. But since it is just one block of the file and not the whole file, you will have to locate it under dfs.data.dir and use hadoop dfs -text to read it, or browse to it through the NameNode UI.

For the second question: no, you cannot read the block that is currently being written. It will not be visible to readers.

Other tips

In general, when you write to HDFS, once more than a block's worth of data has been written, the first block becomes visible to new readers. The same is true for subsequent blocks: it is always the block currently being written that is not visible to other readers. However, you can use FSDataOutputStream.sync(), which forces all buffers to be synchronized to the data nodes. After sync() returns successfully, all data written up to that point is guaranteed to be visible to all new readers.
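As an illustration only (this is a toy model, not the HDFS API), the visibility rule described above can be sketched like this: completed blocks and explicitly synced data are readable, while the in-flight tail of the current block is not.

```python
class BlockWriter:
    """Toy model of HDFS write visibility: new readers see only data in
    completed blocks, or data made durable with sync(); the partial block
    currently being written stays invisible."""

    def __init__(self, block_size: int):
        self.block_size = block_size
        self.written = 0   # bytes handed to the writer so far
        self.visible = 0   # bytes a new reader is allowed to see

    def write(self, nbytes: int) -> None:
        self.written += nbytes
        # Only whole, completed blocks become visible automatically.
        full_blocks = self.written // self.block_size
        self.visible = max(self.visible, full_blocks * self.block_size)

    def sync(self) -> None:
        # Analogous to FSDataOutputStream.sync(): everything written
        # so far is guaranteed visible to new readers.
        self.visible = self.written

w = BlockWriter(block_size=64)
w.write(100)      # first 64 MB block complete, 36 MB still in flight
print(w.visible)  # 64
w.sync()
print(w.visible)  # 100
```

The model mirrors the answer's point: before sync(), a reader of the 100 MB written so far could see only the completed 64 MB block; after sync(), all 100 MB are visible.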

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow