Question

I have two questions about using Hadoop as a storage system.

  1. I have a Hadoop cluster of 3 DataNodes, and I want to direct the splits of a large file, say 128 MB (assuming a split size of 64 MB), to DataNodes of my choice. In other words, how can I control which split goes to which DataNode? For example, given 3 DataNodes (D1, D2, D3), I want a particular split (say 'A') to be placed on a particular DataNode, say D2.

    How can we do this?

  2. What is the smallest possible split size in the Hadoop filesystem, and how can we configure it to use that smallest split size?


Solution

1) You can't control where the data blocks are placed; the NameNode decides which DataNodes receive each block.

2) As small as you want (it should probably be a multiple of 1024 bytes, though I don't think there is an actual constraint on this). On modern hardware, however, anything smaller than 64 or 128 MB is inefficient. If the MapReduce job is CPU-intensive, you can specify a smaller processing split size for the job instead of shrinking the storage block size.
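To make the difference between the storage block size and the processing split size concrete, here is a minimal sketch in plain Java with no Hadoop dependency. The class name `SplitMath` and its helpers are hypothetical; `computeSplitSize` is written to have the same shape as the formula in Hadoop's `FileInputFormat` (max of the minimum size and the smaller of the maximum size and the block size). It is an illustration of the arithmetic under those assumptions, not the actual Hadoop code, and it ignores details such as the tolerance Hadoop applies to the last split.

```java
// Sketch only: plain Java, no Hadoop classes involved.
// SplitMath and its methods are hypothetical names used for illustration.
class SplitMath {
    static final long MB = 1024L * 1024L;

    // Same shape as FileInputFormat's split-size formula:
    // max(minSize, min(maxSize, blockSize)).
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Number of blocks a file occupies, using ceiling division.
    static long blockCount(long fileSize, long blockSize) {
        return (fileSize + blockSize - 1) / blockSize;
    }

    public static void main(String[] args) {
        long blockSize = 64 * MB;   // storage block size from the question
        long fileSize = 128 * MB;   // file size from the question

        // How many blocks the 128 MB file is stored as.
        System.out.println("blocks: " + blockCount(fileSize, blockSize));

        // A CPU-heavy job caps the split size at 16 MB (an assumed value),
        // so each map task processes less data than one full block.
        long split = computeSplitSize(blockSize, 1, 16 * MB);
        System.out.println("split size (MB): " + split / MB);
    }
}
```

In a real MapReduce job, the minimum and maximum bounds would typically be set with `FileInputFormat.setMinInputSplitSize` and `FileInputFormat.setMaxInputSplitSize`; the 16 MB cap above is only an example value.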

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow