Question

I am wondering, since such large datasets are used in Hadoop MapReduce, what data structures Hadoop uses. If possible, could somebody please provide a detailed view of the underlying data structures in Hadoop?


Solution 2

Thanks to all of you

I found the answer to my question. The underlying HDFS uses blocks as its storage units; a detailed description of blocks, along with the related network streaming concepts, is given in the book below.

All the details are available in the third chapter of Hadoop: The Definitive Guide.
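To make the block idea concrete, here is a minimal sketch (not from the book) that uses the HDFS Java API to list the block layout of a stored file. The file path is an illustrative assumption.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ListBlocks {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            // Hypothetical file already stored in HDFS
            Path file = new Path("/data/input/logs.txt");
            FileStatus status = fs.getFileStatus(file);

            // Each BlockLocation describes one block: its offset, its length,
            // and the datanodes holding a replica of it.
            BlockLocation[] blocks =
                    fs.getFileBlockLocations(status, 0, status.getLen());
            for (BlockLocation b : blocks) {
                System.out.printf("offset=%d length=%d hosts=%s%n",
                        b.getOffset(), b.getLength(),
                        String.join(",", b.getHosts()));
            }
        }
    }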

OTHER TIPS

HDFS is the default underlying storage platform of Hadoop. It's like any other file system in the sense that it does not care what structure the files have; it only ensures that files are stored redundantly and can be retrieved quickly.

So it is entirely up to you, the user, to store files with whatever internal structure you like.
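A small sketch of that idea, assuming a reachable HDFS cluster and an illustrative path: HDFS simply stores whatever bytes you hand it, and redundancy comes from the replication factor rather than from anything about the file's contents.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import java.nio.charset.StandardCharsets;

    public class WriteOpaqueFile {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Replication is cluster/file configuration, unrelated to the
            // file's internal structure (3 is the usual default).
            conf.set("dfs.replication", "3");
            FileSystem fs = FileSystem.get(conf);

            // HDFS does not care whether this is CSV, JSON, or raw binary.
            try (FSDataOutputStream out = fs.create(new Path("/data/anything.bin"))) {
                out.write("any bytes, in any format you like"
                        .getBytes(StandardCharsets.UTF_8));
            }
        }
    }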

A MapReduce program simply gets the file data fed to it as input: not necessarily the entire file, but splits of it, depending on the InputFormat used. The map program can then use that data in whatever way it wants.
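For example, here is a minimal Mapper sketch, assuming the default TextInputFormat: each input split is delivered record by record as a (byte offset, line of text) pair, and what map() does with those records is entirely up to you. The class name and emitted key/value logic are illustrative assumptions.

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class LineLengthMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            // Illustrative logic: emit each line keyed by its first token,
            // with the line length as the value.
            String[] parts = line.toString().split("\\s+", 2);
            if (parts.length > 0 && !parts[0].isEmpty()) {
                context.write(new Text(parts[0]), new IntWritable(line.getLength()));
            }
        }
    }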

Hive, on the other hand, deals with tables (rows and columns), and you can query them in a SQL-like fashion using Hive-QL.
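A hedged sketch of what that looks like from Java, going through Hive's JDBC driver (HiveServer2); the connection URL, table name, and columns are assumptions for illustration only.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class HiveQuery {
        public static void main(String[] args) throws Exception {
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            try (Connection conn = DriverManager.getConnection(
                         "jdbc:hive2://localhost:10000/default", "user", "");
                 Statement stmt = conn.createStatement();
                 // Hive-QL reads like SQL over tables of rows and columns
                 ResultSet rs = stmt.executeQuery(
                         "SELECT category, COUNT(*) FROM sales GROUP BY category")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
                }
            }
        }
    }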

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow