Question

I am wondering, since such large datasets are used in Hadoop MapReduce, what data structures Hadoop uses. If possible, could somebody please provide a detailed view of the underlying data structures in Hadoop?


Solution 2

Thanks to all of you

I found the answer to my question. The underlying HDFS uses blocks as its storage units; a detailed description of blocks, along with the related network streaming concepts, is given in the book below.

All the details are available in the third chapter of Hadoop: The Definitive Guide.
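To make the block idea concrete, here is a minimal sketch (not from the book) that uses the HDFS Java API to list the block layout of a stored file. The file path is an illustrative assumption.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ListBlocks {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            // Hypothetical file already stored in HDFS
            Path file = new Path("/data/input/logs.txt");
            FileStatus status = fs.getFileStatus(file);

            // Each BlockLocation describes one block: its offset, its length,
            // and the datanodes holding a replica of it.
            BlockLocation[] blocks =
                    fs.getFileBlockLocations(status, 0, status.getLen());
            for (BlockLocation b : blocks) {
                System.out.printf("offset=%d length=%d hosts=%s%n",
                        b.getOffset(), b.getLength(),
                        String.join(",", b.getHosts()));
            }
        }
    }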

OTHER TIPS

HDFS is the default underlying storage platform of Hadoop. It's like any other file system in the sense that it does not care what structure the files have; it only ensures that files are stored redundantly and can be retrieved quickly.

So it is entirely up to you, the user, to store files with whatever internal structure you like.
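A small sketch of that idea, assuming a reachable HDFS cluster and an illustrative path: HDFS simply stores whatever bytes you hand it, and redundancy comes from the replication factor rather than from anything about the file's contents.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import java.nio.charset.StandardCharsets;

    public class WriteOpaqueFile {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Replication is cluster/file configuration, unrelated to the
            // file's internal structure (3 is the usual default).
            conf.set("dfs.replication", "3");
            FileSystem fs = FileSystem.get(conf);

            // HDFS does not care whether this is CSV, JSON, or raw binary.
            try (FSDataOutputStream out = fs.create(new Path("/data/anything.bin"))) {
                out.write("any bytes, in any format you like"
                        .getBytes(StandardCharsets.UTF_8));
            }
        }
    }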

A MapReduce program simply gets the file data fed to it as input: not necessarily the entire file, but splits of it, depending on the InputFormat used. The map program can then use that data in whatever way it wants.
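For example, here is a minimal Mapper sketch, assuming the default TextInputFormat: each input split is delivered record by record as a (byte offset, line of text) pair, and what map() does with those records is entirely up to you. The class name and emitted key/value logic are illustrative assumptions.

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class LineLengthMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            // Illustrative logic: emit each line keyed by its first token,
            // with the line length as the value.
            String[] parts = line.toString().split("\\s+", 2);
            if (parts.length > 0 && !parts[0].isEmpty()) {
                context.write(new Text(parts[0]), new IntWritable(line.getLength()));
            }
        }
    }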

Hive, on the other hand, deals with tables (rows and columns), and you can query them in a SQL-like fashion using Hive-QL.
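A hedged sketch of what that looks like from Java, going through Hive's JDBC driver (HiveServer2); the connection URL, table name, and columns are assumptions for illustration only.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class HiveQuery {
        public static void main(String[] args) throws Exception {
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            try (Connection conn = DriverManager.getConnection(
                         "jdbc:hive2://localhost:10000/default", "user", "");
                 Statement stmt = conn.createStatement();
                 // Hive-QL reads like SQL over tables of rows and columns
                 ResultSet rs = stmt.executeQuery(
                         "SELECT category, COUNT(*) FROM sales GROUP BY category")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
                }
            }
        }
    }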

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow