Question

I am confused because I see the word 'Cluster' being used in multiple scenarios. Which of these below is the proper use of 'Cluster'?

  1. All the NameNodes, Job Trackers and Data Nodes
  2. One set of NameNode, JobTracker, and DataNodes (there being many more such sets)

Edit: An example of such confusing line is at this page: DistCp (distributed copy) is a tool used for large inter/intra-cluster copying

If all the servers(nodes) together is a cluster, then where is the question of 'inter cluster copying'? If a subset of servers is a cluster, then what exactly is that subset?

Was it helpful?

Solution

A Hadoop cluster is a combination of Master & Slave nodes, the Master nodes being NameNode, Secondary NameNode and Job Tracker & the Slave nodes being DataNodes and TaskTrackers. In any hadoop cluster, there is 1 instance of the Master nodes, i.e 1 NameNode, 1 Secondary NameNode and 1 Job Tracker and multiple slave nodes (dataNodes and taskTrackers)

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top