Question

I'm setting up a Hadoop cluster on Google Compute Engine, but I'm having trouble understanding how the cluster instances would work. I've searched a lot already, but nothing has given a clear answer.

When setting up a Hadoop cluster, do you create an image of one instance that has Hadoop installed, and then boot the other instances (nodes) from that image?

But wouldn't this approach require the nodes to be configured every time you create them?
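One common way to avoid hand-configuring each node is to bake Hadoop into a single image and let each instance work out its role at boot from a small amount of per-node input. The Python sketch below is illustrative only: the hostnames `hadoop-m` and `hadoop-w-*`, the idea of supplying the master's hostname at boot (for example through instance metadata), and NameNode port 8020 are assumptions, not details taken from the linked setup scripts.

```python
from dataclasses import dataclass


@dataclass
class NodePlan:
    role: str          # "master" or "worker"
    daemons: list      # Hadoop daemons this node should start
    default_fs: str    # value for fs.defaultFS, identical on every node


def plan_for(hostname: str, master_host: str, namenode_port: int = 8020) -> NodePlan:
    """Decide what a node runs using only its own hostname and the master's.

    Assumption: every instance boots from the same image, and the master's
    hostname is the only per-cluster input it needs (hypothetical setup).
    """
    if hostname == master_host:
        role = "master"
        daemons = ["namenode", "resourcemanager"]
    else:
        role = "worker"
        daemons = ["datanode", "nodemanager"]
    # All nodes point at the same NameNode, so this string is cluster-wide.
    default_fs = f"hdfs://{master_host}:{namenode_port}"
    return NodePlan(role, daemons, default_fs)


if __name__ == "__main__":
    # Hypothetical three-node cluster booted from one image.
    for host in ["hadoop-m", "hadoop-w-0", "hadoop-w-1"]:
        print(host, plan_for(host, "hadoop-m"))
```

With this design the image itself stays generic; only the master's hostname differs between clusters, so "configuring every node" reduces to passing one value at creation time.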

Does anyone have experience with this and could point me to some documentation on setting up a Hadoop cluster on Google Compute Engine?


Solution

The latest information about running Hadoop on the Google Cloud Platform can be found here: https://developers.google.com/hadoop/

There you can find Hadoop setup scripts that let you quickly spin up a Hadoop cluster based on configuration details you specify. The setup script can use either the Hadoop Distributed File System (HDFS) or Google Cloud Storage as the default file system. Google Cloud Storage provides additional functionality, such as automatic capacity scaling and persistence of your data beyond the lifetime of your Hadoop cluster.
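The choice between HDFS and Google Cloud Storage ultimately shows up as the `fs.defaultFS` property in Hadoop's `core-site.xml`. The Python sketch below renders that file for either case; it is a hedged illustration, not what the setup scripts actually generate. The bucket name `my-bucket` and host `hadoop-m` are hypothetical, and the `fs.gs.impl` class name is the one used by the Cloud Storage connector for Hadoop as far as I know, so check it against the connector documentation for your version. Installing the connector JAR and configuring credentials are left out.

```python
import xml.etree.ElementTree as ET


def core_site(default_fs: str) -> str:
    """Render a minimal core-site.xml for the given default file system URI.

    default_fs is e.g. "hdfs://hadoop-m:8020" or "gs://my-bucket"
    (both hypothetical values for illustration).
    """
    props = {"fs.defaultFS": default_fs}
    if default_fs.startswith("gs://"):
        # Assumed connector class; verify for your connector version.
        props["fs.gs.impl"] = "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem"

    conf = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(conf, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(conf, encoding="unicode")


if __name__ == "__main__":
    print(core_site("hdfs://hadoop-m:8020"))
    print(core_site("gs://my-bucket"))
```

Because Cloud Storage lives outside the cluster, pointing `fs.defaultFS` at a bucket is what allows data to outlive the VMs, whereas HDFS data disappears when the cluster's disks are deleted.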

Licensed under: CC-BY-SA with attribution