
I have a Hadoop project that someone else coded (link), and I have the source. I want to run it on my own cluster (basically 3 Ubuntu machines), but the project was written to run on EC2 with the Cloudera distribution.

So, what should I install on my machines to have the software needed to run such a project?

So far I have thought of Cloudera Manager and Oracle Java.



If the project runs on the Cloudera distribution (rather than EMR), you can install Cloudera on your machines and it should be fine. The only problem I would expect is if S3 was used as the file system.
If the project does work against S3, you have two options:
a) Replace S3 with HDFS by changing all the file names and paths (they may be hardcoded in the source); after that it should work fine.
b) Install OpenStack's Swift, which is an open-source alternative to S3, and then try to run Hadoop over it. Disclosure: I am involved in a project for running Hadoop over Swift.
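
For option a), if the S3 locations turn out to be hardcoded, a small script can do the rewrite for you. This is only a rough sketch in Python: the name node address `hdfs://namenode:8020`, the `/data` prefix, and the helper name `s3_to_hdfs` are placeholders I made up, not anything taken from the project, so adjust them to your cluster. It only rewrites text; you would still need to copy the data itself into HDFS.

```python
import re

# Matches the S3 URI schemes Hadoop has used (s3://, s3n://, s3a://)
# and captures the bucket name that follows the scheme.
S3_URI = re.compile(r"\bs3[na]?://([^/\s\"']+)")


def s3_to_hdfs(text, namenode="hdfs://namenode:8020", prefix="/data"):
    """Rewrite every S3 URI in `text` to an HDFS URI.

    The bucket name becomes a directory under `prefix`; the rest of the
    path is left untouched. `namenode` and `prefix` are hypothetical
    defaults, not values from the original project.
    """
    return S3_URI.sub(lambda m: f"{namenode}{prefix}/{m.group(1)}", text)


if __name__ == "__main__":
    line = 'Path in = new Path("s3n://my-bucket/input");'
    print(s3_to_hdfs(line))
```

You could run this over each source or config file and review the diff before rebuilding the job.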

Licensed under: CC-BY-SA with attribution