Question

I have a Hadoop project that someone else coded (link), and I have the source. I want to run it on my own cluster (basically 3 Ubuntu machines), but the project was built for an EC2 platform (with the Cloudera distribution).

So, what should I install on my machines so that they have the software needed to run such a project?

So far I have thought of Cloudera Manager and Oracle Java.

Solution

If the project works with the Cloudera distribution (not with EMR), you can install Cloudera on your machines and it should be fine. The only corner case I would expect to be problematic is if S3 was used as the file system.
If the project does work against S3, you have two options:
a) Replace S3 with HDFS and update all file names / paths (if they are hardcoded); it should then work fine.
b) Install OpenStack's Swift, which is an open source alternative to S3, and then try to run Hadoop over it. Disclosure: I am involved in a project for running Hadoop over Swift. https://github.com/Dazo-org/swift
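
For option (a), a small helper can make the path rewrite mechanical. The following is only a sketch, not something taken from the project: the function name `s3_to_hdfs`, the NameNode address `namenode.example.com:8020`, and the `/data` prefix are all hypothetical placeholders, and it assumes the hardcoded paths use one of the `s3`, `s3n`, or `s3a` URI schemes.

```python
from urllib.parse import urlparse

# URI schemes Hadoop has used for S3-backed file systems.
S3_SCHEMES = {"s3", "s3n", "s3a"}


def s3_to_hdfs(path, namenode="namenode.example.com:8020", prefix="/data"):
    """Map an S3-style URI to an HDFS URI.

    The namenode address and the prefix are hypothetical defaults;
    replace them with the values for your own cluster.
    Paths that do not use an S3 scheme are returned unchanged.
    """
    parsed = urlparse(path)
    if parsed.scheme not in S3_SCHEMES:
        return path
    # Drop any "KEY:SECRET@" credentials embedded before the bucket name.
    bucket = parsed.netloc.rsplit("@", 1)[-1]
    key = parsed.path.lstrip("/")
    return f"hdfs://{namenode}{prefix}/{bucket}/{key}"


if __name__ == "__main__":
    for p in ["s3n://my-bucket/input/part-00000", "/local/path/file.txt"]:
        print(p, "->", s3_to_hdfs(p))
```

The data itself still has to be copied from S3 into HDFS under the matching directories; the helper only rewrites the path strings the job uses.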

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow