Question

While searching for a Python client for Hadoop, I found two modules: pydoop and hadoopy. Both seem good enough to work with, but I'm not sure which one has more advantages over the other, so I can't decide which to install.

Solution

The most comprehensive comparison of these frameworks that I know of is http://blog.cloudera.com/blog/2013/01/a-guide-to-python-frameworks-for-hadoop/

More recently, I think mrjob has pulled ahead as the clear frontrunner. It has a very active mailing list, and it seems to be relatively stable and up to date. It also integrates nicely with Amazon EMR.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow