Question

I have a few million to a few billion (10^9) input data sets that need to be processed. Each is quite small (< 1 kB) and takes about 1 second to process.

I have read a lot about Apache Hadoop, MapReduce, and StarCluster, but I am not sure what the most efficient and fastest way to process them is.

I am thinking of using Amazon EC2 or a similar cloud service.


Solution

You might consider something like Amazon EMR, which takes care of a lot of the plumbing around Hadoop. If you're just looking to code something quickly, Hadoop Streaming, Hive, and Pig are all good tools for getting started with Hadoop without requiring you to know all of the ins and outs of MapReduce.
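
For illustration, here is a minimal sketch of the Hadoop Streaming approach: the mapper is just an executable that reads one record per line from stdin and writes results to stdout, so existing per-record logic can be wrapped with almost no MapReduce knowledge. The process_record function below is a placeholder for the actual ~1-second computation, and the file name mapper.py is an assumption used in the launch command afterward.

#!/usr/bin/env python3
# Minimal Hadoop Streaming mapper sketch. Hadoop feeds input records to
# this script on stdin, one per line, and collects whatever it writes
# to stdout as the job's output.
import sys

def process_record(record):
    # Placeholder for the real ~1 second of work per input set
    # (assumption: the actual processing logic goes here).
    return record.strip().upper()

def main():
    for line in sys.stdin:
        result = process_record(line)
        # Streaming treats each output line as one record; a tab would
        # separate key from value if a reduce step were needed.
        sys.stdout.write(result + "\n")

if __name__ == "__main__":
    main()

Since this scenario has no aggregation step, the job can run map-only. On EMR or a plain Hadoop cluster the launch would look roughly like the following (the streaming jar path varies by distribution, and the input/output paths are placeholders):

hadoop jar hadoop-streaming.jar -D mapreduce.job.reduces=0 -input /data/in -output /data/out -mapper mapper.py -file mapper.py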

Licensed under: CC-BY-SA with attribution