What cloud provider to use for implementing a simple parallel algorithm?

https://stackoverflow.com/questions/14277589

15-01-2022

Question

I have a task: speed up the current implementation of an inverted index. In my opinion, the best approach is to run it in the cloud:

Divide the input text into a few parts (or just grab a few different text files)
Send texts to nodes
Run the algorithm on each node for different input data
Collect the results and merge them
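
The four steps above can be sketched locally before choosing a provider. This is only a minimal illustration, assuming documents arrive as a dict of id to text and that whitespace tokenization is good enough; the function names are hypothetical, and threads stand in for the remote nodes.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(docs, n):
    """Step 1: deal documents round-robin into n chunks (one per worker)."""
    chunks = [{} for _ in range(n)]
    for i, (doc_id, text) in enumerate(docs.items()):
        chunks[i % n][doc_id] = text
    return chunks


def build_partial_index(docs):
    """Step 3: index one chunk, mapping each term to a set of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def merge_indexes(partials):
    """Step 4: union the posting sets of all partial indexes."""
    merged = defaultdict(set)
    for partial in partials:
        for term, doc_ids in partial.items():
            merged[term] |= doc_ids
    return dict(merged)


def build_index(docs, workers=2):
    chunks = split_into_chunks(docs, workers)
    # Step 2: threads are an assumption standing in for cloud nodes;
    # on a real cluster each chunk would be shipped to a separate machine.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(build_partial_index, chunks))
    return merge_indexes(partials)
```

Because the merge is a plain set union per term, the order in which partial indexes come back does not matter, which is what makes the problem easy to distribute.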

My question is: what is the easiest way to implement it?

My current ideas are:

Windows Azure with worker roles - is it possible to send different data to each node and merge the results later?
Windows Azure with HPC Scheduler - isn't it too powerful for a task like this? I am worried about configuration and costs (new node = new worker role?)
Use another cloud, such as Amazon or Google - I'd like to code in C# and I am familiar with Microsoft technologies, so I am a little wary of them

Please give me any advice on how you would achieve this goal. I am new to cloud computing, although I know some basics (MPI, SOA, CUDA, Azure).

Solution

This is a case for MapReduce.

In fact, Hadoop grew out of the needs of Nutch (a web search engine that builds an inverted index).
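
To show how an inverted index fits the MapReduce model, here is a rough single-process sketch of the map, shuffle, and reduce phases. It does not use Hadoop or any provider's API; the function names and the whitespace tokenizer are assumptions made for illustration, and a real framework would run the mappers and reducers on separate nodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(doc_id, text):
    """Mapper: emit one (term, doc_id) pair per word occurrence."""
    for term in text.lower().split():
        yield term, doc_id


def shuffle(pairs):
    """Group emitted values by key, as a framework does between map and reduce."""
    grouped = defaultdict(list)
    for term, doc_id in pairs:
        grouped[term].append(doc_id)
    return grouped


def reduce_phase(term, doc_ids):
    """Reducer: collapse duplicate ids into a sorted posting list."""
    return term, sorted(set(doc_ids))


def inverted_index(docs):
    """Run the three phases in one process over a dict of doc_id -> text."""
    pairs = chain.from_iterable(map_phase(d, t) for d, t in docs.items())
    return dict(reduce_phase(term, ids) for term, ids in shuffle(pairs).items())
```

With a hosted MapReduce service, only the mapper and reducer logic would normally need to be written; splitting the input, shuffling, and collecting the output are handled by the framework.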

You could either use:

a) Amazon's Elastic MapReduce

b) Sign up for HDInsight on Azure

There are other providers too (PiCloud is one that comes to mind).

Licensed under: CC-BY-SA with attribution
