Question

I'm playing with HDInsight, and here is what I don't understand: although Microsoft claims that all the Data Nodes run CentOS and Java, you can still write a Mapper/Reducer in .NET code, thanks to Hadoop Streaming. But these articles don't make clear how .NET code can run on Linux (and I don't think Mono is involved here). Could somebody shed some light on how the .NET code ends up running on each Data Node, or whether it runs there at all?


Solution

The Data Nodes are not actually running CentOS. All the nodes in HDInsight are based on the Hortonworks Data Platform (HDP) for Windows. This means that any of your streaming programs are actually running on Windows when you're using HDInsight.

The article you refer to is talking (rather confusingly!) about an alternative pattern of setting up your own Hadoop on a series of Azure VMs as IaaS. HDInsight takes the need for that management overhead away (that's what you're paying for over the VM charges) and provides PaaS.

Of course, nothing stops you from running streaming MapReduce with C# under Mono on a Linux-based Hadoop cluster, but your mileage may vary.
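For context on why the node's operating system matters: Hadoop Streaming simply launches your executable on each node, feeds it input lines on stdin, and reads tab-separated key/value lines from stdout. Here is a minimal sketch of that contract as a word count, written in Python purely for illustration (a .NET console app would follow the same pattern). The function names and sample data are hypothetical, and the `__main__` block only simulates the map, sort, and reduce phases locally rather than running on a cluster.

```python
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one 'word<TAB>1' record per word in the input lines."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reduce_lines(sorted_records):
    """Reducer: sum the counts per key. Input must already be sorted by key,
    which is what the framework's shuffle/sort phase guarantees."""
    parsed = (record.rstrip("\n").split("\t", 1) for record in sorted_records)
    for key, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(count) for _, count in group)}"


if __name__ == "__main__":
    # Local simulation on hypothetical sample data: map -> sort -> reduce.
    sample = ["Hadoop streaming", "streaming on Windows"]
    for record in reduce_lines(sorted(map_lines(sample))):
        print(record)
```

In a real streaming job the mapper and reducer would be two separate scripts, each passing `sys.stdin` to its function and printing the results. Whatever runtime those scripts need (Python here, the .NET Framework or Mono for C#) has to be present on every node that runs them.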

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow