Question

I have C# jobs running on a Hadoop cluster hosted by the Microsoft Azure HDInsight service. To use my custom Java input format, I had to run the hadoop command line directly on my HDInsight server:

call bin\hadoop jar lib\hadoop-streaming.jar -D "mapred.max.split.size=33554432" -libjars "../mycustom-hadoop-streaming.jar" -inputformat "mycustom.hadoop.CombinedInputFormat" ... (I cut the rest of the command)

Now I am trying to do the same with job submission through the PowerShell command line (remote job submission from another Azure machine):

$jobDefinition = New-AzureHDInsightStreamingMapReduceJobDefinition -Defines @{ "mapred.max.split.size"="33554432"; "mapred.input.format.class"="mycustom.hadoop.CombinedInputFormat" } ... (I cut the rest of the command)

But how do I specify -libjars from the PowerShell command line? It seems Microsoft did not provide that capability: http://msdn.microsoft.com/en-us/library/windowsazure/dn527638.aspx

Has anybody tried this, or does anyone have a workaround for specifying libjars with HDInsight streaming job submission?


Solution

As you may know, the HDInsight PowerShell cmdlets and the .NET SDK use the WebHCat/Templeton REST API. I believe New-AzureHDInsightStreamingMapReduceJobDefinition has no -libjars parameter because the Templeton REST API for streaming jobs does not support one, as shown in the Apache Templeton documentation here: http://people.apache.org/~thejas/templeton_doc_latest/mapreducestreaming.html

On the other hand, the Templeton REST API for MapReduce JAR jobs does support libjars: http://people.apache.org/~thejas/templeton_doc_latest/mapreducejar.html

Accordingly, the corresponding HDInsight cmdlet, New-AzureHDInsightMapReduceJobDefinition, has a -Libjars parameter.
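Because the cmdlets wrap WebHCat, a JAR job submitted with libjars ultimately becomes a form-encoded POST to /templeton/v1/mapreduce/jar. The Java sketch that follows only builds that request body, to show where libjars fits. The parameter names (user.name, jar, class, libjars, define, arg) are taken from the Templeton mapreducejar documentation linked above; the user name, jar paths, class name and arguments are hypothetical placeholders. The sketch does not send the request or handle authentication.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class WebHCatJarRequest {

    // Form-encodes one key/value pair (application/x-www-form-urlencoded).
    public static String pair(String key, String value) {
        return URLEncoder.encode(key, StandardCharsets.UTF_8) + "="
             + URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    // Builds a request body for POST /templeton/v1/mapreduce/jar.
    // All concrete values are supplied by the caller; the ones in main() are hypothetical.
    public static String buildBody(String user, String jar, String mainClass, String libjars,
                                   List<String> defines, List<String> args) {
        List<String> parts = new ArrayList<>();
        parts.add(pair("user.name", user));
        parts.add(pair("jar", jar));
        parts.add(pair("class", mainClass));
        // Comma-separated list of extra jars; the streaming endpoint has no such parameter.
        parts.add(pair("libjars", libjars));
        // "define" is repeated once per NAME=VALUE Hadoop setting.
        for (String d : defines) {
            parts.add(pair("define", d));
        }
        // "arg" is repeated once per program argument passed to the main class.
        for (String a : args) {
            parts.add(pair("arg", a));
        }
        return String.join("&", parts);
    }

    public static void main(String[] argv) {
        String body = buildBody(
            "admin",                                       // hypothetical user
            "/example/jars/my-job.jar",                    // hypothetical job jar
            "com.example.MyJob",                           // hypothetical main class
            "/example/jars/mycustom-hadoop-streaming.jar", // hypothetical libjars path
            List.of("mapred.max.split.size=33554432"),
            List.of("/example/input", "/example/output"));
        System.out.println(body);
    }
}
```

The resulting string would be POSTed as application/x-www-form-urlencoded to the cluster's WebHCat endpoint. Note that define and arg are repeated once per value, whereas libjars takes a single comma-separated list.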

I hope this helps explain it!

Azim(MSFT)

Licensed under: CC-BY-SA with attribution