Question

I have seen that there is a Microsoft .NET SDK for Hadoop, and that Map/Reduce programs can now be written in .NET for HDInsight.
Is there also a way to write Hive UDFs in .NET for HDInsight?


Solution

You can use the same streaming approach you would use for a Python UDF to run a .NET program as a UDF.

For example, if you have a .NET program that reads rows from STDIN and writes its results to STDOUT, you can call it from Hive with a TRANSFORM clause:

SELECT TRANSFORM (<columns>)
USING '<PROGRAM.EXE>'
AS (<columns>)
FROM <table>;
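All the program has to do is honor Hive's streaming contract: read one row per line from STDIN and write result rows, one per line, to STDOUT. The sketch below shows that loop in Python for brevity; a .NET console app would do the same with a Console.ReadLine / Console.WriteLine loop. The upper-casing logic and the name transform_line are hypothetical placeholders for your own per-row code.

```python
import sys


def transform_line(line: str) -> str:
    """Hypothetical per-row logic: upper-case a single input column."""
    # Strip the line terminator (handles both \n and \r\n) before processing
    value = line.rstrip("\r\n")
    return value.upper()


def main() -> None:
    # Hive streams the selected rows to STDIN, one row per line
    for line in sys.stdin:
        sys.stdout.write(transform_line(line) + "\n")
    sys.stdout.flush()


if __name__ == "__main__":
    main()
```

Keeping the per-row logic in its own function makes it easy to test without Hive; only the stdin/stdout loop depends on the streaming setup.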

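Note that you can also pass multiple columns to the program. By default, Hive writes the selected columns to STDIN as tab-separated fields, one row per line, and splits each line the program writes to STDOUT on tabs to fill the columns listed in AS (...). The .NET piece therefore has to split its input on tabs and join its output with tabs.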
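A minimal sketch of a multi-column transform, again in Python for brevity (in .NET the same idea is string.Split('\t') on input and string.Join("\t", ...) on output). The three input columns (name, quantity, price), the two output columns (name, total), and the function name transform_row are hypothetical. It also assumes Hive's default text encoding of NULL, \N, appears in the stream; check this against your table and ROW FORMAT settings.

```python
import sys

# Assumption: Hive's default streaming serde encodes NULL as the two characters \N
HIVE_NULL = "\\N"


def transform_row(line: str) -> str:
    """Hypothetical: (name, quantity, price) in -> (name, total) out."""
    # Hive writes the selected columns as one tab-separated line per row
    name, quantity, price = line.rstrip("\r\n").split("\t")
    if HIVE_NULL in (quantity, price):
        # Propagate NULL instead of failing on int()/float()
        total = HIVE_NULL
    else:
        total = str(int(quantity) * float(price))
    # Output columns are tab-separated too; Hive maps them, in order, to AS (...)
    return "\t".join([name, total])


def main() -> None:
    for line in sys.stdin:
        print(transform_row(line))


if __name__ == "__main__":
    main()
```

With the hypothetical column names above, the matching query would be along the lines of SELECT TRANSFORM (name, quantity, price) USING 'PROGRAM.EXE' AS (name, total) FROM orders; where orders is likewise a made-up table.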

As far as performance goes, this can be quite slow, because every row has to be serialized to text and piped through a separate process. Use it sparingly and monitor how long your queries take.

Also, remember to add PROGRAM.EXE (and any files it depends on) to your Hive session before running the query:

ADD FILE wasb://...PROGRAM.EXE;
Licensed under: CC-BY-SA with attribution