Question

I just started using HDInsight.

I want to register a UDF in Pig's Grunt shell, but either it isn't working or I don't know where to put the jar files so that Pig can load them.

So far I have put the jar file in the lib folder (C:\apps\dist\pig-0.9.3-SNAPSHOT\lib) and in the Pig root folder (C:\apps\dist\pig-0.9.3-SNAPSHOT), but neither works. When I run this:

REGISTER elephant-bird-pig-3.0.0.jar;

the response is:

2013-10-27 09:28:53,466 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 101: Local file 'elephant-bird-pig-3.0.0.jar' does not exist. Details at logfile: C:\apps\dist\hadoop-1.1.0-SNAPSHOT\logs\pig_1382864851131.log

Please let me know where and how I should register this UDF.

Thank you


Solution

You shouldn't keep the jars in the bin folder on the node for long, because you may lose the files if the node gets reimaged. It is better to put them in Windows Azure storage.

  1. Copy the extra .jar for your UDF to Windows Azure Storage Blobs (WASB).

    • You could make a "PigExtras" folder, for example. Your source locations will vary, and your destination will be your own container and account.

    • Upload via the Hadoop command line:

    hadoop fs -copyFromLocal C:\files\MyUDF.jar wasb://container@account.blob.core.windows.net/PigExtras/MyUDF.jar

    hadoop fs -copyFromLocal c:\apps\dist\pig-0.11.0.1.3.1.0-06\piggybank.jar wasb://container@account.blob.core.windows.net/PigExtras/piggybank.jar

  2. Reference the .jar location in your Pig Latin scripts. I believe a wildcard can be used to load all jars from a folder, but that may be inefficient if the folder contains many jars.

    REGISTER wasb:///PigExtras/*.jar;
    myset = load 'wasb://container@account.blob.core.windows.net/data/file.txt' using MyUDF();
    dump myset;
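
If the wildcard turns out to be a problem, a single jar can be registered by its full URI instead. The following is only a sketch: it assumes the jar was uploaded to PigExtras as in step 1, and the class name com.example.pig.MyUDF and the alias MyLoader are hypothetical placeholders for whatever class your jar actually contains.

```pig
-- Register one jar by its full WASB URI (container and account are placeholders).
REGISTER wasb://container@account.blob.core.windows.net/PigExtras/MyUDF.jar;

-- Hypothetical fully qualified class name; replace it with the real class in MyUDF.jar.
DEFINE MyLoader com.example.pig.MyUDF();

-- Load the data with the aliased function and print the result.
myset = LOAD 'wasb://container@account.blob.core.windows.net/data/file.txt' USING MyLoader();
DUMP myset;
```

Naming each jar explicitly means Pig only picks up what the script needs, at the cost of one REGISTER line per jar.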

Licensed under: CC-BY-SA with attribution