Question

I want to load a text file into Pig and then store it as an RCFile. For this, I found that Twitter provides a storage UDF at this link:

http://grepcode.com/file/repo1.maven.org/maven2/com.twitter.elephantbird/elephant-bird-rcfile/3.0.8/com/twitter/elephantbird/pig/store/RCFilePigStorage.java

Can someone tell me how to compile it and use it in my Pig script?

Solution

Put all the Twitter (Elephant Bird) dependencies and the Pig jars on the classpath, then compile RCFilePigStorage.java. If you want to change some specific behavior, make the changes in the code first; in that case you can also rename the class to MyRCFilePigStorage (in a file named MyRCFilePigStorage.java).

Now take the class files generated by the compiler and package them into a jar file named MyRCUdf.jar. Register this jar in your Pig script.

REGISTER MyRCUdf.jar;
-- your Pig logic, producing a relation named data
STORE data INTO 'output_path' USING MyRCFilePigStorage();
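
For a fuller picture, the outline above could look roughly like the following end-to-end script. This is only a sketch under assumptions: the input and output paths, the tab delimiter, and the two-column schema are hypothetical placeholders, and MyRCUdf.jar is assumed to contain the class together with everything it depends on (otherwise each dependency jar needs its own REGISTER line).

```pig
-- Hypothetical jar location; adjust to wherever MyRCUdf.jar was built.
REGISTER /path/to/MyRCUdf.jar;

-- Load the plain text file. The delimiter and schema here are assumptions
-- for illustration only; use whatever matches your data.
data = LOAD '/path/to/input.txt' USING PigStorage('\t')
       AS (id:int, name:chararray);

-- Write the relation out as an RCFile through the storage UDF.
-- If the class was left unchanged in its original package, refer to it by
-- its fully qualified name instead:
--   com.twitter.elephantbird.pig.store.RCFilePigStorage()
STORE data INTO '/path/to/output_rc' USING MyRCFilePigStorage();
```

Note that the short name MyRCFilePigStorage only resolves if the renamed class has no package declaration. If it keeps a package, use the fully qualified name in the USING clause.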

EDIT: Consider the following links for the Twitter dependencies. Take the source code, compile it, and include the generated classes in your classpath.

https://github.com/kevinweil/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/MapReduceInputFormatWrapper.java

https://github.com/kevinweil/elephant-bird

Licensed under: CC-BY-SA with attribution