Question

Is there a way to pass parameters to a Pig script, e.g. the name of the file to be processed?

Solution

This came up in another question: you can supply the input parameter on the command line and reference it when loading. For example:

Command Line:

pig -f script.pig -param input=somefile.txt

script.pig:

raw = LOAD '$input' AS (...);

Note that if you are using Amazon Web Services Elastic MapReduce, '$input' is what is passed to the script for any input you provide.
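If the file name changes from run to run, the command line above can be wrapped in a small shell function. This is only a sketch: run_pig and the PIG_CMD override are hypothetical names introduced here for illustration, and script.pig is assumed to be in the current directory.

```shell
# Hypothetical wrapper: forwards its first argument to script.pig as $input.
run_pig() {
    input_file="$1"
    if [ -z "$input_file" ]; then
        echo "usage: run_pig <input-file>" >&2
        return 1
    fi
    # PIG_CMD is an assumption of this sketch; it defaults to the real pig
    # binary and can be pointed at another command to preview the arguments.
    "${PIG_CMD:-pig}" -f script.pig -param "input=$input_file"
}
```

Calling run_pig somefile.txt then runs the same command as shown above, with the file name taken from the argument.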

Other tips

Two options are available:
1. If there are only a few parameters, use -param (-p).
2. If there are many parameters, use -param_file (-m).

Either approach works; which one fits depends on the nature of your command-line arguments. I use -param while developing and testing my scripts. Once a Pig script is ready for batch processing or for running through crontab, I use -param_file, so that any required change is a simple edit to the params.init file.

pig -help (or man pig, where a man page is installed) will show you all available options:

-m, -param_file path to the parameter file
-p, -param key value pair of the form param=val

Here is a complete example.

students.txt (input data)

001,Rajiv,Reddy,21,9848022337,Hyderabad
002,siddarth,Battacharya,22,9848022338,Kolkata
003,Rajesh,Khanna,22,9848022339,Delhi
004,Preethi,Agarwal,21,9848022330,Pune
005,Trupthi,Mohanthy,23,9848022336,Bhuwaneshwar
006,Archana,Mishra,23,9848022335,Chennai
007,Komal,Nayak,24,9848022334,trivendram
008,Bharathi,Nambiayar,24,9848022333,Chennai

params.init (file to hold all parameters)

fileName='hdfs://horton/user/jgosalia/students.txt'
cityName='Chennai'
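For batch jobs it can be convenient to generate the parameter file from a script. The sketch below writes the same params.init as above from a shell here-document. It relies on the parameter file format described in Pig's documentation (one name=value pair per line, with lines starting with # treated as comments); the comment line is an addition for illustration.

```shell
# Write the parameter file into the current directory.
# The quoted 'EOF' keeps the shell from expanding anything inside the body.
cat > params.init <<'EOF'
# Parameters for filter.pig: one name=value pair per line
fileName='hdfs://horton/user/jgosalia/students.txt'
cityName='Chennai'
EOF
```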

filter.pig

students = LOAD '$fileName' USING PigStorage(',') AS (id:int, firstname:chararray, lastname:chararray, age:int, phone:chararray, city:chararray);
students = FILTER students BY city == '$cityName';
DUMP students;

Option 1: Parameters on the command line (-param or -p), with output

pig -param fileName='hdfs://horton/user/jgosalia/students.txt' -param cityName='Chennai' filter.pig

... Trimming the logs ...

(6,Archana,Mishra,23,9848022335,Chennai)
(8,Bharathi,Nambiayar,24,9848022333,Chennai)

Option 2: Parameter file on the command line (-param_file or -m), with output

pig -param_file params.init filter.pig

... Trimming the logs ...

(6,Archana,Mishra,23,9848022335,Chennai)
(8,Bharathi,Nambiayar,24,9848022333,Chennai)

NOTE: Use absolute paths, both for file paths passed as parameter values and for the parameter file path given to -param_file (-m).
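To check that the parameters resolve as intended before submitting a job, Pig's -dryrun (-r) option may help: it performs the substitution without executing the script. The following is a sketch, assuming pig is on the PATH. preview_substitution is a hypothetical helper, and the .substituted output file name follows Pig's documentation, so it is worth confirming on your version.

```shell
# Hypothetical helper: show a script with its parameters substituted.
# Usage: preview_substitution <script.pig> <param-file>
preview_substitution() {
    script="$1"
    param_file="$2"
    if ! command -v pig >/dev/null 2>&1; then
        echo "pig not found on PATH" >&2
        return 127
    fi
    # -dryrun substitutes parameters only; the script itself is not executed.
    pig -dryrun -param_file "$param_file" "$script" || return $?
    # Assumption: the substituted script is written as <script>.substituted.
    cat "$script.substituted"
}
```

For the example above, this would be called as preview_substitution filter.pig params.init.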

It is simple to pass parameters to a Pig script.

First, mark your variables in the Pig script with '$', for example $input_file. Then pass the parameters to your script using pig -param input_file='/path/to/data'

For more information, see: http://wiki.apache.org/pig/ParameterSubstitution

Yes.

You can pass parameters as command-line options using Pig's -param option.

-- customparam.pig
-- load data from HDFS or the local file system
original = load '$input' using PigStorage('$delimiter');
-- project the field chosen by the split parameter into another relation
filtered = foreach original generate $split;
-- store the result in HDFS or the local file system
store filtered into '$output';

pig -x local -f customparam.pig -param input=Pig.csv -param output=OUT/pig -param delimiter="," -param split='$1'
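The single quotes around '$1' in the command above matter: they stop the shell from expanding $1 itself, so that Pig receives the literal text $1 as the field reference. A small sketch of the difference, using a made-up positional argument:

```shell
# Give the shell a first positional argument (hypothetical value).
set -- first-arg

single='split=$1'   # single quotes: the shell leaves $1 untouched
double="split=$1"   # double quotes: the shell expands $1 before pig sees it

echo "$single"   # prints: split=$1
echo "$double"   # prints: split=first-arg
```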

For more info: check this

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow