Pig script code error?
16-10-2019
Question
When I run the Pig script below, I get an error on line 4 if I write GROUP in upper case. If I change 'GROUP' to 'group' on line 4, the script runs.
What is the difference between group and GROUP?
LINES = LOAD '/user/cloudera/datapeople.csv' USING PigStorage(',') AS ( firstname:chararray, lastname:chararray, address:chararray, city:chararray, state:chararray, zip:chararray );
WORDS = FOREACH LINES GENERATE FLATTEN(TOKENIZE(zip)) AS ZIPS;
WORDSGROUPED = GROUP WORDS BY ZIPS;
WORDBYCOUNT = FOREACH WORDSGROUPED GENERATE GROUP AS ZIPS, COUNT(WORDS);
WORDSSORT = ORDER WORDBYCOUNT BY $1 DESC;
DUMP WORDSSORT;
Solution
In the FOREACH, 'group' in strictly lower case is the name of the field that holds the key you grouped by (here, the ZIPS value).
http://squarecog.wordpress.com/2010/05/11/group-operator-in-apache-pig/ says:
When you group a relation, the result is a new relation with two columns: “group” and the name of the original relation.
Column names are case sensitive, so you have to use lower-case 'group' in your FOREACH.
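'GROUP' in upper case is read as the grouping operator, not as a field name, so it cannot stand in for the 'group' field inside the FOREACH. Pig keywords themselves are case insensitive, but field names are not, so the two are not interchangeable.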
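Applied to the script in the question, the only change needed is in the fourth statement, where GROUP becomes group. The following is a sketch of the corrected script, assuming the same input path and schema as in the question; the comments are added here for illustration.

```pig
-- Load the CSV; path and schema are taken unchanged from the question.
LINES = LOAD '/user/cloudera/datapeople.csv' USING PigStorage(',')
        AS (firstname:chararray, lastname:chararray, address:chararray,
            city:chararray, state:chararray, zip:chararray);

-- One row per token found in the zip field.
WORDS = FOREACH LINES GENERATE FLATTEN(TOKENIZE(zip)) AS ZIPS;

-- GROUP here is the operator keyword.
WORDSGROUPED = GROUP WORDS BY ZIPS;

-- group (lower case) here is the key field of WORDSGROUPED.
WORDBYCOUNT = FOREACH WORDSGROUPED GENERATE group AS ZIPS, COUNT(WORDS);

-- Sort by the count (second field), largest first.
WORDSSORT = ORDER WORDBYCOUNT BY $1 DESC;
DUMP WORDSSORT;
```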
OTHER TIPS
Normally GROUP/COGROUP is used to group a relation by some key. After you group the relation, describe the grouped relation. For example:
describe grp;
grp: {group: chararray,A: {(name: chararray,session: chararray,gpa: float)}}
The output above contains a field named "group".
If you want to perform an operation on the grouped relation (grp), you should refer to that field as "group", not GROUP.
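The same check can be run against the relations from the question. This is a minimal sketch, assuming the input file and schema shown in the question; the exact layout of the printed schema may vary with the Pig version, so it is not reproduced here.

```pig
-- Same load and tokenize steps as in the question.
LINES = LOAD '/user/cloudera/datapeople.csv' USING PigStorage(',')
        AS (firstname:chararray, lastname:chararray, address:chararray,
            city:chararray, state:chararray, zip:chararray);
WORDS = FOREACH LINES GENERATE FLATTEN(TOKENIZE(zip)) AS ZIPS;
WORDSGROUPED = GROUP WORDS BY ZIPS;

-- Print the schema of the grouped relation.
-- The first field should be named group (the key), followed by
-- a bag named WORDS that holds the grouped tuples.
DESCRIBE WORDSGROUPED;
```

Whatever name DESCRIBE prints for the first field is the name to use in the following FOREACH.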