Question

I have a file like this:

1,Mary,5
1,Tom,5
2,Bill,5
2,Sue,4
2,Theo,5
3,Mary,5
3,Cindy,5
4,Andrew,4
4,Katie,4
4,Scott,5
5,Jeff,3
5,Sara,4
5,Ryan,5
6,Bob,5
6,Autumn,4
7,Betty,5
7,Janet,5
7,Scott,5
8,Andrew,4
8,Katie,4
8,Scott,5
9,Mary,5
9,Tom,5
10,Bill,5
10,Sue,4
10,Theo,5
11,Mary,5
11,Cindy,5
12,Andrew,4
12,Katie,4
12,Scott,5
13,Jeff,3
13,Sara,4
13,Ryan,5
14,Bob,5
14,Autumn,4
15,Betty,5
15,Janet,5
15,Scott,5
16,Andrew,4
16,Katie,4
16,Scott,5 

I want the name that appears most often, together with its count, i.e. the maximum: (Scott,6)

Solution

Your question is a little ambiguous. What exactly do you want?

Do you want the list of names with their counts in descending order?

OR

Do you want just (Scott,6), i.e. only the one name with the maximum count?

I have solved both cases on the sample data you gave.

If you want the first (the full list in descending order), then:

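-- load the comma-separated file; only the name field matters for the count
a = load '/file.txt' using PigStorage(',') as (id:int,name:chararray,number:int);
-- one group per distinct name
g = group a by name;
-- count the rows in each group
g1 = foreach g{
      generate group as g , COUNT(a) as cnt;
};
-- collect all (name, count) pairs into a single bag so they can be sorted together
toptemp  = group g1 all;
final = foreach toptemp{
        sorted = order g1 by cnt desc;
        GENERATE flatten(sorted);
};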

This gives you the list of names in descending order of count:

(Scott,6)
(Katie,4)
(Andrew,4)
(Mary,4)
(Bob,2)
(Sue,2)
(Tom,2)
(Bill,2)
(Jeff,2)
(Ryan,2)
(Sara,2)
(Theo,2)
(Betty,2)
(Cindy,2)
(Janet,2)
(Autumn,2)
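
Names with the same count (for example Katie, Andrew and Mary, all with 4) are not guaranteed to come out in any particular order. A minimal sketch, assuming you want ties broken alphabetically by name; the aliases `counts` and `ranked` and the field name `name` are my own choices, not part of the script above:

```pig
-- same input and schema as in the answer
a = load '/file.txt' using PigStorage(',') as (id:int, name:chararray, number:int);
g = group a by name;
counts = foreach g generate group as name, COUNT(a) as cnt;
-- sort by count (highest first), then by name so that ties have a fixed order
ranked = order counts by cnt desc, name asc;
dump ranked;
```

This sorts the relation directly with a top-level `order` instead of sorting inside a `group ... all`, so the `toptemp` step is not needed here.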

If you want the second (only the name with the maximum count), then:

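-- load the comma-separated file; only the name field matters for the count
a = load '/file.txt' using PigStorage(',') as (id:int,name:chararray,number:int);
-- one group per distinct name
g = group a by name;
-- count the rows in each group
g1 = foreach g{
      generate group as g , COUNT(a) as cnt;
};
-- collect all (name, count) pairs into a single bag so they can be sorted together
toptemp  = group g1 all;
final = foreach toptemp{
        sorted = order g1 by cnt desc;
        -- keep only the first row after sorting, i.e. the highest count
        top = limit sorted 1;
        GENERATE flatten(top);
};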

This gives only one result:

(Scott,6)
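
As a variation, the same single result can probably be obtained without the `group ... all` step, by sorting the counts at the top level and taking the first row. This is only a sketch on the sample data; the aliases `counts`, `ordered` and `top1` are names I made up for illustration:

```pig
-- same input and schema as in the answer
a = load '/file.txt' using PigStorage(',') as (id:int, name:chararray, number:int);
g = group a by name;
counts = foreach g generate group as name, COUNT(a) as cnt;
-- highest count first
ordered = order counts by cnt desc;
-- keep just the first row of the sorted relation
top1 = limit ordered 1;
dump top1;
```

Keep in mind that `limit ... 1` returns exactly one row, so if two names were tied for the maximum you would see only one of them.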

Thanks. I hope it helps.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow