Question

I have a file with the following contents:

1,Mary,5
1,Tom,5
2,Bill,5
2,Sue,4
2,Theo,5
3,Mary,5
3,Cindy,5
4,Andrew,4
4,Katie,4
4,Scott,5
5,Jeff,3
5,Sara,4
5,Ryan,5
6,Bob,5
6,Autumn,4
7,Betty,5
7,Janet,5
7,Scott,5
8,Andrew,4
8,Katie,4
8,Scott,5
9,Mary,5
9,Tom,5
10,Bill,5
10,Sue,4
10,Theo,5
11,Mary,5
11,Cindy,5
12,Andrew,4
12,Katie,4
12,Scott,5
13,Jeff,3
13,Sara,4
13,Ryan,5
14,Bob,5
14,Autumn,4
15,Betty,5
15,Janet,5
15,Scott,5
16,Andrew,4
16,Katie,4
16,Scott,5 

I want to find the name that appears most often, i.e. the maximum, which here would be (Scott,6).


Solution

There's some ambiguity in your question. What exactly do you want?

Do you want a list of all users with their counts, in descending order?

OR

Do you want just (Scott,6), i.e. only the one user with the maximum count?

I have solved both cases on the sample data you gave.

If you want the first (the full list, sorted by count in descending order):

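-- Load the comma-separated file as (id, name, number)
a = load '/file.txt' using PigStorage(',') as (id:int, name:chararray, number:int);
-- One group per distinct name; COUNT(a) is the number of rows in that group
g = group a by name;
g1 = foreach g generate group as name, COUNT(a) as cnt;
-- Collect all (name, cnt) tuples into one bag so they can be sorted together
toptemp = group g1 all;
final = foreach toptemp {
        sorted = order g1 by cnt desc;
        generate flatten(sorted);
};
dump final;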

This gives the list of users in descending order of count (names with equal counts can come out in any order):

(Scott,6)
(Katie,4)
(Andrew,4)
(Mary,4)
(Bob,2)
(Sue,2)
(Tom,2)
(Bill,2)
(Jeff,2)
(Ryan,2)
(Sara,2)
(Theo,2)
(Betty,2)
(Cindy,2)
(Janet,2)
(Autumn,2)
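For comparison, here is a sketch of the same two queries without the nested FOREACH. A top-level ORDER BY already sorts the whole relation, and a LIMIT placed directly after it keeps the first tuples in that order. The aliases (`grouped`, `counts`, `sorted_counts`, `most_common`) are illustrative names only, the input path is the one used above, and the script has not been checked against a particular Pig version.

```pig
-- Load the question's file as (id, name, number)
a = load '/file.txt' using PigStorage(',') as (id:int, name:chararray, number:int);
-- One (name, cnt) tuple per distinct name
grouped = group a by name;
counts = foreach grouped generate group as name, COUNT(a) as cnt;
-- A top-level ORDER BY sorts across the whole relation, so no GROUP ALL is needed
sorted_counts = order counts by cnt desc;
-- LIMIT directly after ORDER BY keeps the first tuple in sorted order
most_common = limit sorted_counts 1;
dump sorted_counts;
dump most_common;
```

On the sample data, `most_common` should dump (Scott,6). The practical difference is that GROUP ... ALL sends every tuple to a single reducer, whereas a top-level ORDER BY can spread the sort over several reducers, which starts to matter once the number of distinct names is large.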

If you want the second (only the user with the maximum count):

-- Load the file and count rows per name (same steps as the first script)
a = load '/file.txt' using PigStorage(',') as (id:int, name:chararray, number:int);
g = group a by name;
g1 = foreach g generate group as name, COUNT(a) as cnt;
toptemp = group g1 all;
final = foreach toptemp {
        sorted = order g1 by cnt desc;
        -- Keep only the first (highest-count) tuple
        top = limit sorted 1;
        generate flatten(top);
};
dump final;

This gives only one result:

(Scott,6)
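`LIMIT 1` keeps exactly one tuple, so if two names shared the highest count, one of them would be dropped without warning. A sketch that returns every name with the maximum count instead, assuming the same input file: compute the maximum once, then join it back to the per-name counts. The alias names are illustrative, not from the original answer.

```pig
-- Load the question's file as (id, name, number)
a = load '/file.txt' using PigStorage(',') as (id:int, name:chararray, number:int);
-- One (name, cnt) tuple per distinct name
grouped = group a by name;
counts = foreach grouped generate group as name, COUNT(a) as cnt;
-- A single-tuple relation holding the largest count
all_counts = group counts all;
max_cnt = foreach all_counts generate MAX(counts.cnt) as mx;
-- Keep every name whose count equals the maximum (ties are all kept)
joined = join counts by cnt, max_cnt by mx;
winners = foreach joined generate counts::name as name, counts::cnt as cnt;
dump winners;
```

With the sample data only Scott reaches 6, so `winners` should dump (Scott,6); if, say, Mary also appeared six times, both names would be returned.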

Thanks, I hope it helps.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow