Question

B = GROUP A BY state;
C = FOREACH B {                          
   DA = ORDER A BY population DESC;                
   DB = LIMIT DA 5;                         
   GENERATE FLATTEN(group), FLATTEN(DB.name), FLATTEN(DB.population);
}

The problem is that each city name is repeated 5 times (once for every population value) instead of appearing once. I get something like:

(ALASKA,M,27257)
(ALASKA,M,23696)
(ALASKA,M,19949)
(ALASKA,M,19926)
(ALASKA,M,19833)
(ALASKA,H,27257)
(ALASKA,H,23696)
(ALASKA,H,19949)
(ALASKA,H,19926)
(ALASKA,H,19833)

The output I need has each city exactly once, paired with its own population:

(ALASKA,M,27257)
(ALASKA,H,23696)

Solution

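The two FLATTENs in the same GENERATE, FLATTEN(DB.name) and FLATTEN(DB.population), produce a Cartesian product of the two bags: 5 names × 5 populations = 25 rows per state. Replace them with a single FLATTEN over a projection of both fields: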

B = GROUP A BY state;
C = FOREACH B {                          
   DA = ORDER A BY population DESC;                
   DB = LIMIT DA 5;                         
   GENERATE FLATTEN(group), FLATTEN(DB.(name, population));
}
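The following plain-Python model shows why the first query yields 25 rows per state while the fixed one yields 5. It is not Pig code. It only mimics the semantics: each FLATTEN of a bag inside one GENERATE multiplies the output rows, so two bag FLATTENs give a cross product. The bag contents are assumptions. "M", "H" and the populations come from the question, and the names "X3", "X4", "X5" are hypothetical placeholders.

```python
from itertools import product

# Hypothetical contents of DB (the top-5 bag) for one group.
# "X3".."X5" are made-up placeholder names; populations are from the question.
group = "ALASKA"
db = [("M", 27257), ("H", 23696), ("X3", 19949), ("X4", 19926), ("X5", 19833)]


def two_flattens(group, db):
    """Model of GENERATE FLATTEN(group), FLATTEN(DB.name), FLATTEN(DB.population).

    DB.name and DB.population are two separate bags; flattening both in one
    GENERATE emits every combination of their elements (a cross product).
    """
    names = [name for name, _ in db]   # bag produced by DB.name
    pops = [pop for _, pop in db]      # bag produced by DB.population
    return [(group, n, p) for n, p in product(names, pops)]


def one_flatten(group, db):
    """Model of GENERATE FLATTEN(group), FLATTEN(DB.(name, population)).

    A single bag of (name, population) tuples is flattened, so each tuple
    becomes exactly one output row and the pairing is preserved.
    """
    return [(group, name, pop) for name, pop in db]


bad = two_flattens(group, db)
good = one_flatten(group, db)
print(len(bad), len(good))  # 25 5
```

The first five rows of `bad` match the repeated "M" rows in the question, which is the signature of the cross product.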

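Alternatively, because the bags created by GROUP BY hold the complete original tuples with all of their columns, you can flatten the whole limited bag. Note that this emits every column of A, not only name and population: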

B = GROUP A BY state;
C = FOREACH B {                          
   DA = ORDER A BY population DESC;                
   DB = LIMIT DA 5;                         
   GENERATE FLATTEN(DB);
}
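Here is a rough plain-Python model of this second query: group, sort each group's bag descending, keep the first n tuples, and emit them whole. It assumes a hypothetical schema `(name, state, population)` for A, and the sample rows (including the "OHIO" entries and "X3", "P", "Q") are invented for illustration. Real Pig does not guarantee the order in which groups are output, whereas this model simply keeps first-seen order.

```python
from collections import defaultdict

# Hypothetical relation A with an assumed schema (name, state, population).
A = [
    ("M", "ALASKA", 27257),
    ("H", "ALASKA", 23696),
    ("X3", "ALASKA", 19949),
    ("P", "OHIO", 50000),
    ("Q", "OHIO", 70000),
]


def top_n_per_group(rows, n=5):
    """Model of:
        B = GROUP A BY state;
        C = FOREACH B { DA = ORDER A BY population DESC;
                        DB = LIMIT DA n;
                        GENERATE FLATTEN(DB); }
    Each group's bag keeps the full original tuples, so flattening it
    re-emits all columns of A.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[1]].append(row)  # GROUP A BY state
    out = []
    for bag in groups.values():
        ordered = sorted(bag, key=lambda r: r[2], reverse=True)  # ORDER ... DESC
        out.extend(ordered[:n])  # LIMIT DA n, then FLATTEN(DB)
    return out


for row in top_n_per_group(A, n=2):
    print(row)
```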
Licensed under: CC-BY-SA with attribution