Question

After loading and grouping records, how can I store the grouped records in several files, one per group (i.e. one per userid)?

records = LOAD 'input' AS (userid:int, ...);
grouped_records = GROUP records BY userid;

I'm using Apache Pig version 0.8.1-cdh3u3 (rexported)


Solution

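 REGISTER piggybank.jar;  -- adjust to the location of piggybank.jar on your system
 A = LOAD 'mydata' USING PigStorage() AS (a, b, c);
 STORE A INTO '/my/home/output' USING org.apache.pig.piggybank.storage.MultiStorage('/my/home/output', '0', 'bz2', '\\t');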

Parameters:

  1. parentPathStr - parent output directory path
  2. splitFieldIndex - index of the key field whose values split the output, passed as a string
  3. compression - 'bz2', 'bz', 'gz' or 'none'
  4. fieldDel - field delimiter for the output records
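Applied to the question's records relation, the call might look like the minimal sketch below. The jar location and every field after userid (name, amount) are hypothetical placeholders, since the question elides the real schema. Because MultiStorage splits on the value of the chosen field itself, the flat relation is stored directly and the GROUP step is not needed; each distinct userid should end up in its own subdirectory under 'output'.

```pig
-- Sketch only: the piggybank.jar path and the fields name/amount are hypothetical.
REGISTER piggybank.jar;
records = LOAD 'input' AS (userid:int, name:chararray, amount:double);
-- Split field index '0' (passed as a string) is userid; no GROUP is required.
STORE records INTO 'output'
    USING org.apache.pig.piggybank.storage.MultiStorage('output', '0', 'none', ',');
```

As in the other examples, the first MultiStorage argument repeats the STORE ... INTO path.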

Reference: MultiStorage source code (Piggybank), as listed on GrepCode

Other tips

Indeed, there is a MultiStorage class in Piggybank that does exactly what I want: it splits the records by a specified field (the one at index '0', i.e. userid, in my example), so the records can be stored directly without grouping them first:

STORE records INTO 'output' USING org.apache.pig.piggybank.storage.MultiStorage('output', '0', 'none', ',');
Licensed under: CC-BY-SA with attribution