How to store grouped records into multiple files with Pig?
26-10-2019
Question
After loading and grouping records, how can I store the grouped records in several files, one per group (that is, one per userid)?
records = LOAD 'input' AS (userid:int, ...);
grouped_records = GROUP records BY userid;
I'm using Apache Pig version 0.8.1-cdh3u3 (rexported)
Solution
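Use the MultiStorage store function from Piggybank:
-- MultiStorage ships in piggybank.jar; adjust the path to match your installation.
REGISTER piggybank.jar;
A = LOAD 'mydata' USING PigStorage() as (a, b, c);
STORE A INTO '/my/home/output' USING org.apache.pig.piggybank.storage.MultiStorage('/my/home/output', '0', 'bz2', '\\t');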
Parameters:
- parentPathStr - Parent output dir path
- splitFieldIndex - key field index
- compression - 'bz2', 'bz', 'gz' or 'none'
- fieldDel - Output record field delimiter.
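To tie these parameters back to the question, here is a minimal sketch. It assumes that userid is the first column (index 0), that the input is comma-delimited, and that piggybank.jar sits in the working directory. The second column (item) is made up for illustration, since the question elides the rest of the schema.

```pig
-- Sketch only: the jar path, the input delimiter and the "item" column
-- are assumptions, not taken from the question.
REGISTER piggybank.jar;

-- userid is the first column, so its field index is 0.
records = LOAD 'input' USING PigStorage(',') AS (userid:int, item:chararray);

-- No GROUP is needed here: MultiStorage splits on field 0 while writing.
STORE records INTO 'output'
    USING org.apache.pig.piggybank.storage.MultiStorage('output', '0', 'none', ',');

-- Layout described in the Piggybank docs for uncompressed output,
-- shown here for userids 1 and 2:
--   output/1/1-0000
--   output/2/2-0000
```

Note that, as far as I understand the Piggybank documentation, each key gets its own directory rather than a single file, so a key may end up with more than one part file (-0000, -0001, ...) when several tasks write rows for it.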
Reference: GrepCode
OTHER TIPS
Indeed, Piggybank has a MultiStorage class that does exactly what I want: it splits the records by a specified attribute (the field at index '0' in my example):
STORE records INTO 'output' USING org.apache.pig.piggybank.storage.MultiStorage('output', '0', 'none', ',');
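MultiStorage splits on a field of each tuple, so the GROUP step from the question is not required for the split itself. If the grouped relation already exists, one option is to flatten it back into plain rows before storing. This is a sketch under assumptions: the "item" column and the comma delimiter are hypothetical, and piggybank.jar is assumed to be in the working directory.

```pig
-- Sketch only: jar path, delimiter and the "item" column are assumptions.
REGISTER piggybank.jar;

records = LOAD 'input' USING PigStorage(',') AS (userid:int, item:chararray);

-- After GROUP, each row is (group, records: {bag of the original tuples}).
grouped_records = GROUP records BY userid;

-- FLATTEN un-nests the bag, giving one row per original record again,
-- with userid back at field index 0.
flat_records = FOREACH grouped_records GENERATE FLATTEN(records);

STORE flat_records INTO 'output'
    USING org.apache.pig.piggybank.storage.MultiStorage('output', '0', 'none', ',');
```

If nothing else in the script uses grouped_records, storing records directly as in the line above avoids the extra GROUP and FLATTEN.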
Licensed under: CC-BY-SA with attribution