I will be running a Pig transformation daily (new data every day), and I need to generate a unique key for the data pulled each day. What would be the best approach? If I use RANK, will tomorrow's rank overwrite today's rank?


Solution

RANK starts at 1 again every time you run the script, so tomorrow's ranks will repeat today's values. If you want keys that stay unique from day to day, I would recommend concatenating the rank with the date and hashing the result with the DataFu SHA UDF. You'll end up with a hash that is unique per row and per day and can be used as a surrogate key.

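REGISTER datafu-1.2.0.jar;
DEFINE SHA datafu.pig.hash.SHA();

-- Load the pipe-delimited input; the date column feeds into the key
S1 = LOAD 'surrogate_hash' USING PigStorage('|') AS (c1:chararray,date:chararray,c3:chararray);
-- RANK prepends a rank_S1 column that restarts at 1 on every run
S2 = RANK S1;
-- Hash rank + date so the key differs from day to day
S3 = FOREACH S2 GENERATE SHA((chararray)CONCAT((chararray)rank_S1,date)),c1,date,c3;

DUMP S3;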
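
If the daily extract has no date column of its own, the same idea should still work by passing the run date in through Pig's parameter substitution. The following is only a sketch under assumptions: the parameter name RUN_DATE, the script name daily_keys.pig, the paths 'daily_input' and 'daily_output', and the two-column schema are all hypothetical and not from the original answer.

```pig
-- Hypothetical invocation: pig -param RUN_DATE=2016-01-15 daily_keys.pig
REGISTER datafu-1.2.0.jar;
DEFINE SHA datafu.pig.hash.SHA();

-- Hypothetical input that carries no date column
D1 = LOAD 'daily_input' USING PigStorage('|') AS (c1:chararray, c3:chararray);

-- RANK adds rank_D1, which restarts at 1 on every run
D2 = RANK D1;

-- Hash rank + separator + run date; the separator keeps the two parts from running together
D3 = FOREACH D2 GENERATE
         SHA(CONCAT((chararray)rank_D1, CONCAT('|', '$RUN_DATE'))) AS surrogate_key,
         c1, c3;

-- One output directory per run date
STORE D3 INTO 'daily_output/$RUN_DATE' USING PigStorage('|');
```

The keys are only unique if the script runs at most once per RUN_DATE value; a second run on the same date would produce the same hashes again.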
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow