Question

I will be running a Pig transformation daily, with new data every day, and I need to generate a unique key for the data pulled each day. What would be the best approach? If I use RANK, will tomorrow's ranks overwrite today's?


Answer

Your ranking will start at 1 each time you run the script, so RANK on its own produces the same key values every day. If you want keys that are unique across days, I would recommend applying DataFu's SHA hash UDF to the concatenation of the rank and the date. You'll end up with a unique hash that can be used as a surrogate key.
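To see why the rank alone is not enough, here is a small plain-Python simulation (not Pig) of two daily runs. `rank_rows` is a hypothetical helper that mimics RANK without a BY clause, and the sample rows and dates are made up for illustration.

```python
# Plain-Python simulation (not Pig): RANK numbers rows from 1 on every run,
# so the rank alone repeats across days, while rank + date does not.

def rank_rows(rows):
    """Hypothetical helper mimicking Pig's RANK without BY: prepend 1, 2, 3, ..."""
    return [(i, *row) for i, row in enumerate(rows, start=1)]

# Made-up daily inputs as (c1, date, c3) tuples, matching the answer's schema.
day1 = [("a", "20140101", "x"), ("b", "20140101", "y")]
day2 = [("c", "20140102", "x"), ("d", "20140102", "y")]

# Each day is ranked in a separate run, so numbering restarts on day 2.
ranked = rank_rows(day1) + rank_rows(day2)

rank_only = [r[0] for r in ranked]                  # the rank by itself
rank_and_date = [f"{r[0]}{r[2]}" for r in ranked]   # rank followed by the date

print(rank_only)       # [1, 2, 1, 2]
print(rank_and_date)   # ['120140101', '220140101', '120140102', '220140102']
```

The repeated values in `rank_only` are exactly what the question worries about; appending the date separates the two days. The Pig script that applies this idea, with a hash on top, follows.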

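REGISTER datafu-1.2.0.jar
-- DataFu's SHA UDF computes a SHA digest of a chararray
DEFINE SHA datafu.pig.hash.SHA();

S1 = LOAD 'surrogate_hash' USING PigStorage('|') AS (c1:chararray, date:chararray, c3:chararray);
-- RANK without BY prepends a sequential field rank_S1 (1, 2, 3, ...); it restarts at 1 on every run
S2 = RANK S1;
-- The rank alone repeats each day, so hash rank + date to get a key that is unique across days
S3 = FOREACH S2 GENERATE SHA(CONCAT((chararray)rank_S1, date)) AS surrogate_key, c1, date, c3;

DUMP S3;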
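If you want to check the logic outside a Hadoop cluster, the following is a rough Python analogue of the script above, assuming pipe-delimited input lines as in the LOAD statement. It uses `hashlib.sha256`; DataFu's SHA UDF is assumed here to default to SHA-256, but this sketch is not claimed to reproduce DataFu's output byte for byte. `surrogate_keys` and the sample lines are hypothetical.

```python
import hashlib

def surrogate_keys(lines):
    """Hypothetical Python analogue of the Pig script:
    LOAD with PigStorage('|') -> RANK -> SHA(CONCAT(rank, date))."""
    out = []
    # enumerate from 1 plays the role of RANK without BY (rank_S1).
    for rank, line in enumerate(lines, start=1):
        c1, date, c3 = line.rstrip("\n").split("|")   # AS (c1, date, c3)
        # SHA-256 over the rank's string form followed by the date, like CONCAT.
        key = hashlib.sha256(f"{rank}{date}".encode("utf-8")).hexdigest()
        out.append((key, c1, date, c3))                # GENERATE key, c1, date, c3
    return out

# Made-up sample data: one list per daily run.
today = ["a|2014-01-01|x", "b|2014-01-01|y"]
tomorrow = ["c|2014-01-02|x", "d|2014-01-02|y"]

rows = surrogate_keys(today) + surrogate_keys(tomorrow)
keys = [r[0] for r in rows]

for row in rows:
    print(row)
print("all keys unique:", len(set(keys)) == len(keys))
```

Concatenating without a separator works here only because the date has a fixed width. With a variable-width date, rank 1 followed by "12/1/2014" and rank 11 followed by "2/1/2014" would both produce the string "112/1/2014", so keep the date in a fixed format (or add a separator) before hashing.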
License: CC BY-SA with attribution
Not affiliated with StackOverflow