Question

I will be running a Pig transformation daily (new data every day), and I need to generate a unique key for the data pulled each day. What would be the best approach? If I use RANK, will tomorrow's ranks overwrite today's?

Solution

RANK starts at 1 each time you kick it off, so tomorrow's run will produce the same rank values as today's. If you want keys that are unique per day, I would recommend applying the DataFu SHA hash to the concatenation of rank and date. You'll end up with a hash that can be used as a surrogate key.

REGISTER datafu-1.2.0.jar;
DEFINE SHA datafu.pig.hash.SHA();

S1 = LOAD 'surrogate_hash' USING PigStorage('|') AS (c1:chararray,date:chararray,c3:chararray);
S2 = RANK S1;
S3 = FOREACH S2 GENERATE SHA((chararray)CONCAT((chararray)rank_S1,date)),c1,date,c3;

dump S3;
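
One caveat with hashing a plain concatenation of rank and date: if the date strings are not fixed-width, two different (rank, date) pairs can concatenate to the same input and therefore get the same hash. A minimal sketch of a variant that puts a separator between the two fields follows. The '_' separator and the surrogate_key alias are my own choices for illustration, not part of the original answer, and with fixed-width dates such as yyyy-MM-dd the original script should already be fine.

```pig
-- Same setup as in the answer; assumes datafu-1.2.0.jar is available.
REGISTER datafu-1.2.0.jar;
DEFINE SHA datafu.pig.hash.SHA();

S1 = LOAD 'surrogate_hash' USING PigStorage('|') AS (c1:chararray, date:chararray, c3:chararray);
S2 = RANK S1;

-- Without a separator, rank 1 + date '11/2/2015' and rank 11 + date '1/2/2015'
-- both concatenate to '111/2/2015'. The '_' separator (an illustrative choice)
-- keeps the two inputs distinct before hashing.
S3 = FOREACH S2 GENERATE
         SHA(CONCAT(CONCAT((chararray)rank_S1, '_'), date)) AS surrogate_key,
         c1, date, c3;

DUMP S3;
```

The nested two-argument CONCAT calls are used so the script does not depend on a Pig version that accepts more than two arguments to CONCAT.
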

Licensed under: CC-BY-SA with attribution