Question

This is my first post on StackOverflow, so pardon me in advance if this is too lengthy and/or vague.

I have a Pig relation as follows:

my_relation = LOAD '/path/to/data' USING PigStorage(',') 
              AS (f1:chararray, f2:chararray, f3:chararray);

Now, I wanted to filter out the field 'f3' from the above relation 'my_relation'. I know I could do it like so:

my_new_relation = FOREACH my_relation GENERATE my_relation.f1, my_relation.f2;

The problem with this method comes when I have a large number of fields/tuples in the 'my_relation' relation. Say, my_relation had 900 fields/tuples, and I wanted all of them but for one. Now, with the above method, I'd have to list out 899 fields/tuples after my 'GENERATE' keyword!

My question: Is there an easy way to filter out a handful of fields/tuples from a relation in PigLatin?

Background: My knowledge of Apache Pig and Pig Latin is very limited (as you can probably tell from this question). I'm still reading through the Pig documentation.

Thanks for reading this question! Any/all help is appreciated!

Solution

First of all, your syntax is not quite right. Inside a FOREACH, fields are referenced directly by name, not through the relation name. To keep just the fields f1 and f2, you would write:

my_new_relation = FOREACH my_relation GENERATE f1, f2;

As to your question, you can use a project-range expression. Assuming the fields are named f1 through f900 and f346 is the one to drop:

my_new_relation = FOREACH my_relation GENERATE f1 .. f345, f347 .. f900;
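A few variations on the same idea, as a sketch rather than a tested script. The first two statements use the three-field my_relation from the question. The last one assumes a hypothetical 900-field relation called wide_relation with fields named f1 through f900; that name and schema are for illustration only. Ranges can also be written by position ($0 is the first field) and can be left open at either end:

```pig
-- Drop f3 from the three-field relation in the question, by name.
my_new_relation = FOREACH my_relation GENERATE f1 .. f2;

-- The same projection by position: fields are zero-indexed, so f3 is $2.
my_new_relation = FOREACH my_relation GENERATE $0 .. $1;

-- Open-ended ranges on a hypothetical 900-field relation (wide_relation
-- is an assumed name): everything up to f345, then everything from f347 on.
trimmed = FOREACH wide_relation GENERATE .. f345, f347 ..;
```

The open-ended form avoids naming the first and last fields, which helps when the schema is wide or may grow.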

Other tips

You can also write your own loader that lets you specify which columns to load.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow