Question

On two separate occasions, I've had to rename all the fields in a Pipe before joining it with another pipe (using Merge or CoGroup). What I have done recently is:

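import cascading.pipe.Merge;
import cascading.pipe.Pipe;
import cascading.pipe.assembly.Rename;
import cascading.pipe.assembly.Retain;
import cascading.tuple.Fields;

//These two pipes contain similar values but different field names.
//papa and pepe are existing Pipe instances: papa keeps the target names (fieldsTo),
//pepe keeps the names that must be renamed (fieldsFrom)
papa = new Retain(papa, fieldsTo);
pepe = new Retain(pepe, fieldsFrom);

//Where fieldsFrom.size() == fieldsTo.size() and the field positions match
for (int i = 0; i < fieldsFrom.size(); i++){

    //rename the i-th field of pepe to the i-th target name
    pepe = new Rename(pepe, fieldsFrom.select(new Fields(i)),
                            fieldsTo.select(new Fields(i)));

}

//With identical field names on both sides, the pipes can now be merged
Pipe retVal = new Merge(papa, pepe);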

Obviously this is pretty fragile, since I need to ensure that the field positions in fieldsFrom and fieldsTo stay aligned, that both have the same size, etc.

Is there a better, less fragile way to merge without going through all the ceremony above?


Solution

You can eliminate some of the ceremony by using Rename's ability to handle aligned from/to fields, like this:

pepe = new Rename(pepe, fieldsFrom, fieldsTo);

This only eliminates the for loop, though; you still must ensure that fieldsFrom and fieldsTo are the same size and aligned position by position for the rename to be correct.
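Because the aligned form still relies on the two field lists matching, a cheap safeguard is to check them before building the assembly. The sketch below is a hypothetical plain-Java helper (the class name `AlignedRenameCheck` is made up and it is not part of Cascading); it works on field names as strings, which you would then pass to `new Fields(...)` when constructing the `Rename`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Cascading: fails fast when aligned
// from/to name lists cannot express a valid one-to-one rename, and returns
// the "from -> to" pairing so it can be logged or inspected.
public final class AlignedRenameCheck {

    private AlignedRenameCheck() {}

    public static List<String> pairs(List<String> from, List<String> to) {
        // Aligned renames only make sense when both sides have the same size.
        if (from.size() != to.size()) {
            throw new IllegalArgumentException(
                "from has " + from.size() + " names but to has " + to.size());
        }
        // Duplicate names on either side would make the pairing ambiguous.
        if (from.stream().distinct().count() != from.size()
                || to.stream().distinct().count() != to.size()) {
            throw new IllegalArgumentException("duplicate field names in from/to");
        }
        List<String> result = new ArrayList<>();
        for (int i = 0; i < from.size(); i++) {
            result.add(from.get(i) + " -> " + to.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(pairs(List.of("id", "name"), List.of("pid", "pname")));
    }
}
```

This does not remove the fragility, but it turns a silent misalignment into an immediate error at assembly time.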

cascading.jruby addresses this by wrapping renaming in a function that accepts a mapping rather than aligned from/to fields.
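The same mapping-based idea can be approximated in plain Java. The sketch below is hypothetical (`RenameMapping` and the example field names are made up; this is not the cascading.jruby code): callers supply old-name to new-name pairs, and the helper derives the aligned from/to arrays, so the alignment cannot drift. The resulting arrays should then be usable as `new Fields(m.from())` and `new Fields(m.to())` in `new Rename(pepe, ...)`; that step is left out so the block has no Cascading dependency.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical Java analogue of a mapping-based rename: derives aligned
// from/to name arrays from old-name -> new-name pairs.
public final class RenameMapping {

    private final String[] from;
    private final String[] to;

    private RenameMapping(String[] from, String[] to) {
        this.from = from;
        this.to = to;
    }

    // The map's iteration order decides the order of the aligned arrays,
    // so a LinkedHashMap keeps the result predictable.
    public static RenameMapping of(Map<String, String> mapping) {
        String[] from = new String[mapping.size()];
        String[] to = new String[mapping.size()];
        Set<String> targets = new HashSet<>();
        int i = 0;
        for (Map.Entry<String, String> e : mapping.entrySet()) {
            // Map keys are unique already; targets must be unique as well.
            if (!targets.add(e.getValue())) {
                throw new IllegalArgumentException("duplicate target name: " + e.getValue());
            }
            from[i] = e.getKey();
            to[i] = e.getValue();
            i++;
        }
        return new RenameMapping(from, to);
    }

    public String[] from() { return from.clone(); }

    public String[] to() { return to.clone(); }

    public static void main(String[] args) {
        Map<String, String> mapping = new LinkedHashMap<>();
        mapping.put("pepe_id", "id");
        mapping.put("pepe_name", "name");
        RenameMapping m = RenameMapping.of(mapping);
        System.out.println(String.join(",", m.from()) + " -> " + String.join(",", m.to()));
    }
}
```

Because each old name sits next to its new name in the map, adding or reordering fields cannot shift one side without the other.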

Note also that Merge requires all incoming pipes to declare the same fields, whereas CoGroup only requires that you provide declaredFields so that there are no name collisions on the output (all fields propagate through, including the grouping keys from every input).
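As a rough illustration of choosing declaredFields for a two-input CoGroup, the sketch below assumes that the joined output lists the left pipe's fields followed by the right pipe's fields, and that declaredFields must name every one of them uniquely. The helper name `JoinDeclaredFields` and the prefixing scheme are hypothetical; the resulting names would be wrapped in a `Fields` and passed as declaredFields to a CoGroup constructor, whose exact overloads you should confirm in the Javadoc for your Cascading version.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: builds output field names for a two-way join by
// keeping the left names as-is and prefixing any right-hand name that
// would otherwise collide with a name already in use.
public final class JoinDeclaredFields {

    private JoinDeclaredFields() {}

    public static List<String> declare(List<String> left, List<String> right, String rightPrefix) {
        if (rightPrefix == null || rightPrefix.isEmpty()) {
            throw new IllegalArgumentException("rightPrefix must be non-empty");
        }
        List<String> declared = new ArrayList<>(left);
        Set<String> used = new HashSet<>(left);
        for (String name : right) {
            String candidate = name;
            // Keep prefixing until the name is unique; terminates because
            // each pass makes the candidate strictly longer.
            while (used.contains(candidate)) {
                candidate = rightPrefix + candidate;
            }
            used.add(candidate);
            declared.add(candidate);
        }
        return declared;
    }

    public static void main(String[] args) {
        System.out.println(declare(List.of("id", "name"), List.of("id", "score"), "rhs_"));
    }
}
```

Only colliding names are changed, so fields that are already unique keep their original names in the join output.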

Licensed under: CC-BY-SA with attribution