This question was answered in the cascading-user forum. Leaving an answer here as a reference
myPipe.groupAll { _.average('col1).average('col2).average('col3) }
Domanda
As the final step on some computations with Scalding I want to compute several averages of the columns in a pipe. But the following code doesn't work
myPipe.groupAll { _average('col1,'col2, 'col3) }
Is there any way to compute such functions sum, max, average
without doing several passes? I'm concerned about performance but maybe Scalding is smart enough to detect that programmatically.
Soluzione
This question was answered in the cascading-user forum. Leaving an answer here as a reference
myPipe.groupAll { _.average('col1).average('col2).average('col3) }
Altri suggerimenti
you can do size (aka count), average, and standardDev in one go using the function below.
// Find the count of boys vs. girls, their mean age and standard deviation.
// The new pipe contains "sex", "count", "meanAge" and "stdevAge" fields.
val demographics = people.groupBy('sex) { _.sizeAveStdev('age -> ('count, 'meanAge, 'stdevAge) ) }
finding max would require another pass though.