Question

As the final step of some computations with Scalding, I want to compute the averages of several columns in a pipe. But the following code doesn't work:

myPipe.groupAll { _average('col1,'col2, 'col3) }

Is there any way to compute functions such as sum, max, and average over several columns without doing several passes? I'm concerned about performance, but maybe Scalding is smart enough to detect this automatically.

Solution

This question was answered on the cascading-user forum. I'm leaving the answer here for reference:

myPipe.groupAll { _.average('col1).average('col2).average('col3) }
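
As a rough illustration of why chained aggregations need not imply several passes, here is a minimal sketch in plain Scala collections. It is not Scalding code and says nothing about how Scalding plans the job; the `Row` and `Acc` types and the `averages` helper are hypothetical names invented for this example. It keeps one running sum per column and folds over the rows once.

```scala
// Plain Scala illustration (no Scalding dependency): averages of three
// columns computed in a single traversal of the rows.
object ColumnAverages {
  // Hypothetical row type standing in for a tuple with 'col1, 'col2, 'col3.
  final case class Row(col1: Double, col2: Double, col3: Double)

  // Running state: row count plus one running sum per column.
  final case class Acc(n: Long, s1: Double, s2: Double, s3: Double)

  def averages(rows: Iterable[Row]): Option[(Double, Double, Double)] = {
    val acc = rows.foldLeft(Acc(0L, 0.0, 0.0, 0.0)) { (a, r) =>
      Acc(a.n + 1, a.s1 + r.col1, a.s2 + r.col2, a.s3 + r.col3)
    }
    // No rows means no defined average.
    if (acc.n == 0) None
    else Some((acc.s1 / acc.n, acc.s2 / acc.n, acc.s3 / acc.n))
  }
}
```

Each column contributes its own piece of state to the accumulator, so adding another average only widens the accumulator rather than adding another traversal.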

Other tips

You can compute size (aka count), average, and standard deviation in one go using the sizeAveStdev function, as below.

// Find the count of boys vs. girls, their mean age and standard deviation. 
// The new pipe contains "sex", "count", "meanAge" and "stdevAge" fields.
val demographics = people.groupBy('sex) { _.sizeAveStdev('age -> ('count, 'meanAge, 'stdevAge) ) }

Finding the max would require another pass, though.
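
The idea of getting count, mean, and standard deviation together can be sketched in plain Scala. This is only an assumption-laden illustration, not Scalding's implementation: the `Moments` object and its `M` class are hypothetical, and the sketch uses the textbook running-moments formula for the population standard deviation.

```scala
// Plain Scala illustration (no Scalding dependency): count, mean and
// population standard deviation from a single fold over the values.
object Moments {
  // Running count, sum and sum of squares.
  final case class M(n: Long, sum: Double, sumSq: Double) {
    def add(x: Double): M = M(n + 1, sum + x, sumSq + x * x)
    def mean: Double = sum / n
    // Population standard deviation: sqrt(E[x^2] - E[x]^2),
    // clamped at zero to guard against rounding error.
    def stdev: Double = math.sqrt(math.max(0.0, sumSq / n - mean * mean))
  }

  def of(xs: Iterable[Double]): M =
    xs.foldLeft(M(0L, 0.0, 0.0))((m, x) => m.add(x))
}
```

For example, `Moments.of(Seq(2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0))` yields a count of 8, a mean of 5.0, and a standard deviation of 2.0. The sum-of-squares formula can lose precision when the values are large relative to their spread, so a production job may prefer a more numerically stable update.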

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow