Question

As the final step of a computation in Scalding, I want to compute the averages of several columns in a pipe. However, the following code doesn't work:

myPipe.groupAll { _average('col1,'col2, 'col3) }

Is there any way to compute aggregations such as sum, max, and average without making several passes over the data? I'm concerned about performance, but maybe Scalding is smart enough to optimize this on its own.


Solution

This question was answered in the cascading-user forum. I'm leaving the answer here for reference: chain one average call per column inside a single groupAll.

myPipe.groupAll { _.average('col1).average('col2).average('col3) }
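
To illustrate why several averages need not mean several passes, here is a minimal sketch in plain Scala (no Scalding dependency). It is not Scalding's implementation. The `Row` case class and the `SinglePassAverages` object are hypothetical names introduced only for this example. The sketch shows the general idea that one traversal can keep a running sum for each column alongside a shared row count.

```scala
// Hypothetical row type standing in for a tuple with three numeric fields.
final case class Row(col1: Double, col2: Double, col3: Double)

object SinglePassAverages {
  // Running totals: the row count plus one sum per column.
  final case class Acc(n: Long, s1: Double, s2: Double, s3: Double)

  // A single traversal of the rows updates every running sum at once.
  // Returns None for empty input, since an average is undefined there.
  def averages(rows: Iterable[Row]): Option[(Double, Double, Double)] = {
    val acc = rows.foldLeft(Acc(0L, 0.0, 0.0, 0.0)) { (a, r) =>
      Acc(a.n + 1, a.s1 + r.col1, a.s2 + r.col2, a.s3 + r.col3)
    }
    if (acc.n == 0) None
    else Some((acc.s1 / acc.n, acc.s2 / acc.n, acc.s3 / acc.n))
  }
}
```

For example, `SinglePassAverages.averages(Seq(Row(1.0, 10.0, 100.0), Row(3.0, 20.0, 300.0)))` yields the three column means from one fold over the data.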

Other tips

You can compute size (aka count), average, and standard deviation in one go using the sizeAveStdev function shown below.

// Find the count of boys vs. girls, their mean age and standard deviation. 
// The new pipe contains "sex", "count", "meanAge" and "stdevAge" fields.
val demographics = people.groupBy('sex) { _.sizeAveStdev('age -> ('count, 'meanAge, 'stdevAge) ) }

Finding the max would require another pass, though.
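
As a rough illustration of how count, mean, and standard deviation can come out of one pass, the plain-Scala sketch below accumulates the count, the sum, and the sum of squares in a single fold. It is an assumption-laden sketch, not necessarily how Scalding implements sizeAveStdev. The `SinglePassStats` object is a hypothetical name, and the result is the population (not sample) standard deviation.

```scala
object SinglePassStats {
  // Returns (count, mean, population standard deviation) from one traversal,
  // or None when the input is empty.
  def sizeAveStdev(xs: Iterable[Double]): Option[(Long, Double, Double)] = {
    // Accumulate count, sum, and sum of squares together.
    val (n, sum, sumSq) = xs.foldLeft((0L, 0.0, 0.0)) {
      case ((c, s, q), x) => (c + 1, s + x, q + x * x)
    }
    if (n == 0) None
    else {
      val mean = sum / n
      // Clamp at zero to guard against tiny negative values from rounding.
      val variance = math.max(0.0, sumSq / n - mean * mean)
      Some((n, mean, math.sqrt(variance)))
    }
  }
}
```

The sum-of-squares formula is simple but can lose precision when the values are large relative to their spread, so treat it as a sketch rather than a numerically robust routine.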

License: CC-BY-SA with attribution
Not affiliated with StackOverflow