Question

We have profiling code that collects the durations of methods along with a bunch of other data points, and we store those numbers inside a SummaryStatistics object from Commons Math to provide the min, max, mean, count, etc. However, we need to flush this object to disk every hour or so and start collecting again for the next one.

My question is: how can we reliably combine these values, so that if we have 24 SummaryStatistics objects we can display the summary for the entire day without skewing the data? The objects themselves have the running averages as well as how many items were counted, so is there a utility class that will allow two weighted averages to be combined?


Solution

You can do this directly, using AggregateSummaryStatistics. See the section titled "Compute statistics for multiple samples and overall statistics concurrently" in the statistics section of the Commons Math User Guide.

OTHER TIPS

Since you say you have both the mean and the count, the general formula you want is: multiply each mean by its count, sum those products, and then divide by the sum of the counts.

E.g., for two SummaryStatistics objects A and B, you would use:

double weightedMean = (A.getMean() * A.getN() + B.getMean() * B.getN()) /
                      (A.getN() + B.getN());
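To see why the counts matter, here is a minimal standalone sketch comparing the weighted formula with a plain average of the two means. It does not use Commons Math; the class and method names are made up for illustration.

```java
final class WeightedMeanDemo {
    /** Combines two means, weighting each by the number of samples behind it. */
    static double weightedMean(double meanA, long nA, double meanB, long nB) {
        return (meanA * nA + meanB * nB) / (double) (nA + nB);
    }

    /** Plain average of the two means, ignoring the sample counts. */
    static double naiveMean(double meanA, double meanB) {
        return (meanA + meanB) / 2.0;
    }
}
```

For example, if one hour saw 1000 calls averaging 10 ms and the next saw 10 calls averaging 100 ms, weightedMean(10, 1000, 100, 10) gives roughly 10.89 ms, while naiveMean(10, 100) gives 55 ms, which badly overstates the quiet hour.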

For many of them (e.g., a List of them called 'manyStats') you might do something like:

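double accum = 0.0;  // running sum of mean * count, i.e. the sum of all values
long n = 0;          // total number of values across all objects
for (SummaryStatistics s : manyStats) {
  accum += s.getMean() * s.getN();
  n += s.getN();
}
double weightedMean = accum / n;  // NaN if manyStats is empty (0.0 / 0)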
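
The same idea extends to the other values in the summary. The sketch below is one way it might look, not the Commons Math implementation: MergedSummary is a hypothetical class that keeps the count, mean, min, max, and the sum of squared deviations from the mean (m2), and merges two summaries with the standard pairwise update for mean and variance. If you only have SummaryStatistics objects on disk, m2 could presumably be recovered as getVariance() * (getN() - 1), assuming the default bias-corrected variance.

```java
final class MergedSummary {
    final long n;       // number of values
    final double mean;  // arithmetic mean
    final double min;
    final double max;
    final double m2;    // sum of squared deviations from the mean

    MergedSummary(long n, double mean, double min, double max, double m2) {
        this.n = n;
        this.mean = mean;
        this.min = min;
        this.max = max;
        this.m2 = m2;
    }

    /** Builds the summary of one period from its raw values (Welford's update). */
    static MergedSummary of(double... values) {
        long n = 0;
        double mean = 0.0, m2 = 0.0;
        double min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY;
        for (double v : values) {
            n++;
            double delta = v - mean;
            mean += delta / n;
            m2 += delta * (v - mean);
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        return new MergedSummary(n, mean, min, max, m2);
    }

    /** Combines two summaries as if all their values had been collected together. */
    MergedSummary merge(MergedSummary other) {
        if (other.n == 0) return this;
        if (n == 0) return other;
        long total = n + other.n;
        double delta = other.mean - mean;
        double newMean = mean + delta * other.n / total;
        double newM2 = m2 + other.m2 + delta * delta * ((double) n * other.n / total);
        return new MergedSummary(total, newMean,
                Math.min(min, other.min), Math.max(max, other.max), newM2);
    }

    /** Sample (bias-corrected) variance; NaN when empty, 0 for a single value. */
    double variance() {
        if (n == 0) return Double.NaN;
        return n == 1 ? 0.0 : m2 / (n - 1);
    }
}
```

Merging the 24 hourly summaries is then a fold: start from MergedSummary.of() (an empty summary) and call merge once per hour. Min and max combine trivially; the variance is the only part that needs the extra delta term, because each hour's deviations were measured against that hour's own mean.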
Licensed under: CC-BY-SA with attribution