Question

I'm trying to figure out the best way to perform a computation fast and wanted to find out what sort of approach people would usually take in a situation like this.

I have a List of objects which have properties that I want to compute the mean and standard deviation of. I thought using this Math.NET library would probably be easier/optimised for performance.

Unfortunately, the input arguments for these functions are arrays. Is my only solution to write my own function to compute means and standard deviations? Could I write some sort of extension method for lists that uses lambda functions like here? Or am I better off writing functions that return arrays of my object properties and using these with Math.NET?

Presumably the answer depends on some things like the size of the list? Let's say for argument's sake that the list has 50 elements. My concern is purely performance.

Solution

ArrayStatistics indeed expects arrays, as it is optimized for this special case (that's why it is called ArrayStatistics). Similarly, StreamingStatistics is optimized for streaming over IEnumerable sequences without keeping the data in memory. The general class that works with all kinds of input is the Statistics class.

Have you verified that simply using LINQ and StreamingStatistics is not fast enough in your use case? Computing these statistics for a list of merely 50 entries is barely measurable at all, unless, say, you do it a million times in a loop.

Example with Math.NET Numerics v3.0.0-alpha7, using Tuples in a list to emulate your custom types:

using System;
using System.Collections.Generic;
using System.Linq;
using MathNet.Numerics.Statistics;

var data = new List<Tuple<string, double>>
{
    Tuple.Create("A", 1.0),
    Tuple.Create("B", 2.0),
    Tuple.Create("C", 1.5)
};

// using the normal extension methods within `Statistics`
var stdDev1 = data.Select(x => x.Item2).StandardDeviation();
var mean1 = data.Select(x => x.Item2).Mean();

// single pass variant (unfortunately there's no single pass MeanStdDev yet):
var meanVar2 = data.Select(x => x.Item2).MeanVariance();
var mean2 = meanVar2.Item1;
var stdDev2 = Math.Sqrt(meanVar2.Item2);

// directly using the `StreamingStatistics` class:
StreamingStatistics.MeanVariance(data.Select(x => x.Item2));

OTHER TIPS

The easiest solution is to use LINQ to convert the List to an array:

  List<SomeClass> list = ...

  // Project the property first, then copy to an array (allocates, so not the best performance)
  GetMeanAndStdError(list.Select(item => item.SomeProperty).ToArray());

However, if performance is your concern, you'd be better off computing the mean and variance explicitly (write your own function):

  List<SomeClass> list = ...

  Double sumX = 0.0;
  Double sumXX = 0.0;

  foreach (var item in list) {
    Double x = item.SomeProperty;

    sumX += x;
    sumXX += x * x;
  }

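  Double mean = sumX / list.Count;
  // Population variance: E[x^2] - (E[x])^2, so subtract mean * mean, not mean
  Double variance = sumXX / list.Count - mean * mean;
  Double stdDev = Math.Sqrt(variance);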
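
A hand-written loop that accumulates sums of squares can lose precision when the mean is large relative to the spread. As a rough sketch of the extension-method idea raised in the question, the version below takes a lambda selector and uses Welford's single-pass update instead. The names ListStatisticsExtensions and MeanStdDev are hypothetical and not part of Math.NET, and it assumes you want the sample (N - 1) standard deviation rather than the population one.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of Math.NET.
public static class ListStatisticsExtensions
{
    // Returns (mean, sample standard deviation) of the selected property
    // in a single pass, without allocating an intermediate array.
    public static Tuple<double, double> MeanStdDev<T>(
        this IReadOnlyList<T> source, Func<T, double> selector)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (selector == null) throw new ArgumentNullException(nameof(selector));
        if (source.Count == 0) return Tuple.Create(double.NaN, double.NaN);

        double mean = 0.0;
        double m2 = 0.0; // running sum of squared deviations from the current mean

        for (int i = 0; i < source.Count; i++)
        {
            double x = selector(source[i]);
            double delta = x - mean;
            mean += delta / (i + 1);   // update the running mean
            m2 += delta * (x - mean);  // Welford's update
        }

        // Assumption: sample (N - 1) estimator; undefined for a single item.
        double stdDev = source.Count > 1
            ? Math.Sqrt(m2 / (source.Count - 1))
            : double.NaN;

        return Tuple.Create(mean, stdDev);
    }
}
```

With a list like the Tuple example above, this could be called as data.MeanStdDev(x => x.Item2), with the mean in Item1 and the standard deviation in Item2. For 50 elements the difference from the LINQ-based approaches is unlikely to be measurable, so it is worth profiling before adopting it.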
Licensed under: CC-BY-SA with attribution