Question

I need to calculate averages, standard deviations, medians, etc. for a bunch of numerical data. Is there a good open source .NET library I can use? I have found NMath, but it is not free and may be overkill for my needs.

Solution

I found this on the CodeProject website. It looks like a good C# class for handling most of the basic statistical functions.

OTHER TIPS

You have to be careful. There are several ways to compute standard deviation that would give the same answer if floating point arithmetic were perfect. They're all accurate for some data sets, but some are far better than others under some circumstances.

The method I've seen proposed here, computing the variance from a running sum and sum of squares, is the one most likely to give bad answers. I used it myself until it crashed on me.

See Comparing three methods of computing standard deviation.
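
As an illustration of a more stable approach, here is a minimal sketch of Welford's online algorithm, which updates the mean and the sum of squared deviations one value at a time instead of subtracting two large, nearly equal quantities. The class name RunningStats and its members are hypothetical, made up for this example; they are not taken from the linked article or from any library.

```csharp
using System;

// Hypothetical helper (not from any library): running mean and variance
// computed with Welford's online algorithm.
public class RunningStats
{
    private long count;
    private double mean;
    private double m2; // sum of squared differences from the current mean

    public void Add(double x)
    {
        count++;
        double delta = x - mean;      // distance from the old mean
        mean += delta / count;        // update the mean
        m2 += delta * (x - mean);     // second factor uses the updated mean
    }

    public long Count { get { return count; } }

    public double Mean { get { return count > 0 ? mean : double.NaN; } }

    // Population variance (divide by n).
    public double PopulationVariance
    {
        get { return count > 0 ? m2 / count : double.NaN; }
    }

    // Sample variance (divide by n - 1); undefined for fewer than two values.
    public double SampleVariance
    {
        get { return count > 1 ? m2 / (count - 1) : double.NaN; }
    }

    public double SampleStandardDeviation
    {
        get { return Math.Sqrt(SampleVariance); }
    }
}
```

Because m2 only ever accumulates non-negative terms, the variance cannot go negative through rounding, so the square root should not produce NaN for two or more values. No list of inputs needs to be kept either, unless you also want the median.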

Have a look at MathNet. It is not specifically for statistics, but it may have useful functionality for what you want.

Use Apache Commons Math and run it through IKVM.

I decided it was quicker to write my own that did just what I needed. Here's the code:

using System;
using System.Collections.Generic;

/// <summary>
/// Very basic statistical analysis routines
/// </summary>
public class Statistics
{
    List<double> numbers;
    public double Sum { get; private set; }
    public double Min { get; private set; }
    public double Max { get; private set; }
    double sumOfSquares;

    public Statistics()
    {
        numbers = new List<double>();
    }

    public int Count
    {
        get { return numbers.Count; }
    }

    public void Add(double number)
    {
        if(Count == 0)
        {
            Min = Max = number;
        }
        numbers.Add(number);
        Sum += number;
        sumOfSquares += number * number;
        Min = Math.Min(Min,number);
        Max = Math.Max(Max,number);            
    }

    public double Average
    {
        get { return Sum / Count; }
    }

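    /// <summary>
    /// Population standard deviation (divides by Count, not Count - 1).
    /// Rounding can push the computed variance slightly below zero,
    /// so it is clamped at zero to avoid returning NaN.
    /// </summary>
    public double StandardDeviation
    {
        get { return Math.Sqrt(Math.Max(0.0, sumOfSquares / Count - (Average * Average))); }
    }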

    /// <summary>
    /// A simplistic implementation of Median
    /// Returns the middle number if there is an odd number of elements (correct)
    /// Returns the number after the midpoint if there is an even number of elements
    /// Sorts the list on every call, so should be optimised for performance if planning
    /// to call lots of times
    /// </summary>
    public double Median
    {
        get
        {
            if (numbers.Count == 0)
                throw new InvalidOperationException("Can't calculate the median with no data");
            numbers.Sort();
            int middleIndex = (Count) / 2;
            return numbers[middleIndex];
        }
    }
}
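
If you need the conventional median for an even number of elements (the average of the two middle values), something like the following sketch might do. It sorts a copy, so the caller's list is left untouched. The names MedianSketch and Median are hypothetical and are not part of the class above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: median that averages the two middle values
// when the number of elements is even.
public static class MedianSketch
{
    public static double Median(IEnumerable<double> values)
    {
        // Copy so that sorting does not reorder the caller's data.
        List<double> sorted = new List<double>(values);
        if (sorted.Count == 0)
            throw new InvalidOperationException("Can't calculate the median with no data");

        sorted.Sort();
        int mid = sorted.Count / 2;

        if (sorted.Count % 2 == 1)
            return sorted[mid];                        // odd: the single middle value

        return (sorted[mid - 1] + sorted[mid]) / 2.0;  // even: mean of the two middle values
    }
}
```

This still sorts on every call, so the same performance caveat applies if you call it many times on large lists.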

AForge.NET has the AForge.Math namespace, which provides some basic statistics functions: histogram, mean, median, standard deviation, entropy.

If you just need to do some one-off number crunching, a spreadsheet is far and away your best tool. It's trivial to spit out a simple CSV file from C#, which you can then load up in Excel (or whatever):

using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

class Program
{
    static void Main(string[] args)
    {
        using (StreamWriter sw = new StreamWriter("output.csv", false, Encoding.ASCII))
        {
            WriteCsvLine(sw, new List<string>() { "Name", "Length", "LastWrite" });

            DirectoryInfo di = new DirectoryInfo(".");
            foreach (FileInfo fi in di.GetFiles("*.mp3", SearchOption.AllDirectories))
            {
                List<string> columns = new List<string>();
                columns.Add(fi.Name.Replace(",", "<comma>"));
                columns.Add(fi.Length.ToString());
                columns.Add(fi.LastWriteTime.Ticks.ToString());

                WriteCsvLine(sw, columns);
            }
        }
    }

    static void WriteCsvLine(StreamWriter sw, List<string> columns)
    {
        sw.WriteLine(string.Join(",", columns.ToArray()));
    }
}

Then you can just 'start excel output.csv' and use functions like "=MEDIAN(B:B)", "=AVERAGE(B:B)", "=STDEV(B:B)". You get charts, histograms (if you install the analysis pack), etc.

The above doesn't handle everything; generalized CSV files are more complex than you might think. But it's "good enough" for much of the analysis I do.
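
For fields that may contain commas, quotes or line breaks, the usual convention (described in RFC 4180) is to wrap the field in double quotes and double any embedded quotes. A minimal sketch, assuming hypothetical helper names CsvSketch, EscapeField and ToCsvLine:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helpers: RFC 4180 style quoting for CSV fields.
public static class CsvSketch
{
    public static string EscapeField(string field)
    {
        if (field == null) return "";

        // Only fields containing a comma, quote, or line break need quoting.
        bool needsQuotes = field.IndexOfAny(new[] { ',', '"', '\r', '\n' }) >= 0;
        if (!needsQuotes) return field;

        // Double embedded quotes, then wrap the whole field in quotes.
        return "\"" + field.Replace("\"", "\"\"") + "\"";
    }

    public static string ToCsvLine(IEnumerable<string> columns)
    {
        List<string> escaped = new List<string>();
        foreach (string column in columns)
            escaped.Add(EscapeField(column));
        return string.Join(",", escaped.ToArray());
    }
}
```

With something like this, a file name containing a comma can be written as-is rather than being rewritten with a placeholder. Encoding and locale-specific separators are still not covered.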

Licensed under: CC-BY-SA with attribution