Grouping relatively close values using LINQ or loop
-
01-07-2021 - |
Question
I guess this would be more maths than C#. I've got an array of float values, where most values belong to one of the few tightly packed ranges. Here's an example (Lower Limit=0,Upper Limit=612):
3.4,5.0,6.1,
144.0,144.14,145.0,147.0,
273.77,275.19,279.0,
399.4,399.91,401.45,
533.26,537.0,538.9
This is a single array of 16 values, just separated them to show those "groups". What I need to do is to somehow group them, either using Linq, or a manual loop or whatever, so that those close values fall in a single group.
A simple math operation like dividing by 10 (or 100) won't work, because 399 would fall in a different group than 401 (4th group in the above example). Another approach would be to create a histogram of some kind, but I'm looking for something simple here. Any help would be greatly appreciated.
La solution
Just an another idea of a clustering with using GroupBy
with a custom comparer
var numbers = new float[]
{
3.4f, 5.0f, 6.1f, 144.0f, 144.14f, 145.0f,
147.0f, 273.77f, 275.19f, 279.0f, 399.4f, 399.91f, 401.45f,
49, 50, 51,
533.26f, 537.0f, 538.9f
};
foreach (var group in numbers.GroupBy(i => i, new ClosenessComparer(4f)))
Console.WriteLine(string.Join(", ", group));
And the custom ClosenessComparer
:
public class ClosenessComparer : IEqualityComparer<float>
{
private readonly float delta;
public ClosenessComparer(float delta)
{
this.delta = delta;
}
public bool Equals(float x, float y)
{
return Math.Abs((x + y)/ 2f - y) < delta;
}
public int GetHashCode(float obj)
{
return 0;
}
}
And the output:
1: 3,4 5 6,1
2: 144 144,14 145 147
3: 273,77 275,19 279
4: 399,4 399,91 401,45
6: 49 50 51
5: 533,26 537 538,9
Autres conseils
Here's a method that groups elements if they're within a certain delta (4 by default) of the previous value:
IEnumerable<IEnumerable<double>> GetClusters(IEnumerable<double> data,
double delta = 4.0)
{
var cluster = new List<double>();
foreach (var item in data.OrderBy(x=>x))
{
if (cluster.Count > 0 && item > cluster[cluster.Count - 1] + delta)
{
yield return cluster;
cluster = new List<double>();
}
cluster.Add(item);
}
if (cluster.Count > 0)
yield return cluster;
}
You can get tweak the algorithm by changing what you use for cluster[cluster.Count - 1] + delta
. For example, you might use
cluster[0] + delta
- delta from first element in the clustercluster.Average() + delta
- delta from the mean of the cluster so farcluster[cluster.Count / 2] + delta
- delta from the median of the cluster so far