Question

I'm a complete LINQ newbie, so I don't know if my LINQ is incorrect for what I need to do or if my expectations of performance are too high.

I've got a SortedList of objects, keyed by int; SortedList as opposed to SortedDictionary because I'll be populating the collection with pre-sorted data. My task is to find either the exact key or, if there is no exact key, the one with the next higher value. If the search is too high for the list (e.g. highest key is 100, but search for 105), return null.

// The structure of this class is unimportant.  Just using
// it as an illustration.
public class CX
{
    public int KEY;
    public DateTime DT;
}

static CX getItem(int i, SortedList<int, CX> list)
{
    var items =
    (from kv in list
     where kv.Key >= i
     select kv.Key);

    if (items.Any())
    {
        return list[items.Min()];
    }

    return null;
}

Given a list of 50,000 records, calling getItem 500 times takes about a second and a half. Calling it 50,000 times takes over 2 minutes. This performance seems very poor. Is my LINQ bad? Am I expecting too much? Should I be rolling my own binary search function?

Was it helpful?

Solution

Writing a binary search on your own can be tough.

Fortunately, Microsoft already wrote a pretty robust one: Array.BinarySearch<T>. This is, in fact, the method that SortedList<TKey, TValue>.IndexOfKey uses internally. Only problem is, it takes a T[] argument, instead of any IList<T> (like SortedList<TKey, TValue>.Keys).

You know what, though? There's this great tool called Reflector that lets you look at .NET source code...

Check it out: a generic BinarySearch extension method on IList<T>, taken straight from the reflected code of Microsoft's Array.BinarySearch<T> implementation.

public static int BinarySearch<T>(this IList<T> list, int index, int length, T value, IComparer<T> comparer) {
    if (list == null)
        throw new ArgumentNullException("list");
    else if (index < 0 || length < 0)
        throw new ArgumentOutOfRangeException((index < 0) ? "index" : "length");
    else if (list.Count - index < length)
        throw new ArgumentException();

    int lower = index;
    int upper = (index + length) - 1;

    while (lower <= upper) {
        int adjustedIndex = lower + ((upper - lower) >> 1);
        int comparison = comparer.Compare(list[adjustedIndex], value);
        if (comparison == 0)
            return adjustedIndex;
        else if (comparison < 0)
            lower = adjustedIndex + 1;
        else
            upper = adjustedIndex - 1;
    }

    return ~lower;
}

public static int BinarySearch<T>(this IList<T> list, T value, IComparer<T> comparer) {
    return list.BinarySearch(0, list.Count, value, comparer);
}

public static int BinarySearch<T>(this IList<T> list, T value) where T : IComparable<T> {
    return list.BinarySearch(value, Comparer<T>.Default);
}

This will let you call list.Keys.BinarySearch and get the negative bitwise complement of the index you want in case the desired key isn't found (the below is taken basically straight from tzaman's answer):

int index = list.Keys.BinarySearch(i);
if (index < 0)
    index = ~index;
var item = index < list.Count ? list[list.Keys[index]] : null;
return item;

OTHER TIPS

First, your query is being evaluated twice (once for Any, and once for Min). Second, Min requires that it iterate over the entire list, even though the fact that it's sorted means that the first item will be the minimum. You should be able to change this:

if (items.Any())
{
    return list[items.Min()];
}

To this:

var default = 
    (from kv in list
     where kv.Key >= i
     select (int?)kv.Key).FirstOrDefault();

if(default != null) return list[default.Value];

return null;

UPDATE

Because you're selecting a value type, FirstOrDefault doesn't return a nullable object. I have altered your query to cast the selected value to an int? instead, allowing the resulting value to be checked for null. I would advocate this over using ContainsKey, as that would return true if your list contained a value for 0. For example, say you have the following values

0 2 4 6 8

If you were to pass in anything less than or equal to 8, then you would get the correct value. However, if you were to pass in 9, you would get 0 (default(int)), which is in the list but isn't a valid result.

Using LINQ on a SortedList will not give you the benefit of the sort.

For optimal performance, you should write your own binary search.

OK, just to give this a little more visibility - here's a more concise version of Adam Robinson's answer:

return list.FirstOrDefault(kv => kv.Key >= i).Value; 

The FirstOrDefault function has an overload that accepts a predicate, which selects the first element satisfying a condition - you can use that to directly get the element you want, or null if it doesn't exist.

Why not use the BinarySearch that's built into the List class?

var keys = list.Keys.ToList();
int index = keys.BinarySearch(i);
if (index < 0)
    index = ~index;
var item = index < keys.Count ? list[keys[index]] : null;
return item;

If the search target isn't in the list, BinarySearch returns the bit-wise complement of the next-higher item; we can use that to directly get you what you want by re-complementing the result if it's negative. If it becomes equal to the Count, your search key was bigger than anything in the list.

This should be much faster than doing a LINQ where, since it's already sorted... As comments have pointed out, the ToList call will force an evaluation of the whole list, so this is only beneficial if you do multiple searches without altering the underlying SortedList, and you keep the keys list around separately.

Using OrderedDictionary in PowerCollections you can get an enumerator that starts where they key you are looking for should be... if it's not there, you'll get the next closest node and can then navigate forwards/backwards from that in O(log N) time per nav call.

This has the advantage of you not having to write your own search or even manage your own searches on top of a SortedList.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top