Question

In a project I am working on, there are really huge collections (1M-1B elements), and things are modified as collections mostly.

It's a real-time app, and so the performance is paramount.

For some of the operations, like Reverse, BinarySearch (possible?), etc will suffer more than others like Select, etc.

Is it feasible to implement one's own IEnumerable with possible MoveNext, MovePrev, etc and own implemented LINQ extensions that take advantages of these?

If this is gonna happen, it's gonna happen at the end of the project. Because we need to get it working first, then make it faster.

All in all this shouldn't be too much work, right?

Was it helpful?

Solution

It's very definitely possible to create your own implementation of Enumerable which might special-case some situations. You'd basically want to detect your own collection types (or possibly just collections such as List<T>) and use a more efficient implementation where applicable.

I have a sample project which I used to demo "implementing LINQ to Objects in an hour" which you might like to look at for examples. It's not a full implementation and in particular it's less efficient than the real LINQ to Objects - but you may still find it interesting.

Alternatively, you may find that i4o (Indexed LINQ) does everything you need out of the box - or that you would be better off contributing to that than starting from scratch. Worth checking out.

Just remember that at the end of the day, LINQ is basically a nice design coupled with syntactic sugar. The C# compiler doesn't know anything special about System.Linq.Enumerable, for example.

OTHER TIPS

If you really want performance, you can do quite a lot. Remember that the following selection:

var result = from element in collection
             where element.Id == id
             select element;

Compiles as:

var result = collection.Where(element => element.Id == id);

If you create the following method for the type of collection, then you can exploit the fact that the primary action is equality of the Id member and handle the request in an optimized manner. The important thing is correctly identifying the performance-critical operations on your collection and choosing the correct algorithms (i.e. complexity) to perform them.

public IEnumerable<TElement> Where(Expression<Func<TElement, bool>> selector)
{
    // detect equality of the Id member and return some special value
}

Consider System.Linq.Enumerable.Reverse() - this method fully enumerates the IEnumerable before returning the first result.

If your query is myCollection.Reverse().Take(10), and your collection has Billions of items, it is a horrible idea to enumerate the billions of items to get 10 out.

If you supplied a Reverse method on your own type, you could supply a better implementation that simply loops backwards over the collection (by index possibly).

The key to this is supplying your own type where you control the implementations. You can't use the implementations that work for all IEnumerable<T> because those implementations won't take full advantage of the capabilities of your custom collection type.

Is it feasible to implement one's own IEnumerable with possible MoveNext, MovePrev, etc and own implemented LINQ extensions that take advantages of these?

IEnumerable (or more properly, IEnumerator) doesn't have MovePrev. You could define an interface:

public interface IReversable<T> : IEnumerable<T>
{
    IEnumerator<T> GetReverseEnumerator();
}

This could be implemented by any container that supports efficient reverse enumeration.

You could then write an overload of Reverse (the extension method) to work off this new interface, and collection classes that implement the interface, etc. And then you'd have to use those collection classes instead of standard ones like List<T>.

But (I don't have Reflector handy to check) it may be that the built-in Reverse is smart enough to do things the quick way if it can get the IList interface from the collection, which would optimise the most common cases just fine anyway.

So there may not be a lot of point in this kind of approach.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top