Question

I know there's a "not" on ienumerable thanks to linq which takes a collection to not against, but I'm worried about big oh performance What's the fastest algorithm to do this?

Was it helpful?

Solution

The only way to remove a subset of items from an IEnumerable<T> is to loop through the superset and for each item in the superset loop through the subset, removing that item from the superset if it is found in the subset.

This will give you O(n²) on average.

Now if there is additional information about these collections (perhaps one or both is a list or maybe one or both of the sequences are sorted) that could help you create a more performant solution.

If you are interested, here is an extension method that will do what I just described:

public static IEnumerable<T> Exclude<T>
    (this IEnumerable<T> source, IEnumerable<T> items)
{
    foreach (T t in source)
        if (!items.Contains(t))
            yield return t;
}


Nevermind, use the Enumerable.Except extension method:

Produces the set difference of two sequences.

OTHER TIPS

If you can iterate over the sets in order, you can guarantee O(n) behaviour (rather than the "typically O(n) but possibly O(n²) in the worst case" that a hashset has) by iterating through them both in lockstep.

For example:

//loop boilerplate
if(itemA < itemB) {
    itemA = a.next();
    continue;
}
if(itemA > itemB) {
    itemB = b.next();
    continue;
}
a.remove(itemA);

You'll need to add boundary checking and other boilerplate yourself.

You may get better performance by converting the superset into a hashtable (O(n) typically, but then allows you to perform lookups in constant time). Then you can enumerate over the subset and check whether each item exists in the superset. The entire operation should take O(n) extra memory and O(n) time.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top