Domanda

What is the most efficient way to remove multiple items from an IList<T> object. Suppose I have an IEnumerable<T> of all the items I want to remove, in the same order of occurrence that in the original list.

The only way I have in mind is:

IList<T> items;
IEnumerable<T> itemsToDelete;
...

foreach (var x in itemsToDelete)
{
    items.Remove(x);
}

But I guess it's not efficient, because it has to go over the list from the beggining every time the method Remove is called.

È stato utile?

Soluzione

As the number of items to remove gets larger, you will probably find traversing the list and checking each item against a hashset of "items to remove" is more efficient. An extension method like this might help:

static void RemoveAll<T>(this IList<T> iList, IEnumerable<T> itemsToRemove)
{
    var set = new HashSet<T>(itemsToRemove);

    var list = iList as List<T>;
    if (list == null)
    {
        int i = 0;
        while (i < iList.Count)
        {
            if (set.Contains(iList[i])) iList.RemoveAt(i);
            else i++;
        }
    }
    else
    {
        list.RemoveAll(set.Contains);
    }
}

I benchmarked using this little program below. (Note that it uses an optimized path if IList<T> is actually a List<T>.)

On my machine (and using my test data), this extention method took 1.5 seconds to execute vs 17 seconds for the code in your question. However, I have not tested with different sizes of data. I'm sure for removing just a couple of items RemoveAll2 will be faster.

static class Program
{
    static void RemoveAll<T>(this IList<T> iList, IEnumerable<T> itemsToRemove)
    {
        var set = new HashSet<T>(itemsToRemove);

        var list = iList as List<T>;
        if (list == null)
        {
            int i = 0;
            while (i < iList.Count)
            {
                if (set.Contains(iList[i])) iList.RemoveAt(i);
                else i++;
            }
        }
        else
        {
            list.RemoveAll(set.Contains);
        }
    }

    static void RemoveAll2<T>(this IList<T> list, IEnumerable<T> itemsToRemove)
    {
        foreach (var item in itemsToRemove)
            list.Remove(item);
    }

    static void Main(string[] args)
    {
        var list = Enumerable.Range(0, 10000).ToList();
        var toRemove = new[] { 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 
                              43,  47,  53,  59,  61,  67,  71,  73,  79,  83,  89,  97, 101,
                             103, 107, 109, 113, 127, 131, 137, 139, 149, 151, 157, 163, 167,
                             173, 179, 181, 191, 193, 197, 199, 211, 223, 227, 229, 233, 239,
                             241, 251, 257, 263, 269, 271, 277, 281, 283, 293, 307, 311, 313,
                             317, 331, 337, 347, 349, 353, 359, 367, 373, 379, 383, 389, 397,
                             401, 409, 419, 421, 431, 433, 439, 443, 449, 457, 461, 463, 467,
                             479, 487, 491, 499, 503, 509, 521, 523, 541, 547, 557, 563, 569,
                             571, 577, 587, 593, 599, 601, 607, 613, 617, 619, 631, 641, 643,
                             647, 653, 659, 661, 673, 677, 683, 691, 701, 709, 719, 727, 733,
                             739, 743, 751, 757, 761, 769, 773, 787, 797, 809, 811, 821, 823,
                             827, 829, 839, 853, 857, 859, 863, 877, 881, 883, 887, 907, 911,
                             919, 929, 937, 941, 947, 953, 967, 971, 977, 983, 991, 997};
        list.RemoveAll(toRemove); // JIT 
        //list.RemoveAll2(toRemove); // JIT 

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < 10000; i++)
        {
            list.RemoveAll(toRemove);
            //list.RemoveAll2(toRemove);
        }
        sw.Stop();
        Console.WriteLine("Elapsed: {0} ms", sw.ElapsedMilliseconds);
        Console.ReadKey();
    }
}

UPDATE (for @KarmaEDV's & Mark Sowul's comments below): If you need to use a custom equality comparer, the extension method could have an overload that takes such a comparer:

public static void RemoveAll<T>(this IList<T> iList, IEnumerable<T> itemsToRemove, IEqualityComparer<T> comparer = null)
{
    var set = new HashSet<T>(itemsToRemove, comparer ?? EqualityComparer<T>.Default);

    if (iList is List<T> list)
    {
        list.RemoveAll(set.Contains);
    }
    else
    {
        int i = iList.Count - 1;
        while (i > -1)
        {
            if (set.Contains(iList[i])) iList.RemoveAt(i);
            else i--;
        }
    }
}

Altri suggerimenti

If the IList<T> reference happens to refer to an instance of List<T>, casting to that type and using RemoveAll is apt to yield better performance than any other approach that doesn't rely upon the particulars of its implementation.

Otherwise, while the optimal approach will depend upon the relative fraction of items that are going to be removed and the nature of the IList<T>, I would suggest that your best bet might be to copy the IList<T> to a new List<T>, clear it, and selectively re-add items. Even if the items in the list are not conducive to efficient hashing, the fact that the items in the IEnumerable<T> are in the same sequence as those in the IList<T> would render that irrelevant. Start by reading an item from the IEnumerable<T>. Then copy items from the array to the list until that one is found. Then read the next item from the IEnumerable<T> and copy from the array to the list until that one is found, etc. Once the IEnumerable<T> is exhausted, copy the balance of the array to the List<T>.

This approach will be fast with many implementations of IList<T>. It has one major disadvantage, though: the fact that it deletes and re-adds each item might have unwanted side-effects on things like observable lists. If a list might be observable, one may have to use a much slower N^2 algorithm to ensure correctness. [BTW, it irks me that IList<T> has a Remove(T) method but lacks a much more useful RemoveAll(Func<T,bool>) method. The Remove(T) is largely redundant with IndexOf and RemoveAt, while RemoveAll would allow O(N) implementations of many operations that are O(N^2) in its absence if one isn't allowed to remove and re-add items.

Maybe this helps. Other ideas of the same type could be included.

IList<T> items;

IEnumerable<T> itemsToDelete;
...
{
   if(items.Equals(itemsToDelete)) //Equal lists?
     {
      items.Clear(); 
      return true;
     }


   if(  (double) items.Count/itemsToDelete.Count < 1){
      /* It is faster to iterate the small list first. */ 
              foreach (var x in items)
              {
                if(itemsToDelete.Contains(x)){/**/} 

              }
    }
   else{
           foreach (var x in itemsToDelete)
              {
               items.Remove(x);
              }
   }
}
Autorizzato sotto: CC-BY-SA insieme a attribuzione
Non affiliato a StackOverflow
scroll top