Question

I need a data structure that holds a list of elements of the same type. Needed features are

  • Add
  • GetEnumerator
  • (may be) Clear

An indexed access, sorting, searching, removing of elements are not required. What is the best collection class for it? The following aspects should be taken in consideration: performance, memory usage, behavior of the garbage collector.

My current candidates are List<T> and LinkedList<T>

Was it helpful?

Solution

Short answer
default to using List<T> in almost all cases.

Slightly longer answer
LinkedList<T> is only going to be better if you are doing a lot of adding and removing values in comparison to enumerating over those values and the size of the list is large. This should only be a factor in your choice if, subsequent to profiling you find using List<T> is a problem.

Longer answer

Assumption, you have identified the usage of one or the other as being a problem for performance.

It you do a lot of random access then List<T> will almost always be vastly faster no matter what. If you enumerate a lot and rarely insert (or almost always insert near/at the end) the List<T> will almost always be faster. If you are constantly inserting/removing in random locations but while iterating over the list so are already at or near the relevant node and will have at least several thousand elements you may want to try LinkedList<T>

Working out what values/usage translate to better performance are very much dependent on your usage profile. Microbenchmarks can be very misleading here since they brush under the carpet aspects of the linked lists behaviour like the nodes in in being distributed about in memory rather than nicely adjacent if they happen to get allocated all at once in your tests. Likewise pre-creating the List<T> with the right size can make a big difference.

As to computer science style reasoning and big O notation (which really need big N's to be meaningful in this case)

  • operation
    • cost List<T>
    • cost LinkedList<T>
  • insert at end
    • O(1) (amortized cost, allocation to double size as needed)
    • O(1) allocation each time
  • insert at start
    • O(N) (though done as fast memory move so somewhat complex running time behaviour)
    • O(1)
  • insert in location x (and remove as well)
    • O(N-x) (see comment for insert at end)
    • O(1)
  • forward enumeration
    • O(N) (though cache misses minimized)
    • O(N) (though heavily dependent on cache locality)
  • reverse enumeration
    • O(N)
    • O(N) (the LinkedList<T> implementation is doubly linked)
  • random access
    • O(1)
    • O(N)

Memory usage is complex since List can have at most Count - 1 excess cells at any point but LinkedList<T> will consume a LinkedListNode<T> for each cell which is an additional 3 references (4/8 bytes a pop) plus the usual object overhead. In normal usage the List is likely to win but again this should only be something you worry about if you find the memory consumption is actually a problem.

OTHER TIPS

Unless you are dealing with a massive structure or you plan on iterating over this thing a trillion times, it doesn't matter. Just pick one and start coding. If your app slows to a crawl later, figure out why then and change as necessary.

(Seriously, it does. not. matter. In the slightest. Every minute you spend on finding the answer to this question is a minute closer you could've been to having working code).

If anyone has gotten to the point where they need to know the difference, LinkedList is faster than List and could be used if you only need non-random, forward-only reading and appending functionality.

Unless you're dealing with hundreds of thousands or millions of entries, and you've profiled your program to see that there's a major problem, you probably won't notice the difference between the two.

Other than that:

LinkedList<T> provides separate nodes of type LinkedListNode<T>, so insertion and removal are O(1) operations.

from here.

I would use List<T> because all of your data will be stored sequentially if they're value types and still do well with reference types (internally, List<T> manages an array that it grows by a factor of two each time you run out of space).

LinkedList<T> used to make a lot more sense when things weren't IO bound. People will often quote its seeming "O(1)" nature. However, this discounts the true cost that comes with increased chances of page faults getting the nodes.

If you can get contiguous memory regions with an array or List<T> and avoid the potential of a page fault, you're much better off with modern processor and main memory caching lines.

If you know how many elements in advance you'll have, use an array. If you have a good idea of how many elements, use a List<T> (and in the constructor pass the likely upper bound to potentially avoid a reallocation).

The only time I'd use a LinkedList<T> is if you need to bump items in a list by one value constantly. For example, if you're implementing a least recently used cache algorithm and need to add something to the front and take something off the end.

For small items, it really won't make a difference. The generational garbage collector will compact scattered heap items together over time so the linked list won't get too bad.

I'd pick a List<T> and run with it unless you noticed problems (via profiling)

If indexed access is not required, you're probably better off with a LinkedList<T>. There's not much difference, but the LinkedList can be faster for the appends.

Given your requirements for the interface, you should be using ICollection<T> everywhere in your code instead of referring to a particular concrete class.

The reason is that if you use List then you may write a bunch of code that uses l[0] to get the first item. On the other hand, if you use LinkedList then you might spread calls to AddLast throughout your code. To preserve the choice to switch, you need to avoid becoming accidentally dependent on features you don't really need. You need to use an interface that both containers can support.

So write a class like this:

public static class Collections
{
    public static ICollection<T> Make<T>()
    {
        return new List<T>();
    }
}

Then whenever you need a collection, do this:

ICollection<int> c = Collections.Make<int>();

Isolate your choice of container from the rest of your program. Now when you profile and change your mind, you just have to edit the Make<T> method.

A LinkedList will carry more memory overhead (for the nodes), but is better if you are inserting many elements into the middle. A linked list will require more memory allocations too, as LinkedListNode is a class and is allocated dynamically. If you are not inserting elements (just appending), I would use a List. A list should be as fast or faster for appending, although it will occasionally have to enlarge.

If your array is going to be large enough to matter, you need to roll your own ... otherwise just use a List.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top