質問

Whenever I want to insert into a SortedList, I check to see if the item exists, then I insert. Is this performing the same search twice? Once to see if the item is there and again to find where to insert the item? Is there a way to optimize this to speed it up or is this just the way to do it, no changes necessary?

if( sortedList.ContainsKey( foo ) == false ){
    sortedList.Add( foo, 0 );
}
役に立ちましたか?

解決

You can add the items to a HashSet and the List, searching in the hash set is the fastest way to see if you have to add the value to the list.

if( hashSet.Contains( foo ) == false ){
    sortedList.Add( foo, 0 );  
    hashSet.Add(foo);
}

他のヒント

You can use the indexer. The indexer does this in an optimal way internally by first looking for the index corresponding to the key using a binary search and then using this index to replace an existing item. Otherwise a new item is added by taking in account the index already calculated.

list["foo"] = value;

No exception is thrown whether the key already exists or not.


UPDATE:

If the new value is the same as the old value, replacing the old value will have the same effect than doing nothing.

Keep in mind that a binary search is done. This means that it takes about 10 steps to find an item among 1000 items! log2(1000) ~= 10. Therefore doing an extra search will not have a significant impact on speed. Searching among 1,000,000 items will only double this value (~ 20 steps).

But setting the value through the indexer will do only one search in any case. I looked at the code using Reflector and can confirm this.

I'm sorry if this doesn't answer your question, but I have to say sometimes the default collection structures in .NET are unjustifiably limited in features. This could have been handled if Add method returned a boolean indicating success/failure very much like HashSet<T>.Add does. So everything goes in one step. In fact the whole of ICollection<T>.Add should have been a boolean so that implementation-wise it's forced, very much like Collection<T> in Java does.

You could either use a SortedDictionary<K, V> structure as pointed out by Servy or a combination of HashSet<K> and SortedList<K, V> as in peer's answer for better performance, but neither of them are really sticking to do it only once philosophy. I tried a couple of open source projects to see if there is a better implementation in this respect, but couldn't find.

Your options:

  1. In vast majority of the cases it's ok to do two lookups, doesn't hurt much. Stick to one. There is no solution built in.

  2. Write your own SortedList<K, V> class. It's not difficult at all.

  3. If you'r desperate, you can use reflection. The Insert method is a private member in SortedList class. An example that does.. Kindly dont do it. It's a very very poor choice. Mentioned here for completeness.

ContainsKey does a binary search, which is O(log n), so unless you list is massive, I wouldn't worry about it too much. And, presumably, on insertion it does another binary search to find the location to insert at.

One option to avoid this (doing the search twice) is to use a the BinarySearch method of List. This will return a negative value if the item isn't found and that negative value is the bitwise compliment of the place where the item should be inserted. So you can look for an item, and if it's not already in the list, you know exactly where it should be inserted to keep the list sorted.

SortedList<Key,Value> is a slow data structure that you probably shouldn't use at all. You may have already considered using SortedDictionary<Key,Value> but found it inconvenient because the items don't have indexes (you can't write sortedDictionary[0]) and because you can write a find nearest key operation for SortedList but not SortedDictionary.

But if you're willing to switch to a third-party library, you can get better performance by changing to a different data structure.

The Loyc Core libraries include a data type that works the same way as SortedList<Key,Value> but is dramatically faster when the list is large. It's called BDictionary<Key,Value>.

Now, answering your original question: yes, the way you wrote the code, it performs two searches and one insert (the insert is the slowest part). If you switch to BDictionary, there is a method bdictionary.AddIfNotPresent(key, value) which combines those two operations into a single operation. It returns true if the specified item was added, or false if it was already present.

ライセンス: CC-BY-SA帰属
所属していません StackOverflow
scroll top