Pergunta

I understand the difference between ordered and unordered sets, and I understand why for many purposes, we don't need ordered sets. But all set operations are still possible on ordered sets, and sets have to be stored internally with some order anyway, so why aren't sets ordered by default? Is the performance impact of preserving the order of sets too large?

Foi útil?

Solução

The point is not that the overhead is particularly large, more that it is there at all.

Language features must always strike a balance of cost-effectiveness. Dictionaries are absolutely fundamental to Python programming, so it would be very bad for them to be even slightly slower than they have to be just to preserve insertion order, when most of the time you don't need ordering. It was the correct decision to discard the insertion order in return for slightly faster access, and leave order-preserving data structure for special classes. If there was another data structure that could do all that a dict can, and dict was a lesser-used wrinkle of the language, things might look different.

Outras dicas

You are correct that item are store internally with some order, but this internal order is determined by the hash code of the key, which is what allows retrieval to be so fast. So if a set/dict should be ordered, it would need to maintain a separate internal data structure (say an ordered list of keys) for this.

This would of course increase the size. But perhaps worse, it will affect performance. For example removing an item from a set is an O(1) operation, but if it also have to remove the key from an internal ordered list it would become O(n). Such a cost would be disastrous for some applications. Given that it is pretty rare you need an ordered set, such a trade-off is not be worth it for the standard set/dict types.

Your premise is incorrect. As of Python 3.6, dicts remember their insertion order. This was an implementation detail, and was promoted to full language feature in 3.7. In 3.6, for the specific case of **kwargs, order preservation is specifically guaranteed.

An ordered set is only possible when the elements to be stored have an ordering (i.e. a comparison method) in the first place - but that is not always a given.

The default set/map implementation in most environments nowadays is based on an autoresizing hashtable, which has these advantages:

  • faster
  • uses less memory
  • doesn't require the elements to provide an ordering

sets have to be stored internally with some order anyway

But this internal order doesn't necessarily have any meaning, nor does it stay the same. Indeed, one property of hashtables that sometimes confounds inexperienced developers is that the iteration order, which is based on the internal ordering, can change completely when elements are added (i.e. when a resize is triggered) or between different runs.

The general idea behind a set or a dictionary is that you plan to be performing a lot of lookup operations. It is optimized for said lookup operations by using a hash which allows O(1) lookup in most cases.

Order is done using arrays or linked lists and in fact performing operations where order is important, they are optimized for that such as appending a value at the end or beginning.

By the nature of these two data structures, neither is optimized for both. This isn't to say it isn't possible, but it involves both data structures if you want both lookup and order-based operations to be optimized.

So you have this tradeoff between:

lookup operation optimization <=> order-based operations <=> memory usage

The general consensus is that as a programmer, you generally want to optimize for one or the other but not both, and certainly nobody advocates doubling your memory usage when you only need to optimize one of the two.

That said, there are implementations with both, or at least in Java, specifically LinkedHashMap is both an array and a hash-based dictionary. Sometimes you may need both, but it is adviced to use ArrayList if you only need a list and a HashMap if you only need a dictionary.

Licenciado em: CC-BY-SA com atribuição
scroll top