What would the preferred data structure for when two keys map to the same bucket in an std::ordered_map<T>. I am not sure whether it would be better to use a std::vector<T> (which would requiring copying all of the elements when reaching the capacity but quick iteration) or a std::list<T> which would be easy to add new elements but slower to iterate over?

std::vector

  • Quick for iterating over elements
  • Bad if run out of capacity as need to copy all elements in to new memory

std::list

  • Quick for adding/deleting nodes
  • Bad for iterating over as elements not in continuous memory
有帮助吗?

解决方案

Even without knowing your exact usage pattern and workload, I'd say that std::vector is better. This is true most of the time, and I'll try and explain why below.

  1. Most of the time, you do a lot more lookups into a hash table than insertions. Lookups require iteration over a bucket, and insertions require adding elements and potentially resizing. Therefore, it makes more sense to optimize for the more common use case.

  2. Every insert internally needs to do a lookup, so you will have at least as many lookups as insert; commonly much more.

  3. Most of the time the average number of keys per bucket is low. This translates into working with a small vector vs. a small list. And resizing a small vector (which involves copying of elements) will be fast.

  4. That vector "resizing" generally doesn't happen very often, so you don't have to be irrationally afraid of it. (Although you should note that when the vector is small, the resizing does happen more often. This is true for all implementations I know, but it's also trivial to rectify/circumvent.)

  5. Iteration over a vector is a lot faster than iteration over a list. A lot.

  6. Even resizing and copying (or moving) a vector can be as fast (or almost as fast) as adding an element into a list.

  7. With proper "move" support in your data types, resizing a vector becomes even less of an overhead.

  8. You can preallocate some number of elements with a vector, and almost completely eliminate all resizing in buckets. For example, if you know that 99% of your buckets will contain 3 keys or less, you can reserve 3 or 4 elements for each of the vectors and forget about resizing (almost.)

  9. Vectors (specially when their element type is small) are much more space-efficient than linked lists. A std::list needs to keep two additional pointers per element, which can be a huge overhead (50%-200% for elements of 8-16 bytes, depending on whether you are 32 or 64 bits.)

  10. Because of their smaller size and contiguous-ness in memory, vectors are generally a much faster and much nicer data structure.

  11. Ultimately, you must do your own measurement and benchmarking within your own codebase and with your own workloads and usage patterns. No one can give you a definite answer without complete information of those. So, if your elements are very large, immovable objects, and you do mostly insertions/deletion and very few lookups, then go ahead and use linked lists. Otherwise, use vectors.

You can take a look at this benchmark, comparing vector, list and deque. It might further help you decide to use vector!

许可以下: CC-BY-SA归因
不隶属于 StackOverflow
scroll top