Question

I need some kind of priority queue to store pairs <key, value>. Values are unique, but keys aren't. I will be performing the following operations (most common first):

  1. random insertion;
  2. retrieving (and removing) all elements with the least key;
  3. random removal (by value).

I can't use std::priority_queue because it only supports removing the head.

For now, I'm using an unsorted std::list. Insertion is performed by just pushing new elements to the back (O(1)). Operation 2 sorts the list with list::sort (O(N log N)) before performing the actual retrieval. Removal, however, is O(N), which is a bit expensive.

Any idea of a better data structure?


Solution 6

Ok, so I've tested many options and ended up with something based on the idea of Matthieu M.. I'm currently using a std::map<key_type, std::list<value_type> >, where the value_type contains a std::list<value_type>::iterator to itself, which is useful for removal.

Removal must check if the list is empty, which implies a map query and possibly a call to erase. Worst-case complexity is when keys are distinct: O(logN) for insertion, O(1) for retrieval and O(logN) for removal. I've got very good experimental results compared to other alternatives on my test machine.

Using a std::vector is less efficient both in terms of theoretical complexity (O(N) worst-case for removal when keys are identical) and experimentation I've been doing.
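As a sketch of the structure described above (the class and member names here are illustrative, not the asker's actual code), one way to wire up the map-of-lists is to hand each inserted value a handle holding both iterators, which plays the same role as the iterator-to-self the answer describes:

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <map>
#include <string>
#include <vector>

class MinQueue {
public:
    using Bucket = std::list<std::string>;

    // Handle returned to the caller; keeping it per value is what makes
    // random removal cheap (the answer stores an iterator inside the value).
    struct Handle {
        std::map<int, Bucket>::iterator bucket;
        Bucket::iterator entry;
    };

    // Insertion: O(log M) map lookup, O(1) list push_back.
    Handle insert(int key, const std::string& value) {
        auto b = buckets_.emplace(key, Bucket{}).first;
        b->second.push_back(value);
        return {b, std::prev(b->second.end())};
    }

    // Removal: O(1) list unlink; erasing an emptied bucket from the map
    // by iterator is amortized constant.
    void remove(const Handle& h) {
        h.bucket->second.erase(h.entry);
        if (h.bucket->second.empty())
            buckets_.erase(h.bucket);
    }

    // Operation 2: pop all values sharing the least key.
    std::vector<std::string> pop_min() {
        std::vector<std::string> out;
        if (!buckets_.empty()) {
            auto b = buckets_.begin();
            out.assign(b->second.begin(), b->second.end());
            buckets_.erase(b);
        }
        return out;
    }

private:
    std::map<int, Bucket> buckets_;
};
```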

OTHER TIPS

When you need order, use an ordered container. There is no point in paying the cost of sorting later on.

Your current solution is:

  • Insertion O(1)
  • Retrieval O(N log N)
  • Removal O(N) (which is as good as you can get without keeping another index there)

Simply using a std::multi_map you could have:

  • Insertion O(log N)
  • Retrieval O(log N) <-- much better, isn't it? We need to find the end of the range
  • Removal O(N)

Now, you could do slightly better with a std::map< key, std::vector<value> >:

  • Insertion O(log M) where M is the number of distinct keys
  • Retrieval O(1) (begin is guaranteed to be amortized constant time)
  • Removal O(N)
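A minimal sketch of that std::map< key, std::vector<value> > variant (the free-function names insert/pop_min are mine, not a standard API):

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

using Grouped = std::map<int, std::vector<std::string>>;

// Insertion: O(log M) map lookup plus amortized O(1) push_back.
void insert(Grouped& g, int key, const std::string& value) {
    g[key].push_back(value);
}

// Retrieval: begin() is guaranteed constant time; move out the whole
// vector of values sharing the least key and drop that map node.
std::vector<std::string> pop_min(Grouped& g) {
    std::vector<std::string> out;
    if (!g.empty()) {
        out = std::move(g.begin()->second);
        g.erase(g.begin());
    }
    return out;
}
```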

You can't really push the random removal... unless you're willing to keep another index there. For example:

typedef std::vector<value_type> data_value_t;
typedef std::map<key_type, data_value_t> data_t;

typedef std::pair<data_t::iterator, size_t> index_value_t;
  // where the iterator gives you the right vector and size_t is an index into it

typedef std::unordered_map<value_type, index_value_t> index_t;

But keeping this second index up to date is error prone... and will be done at the expense of the other operations! For example, with this structure, you would have:

  • Insertion O(log M) --> the hash map insertion itself is amortized O(1)
  • Retrieval O(N/M) --> we need to de-index all the values in the vector; there are N/M of them on average
  • Removal O(N/M) --> finding in the hash map is O(1), dereferencing is O(1), removing from the vector is O(N/M) because we need to shift approximately half the content of the vector. Using a list would yield O(1)... but might not be faster (it depends on the number of elements, because of the memory tradeoff).

Also bear in mind that hash map complexities are amortized. Trigger a reallocation because you exceeded the load factor, and that particular insertion will take a very long time.

I'd really go with the std::map<key_type, std::vector<value_type> > in your stead. That's the best bang for the buck.

Can you reverse the order of the collection, i.e. store them in <value, key> order?

Then you could just use a std::map, with O(log n) time for insertion, O(n) for operation 2 (traversing the whole collection), and O(log n) for random removal of a value (which would be the key of said map).
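That reversed layout might look like this (a sketch with illustrative names; pop_min is mine, not a standard function):

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// value -> key; values are unique, so they can serve as the map key.
// Insertion: m["val"] = key;  -- O(log n)
// Random removal by value: m.erase("val");  -- O(log n)
using ByValue = std::map<std::string, int>;

// Operation 2: find the least key by a full traversal (O(n)), then
// collect and erase the matching entries.
std::vector<std::string> pop_min(ByValue& m) {
    std::vector<std::string> out;
    if (m.empty()) return out;
    int best = m.begin()->second;
    for (const auto& [v, k] : m)
        if (k < best) best = k;
    for (auto it = m.begin(); it != m.end(); ) {
        if (it->second == best) { out.push_back(it->first); it = m.erase(it); }
        else ++it;
    }
    return out;
}
```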

If you could find a map implementation based on hashes instead of trees (like std::map) the times would be even better: O(1), O(n), O(1).

If you're using Visual Studio, it provides hash_multimap. Boost also offers an unordered multimap (boost::unordered_multimap). If you need an ordered multimap, use the STL's std::multimap; for an ordered multiset, std::multiset.

std::multimap seems to be what you are searching for.

It will store your objects ordered by key, allow you to retrieve the lowest/highest key value (begin(), rbegin()), and find all the objects with a given key (equal_range, lower_bound, upper_bound).
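For example, popping everything that shares the lowest key with equal_range (a sketch; the function name pop_min is mine):

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

using MultiQ = std::multimap<int, std::string>;

// begin() gives an element with the lowest key; equal_range then
// delimits every element sharing that key, which we copy out and erase.
std::vector<std::string> pop_min(MultiQ& m) {
    std::vector<std::string> out;
    if (m.empty()) return out;
    auto [lo, hi] = m.equal_range(m.begin()->first);
    for (auto it = lo; it != hi; ++it)
        out.push_back(it->second);
    m.erase(lo, hi);
    return out;
}
```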

(EDIT: if you have just a few items, say less than 30, you should also test the performance of just using a deque or a vector)

If I understood correctly, your performance target is to have fast (1) and (3), while (2) is not that important. In this case, and given that values are unique, why not just have a std::set<value>, and do a sequential search for (2)? You'd have O(log n) for (1) and (3), and O(n) for (2). Better yet, if your STL has std::hash_set, you'd get close to O(1) for (1) and (3).
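Assuming each value carries its key, a hash map from value to key can stand in for the hash set; this is a sketch under that assumption (pop_min is my name for operation 2):

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// value -> key; (1) insertion and (3) removal by value are amortized O(1):
//   p["val"] = key;   p.erase("val");
using Pool = std::unordered_map<std::string, int>;

// (2) is a sequential search: one pass to find the least key,
// one pass to collect and erase the matching values.
std::vector<std::string> pop_min(Pool& p) {
    std::vector<std::string> out;
    if (p.empty()) return out;
    int best = p.begin()->second;
    for (const auto& [v, k] : p)
        if (k < best) best = k;
    for (auto it = p.begin(); it != p.end(); ) {
        if (it->second == best) { out.push_back(it->first); it = p.erase(it); }
        else ++it;
    }
    return out;
}
```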

If you need something better than O(n) for (2), one alternative would be to have a set of priority queues.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow