how to adapt an unordered STL container to store only Values of a Key-Value pair?

https://stackoverflow.com/questions/8795735

15-04-2021
|

题

The new C++11 standard has unordered containers. In particular, the std::unordered_map<Key, Value> stores a std::pair<Key, Value> in a location based on std::hash<Key> (default hash function). Similarly, the std::unordered_set<Key> stores a Key in a location based on std::hash<Key>.

My question is: how can one store only the Value of a Key-Value pair, in a location based on std::hash<Key>? This would be useful if one uses a perfect hash function, i.e. one for which different Keys map to different hash indices (so there is never collision resolution required).

An unordered_set only uses the key, and an unordered_map uses both the key and the value, so the unordered STL containers in the new C++11 standard do not seem to allow such customization. What would be a good way to get such a data structure from the existing STL containers?

More generally, how can one store a std::pair<T, Value> in a location based on std::hash<Key>, where T is a type representing a signature of the Key? E.g. if Key is a large data structure, I would like to compute a 64-bit hash key and split this into two 32 bit parts: the upper 32 bits together with the Value form a std::pair<uint32_t, Value>, and the lower 32 bits determine the location where this pair is stored.

An application where this would be useful is e.g. computer chess, where a position (several kilobytes in some programs) as the Key type is hashed into a 64-bit key, of which only the upper 32 bits and some search related information as the Value type are stored as a std::pair (usually only 16 bytes in total) in a location based on the lower 32 bits of the hash key.

解决方案

There is no general-purpose way to perform operations on a hash without continuous access to the hash values. For example, suppose the hash internally uses a tree. To add a new node to the hash, you need to compare its hash value to the hash value of existing nodes on the tree. How can you do that if you didn't store their values in the tree?

What you're asking for is probably not impossible, but none of the typical hashing algorithms can do it. And there doesn't seem to be any point anyway, you have to store something to make the collection traversable, and it's hard to see how something other than the hash could ever work as well as the hash, since that's what you're searching for.

If the hash is "too big", use a hash of the hash. (Of course, then you have to deal with hash collisions.)

其他提示

Since C++11 hashes are actually of type size_t you can do something along the lines of:

template <typename T>
struct with_hash
{
    size_t hash;
    T value;
};

template<> struct std::hash<with_hash>
{
    typedef size_t result_type;
    typedef with_hash argument_type;
    size_t operator()(const with_hash &x)
    {
         return x.hash;
    }
};

template <typename T>
using perfectly_hashed = std::unordered_set< with_hash<T> >;

With a few more sintactic sugar here and there...

Implement your hashing function for the type you want to use as a key, and then create a type to hold the hashed value and specialize std::hash on that type to just return the hash value. Now you can compute the hash, discard the data used to compute the hash, and stick the value and its hash in the map.

To retrieve a value you somehow reconstruct the key data, and then you can recompute the hash value, and then search the map for that hash.

I may have gotten this completely wrong, but why not just a std::unordered_map<uint32_t, std::pair<uint32_t, Value>> with some nice utility functions for insertion and extraction?

// demonstration with 32bit 'hash' and 16bit 'lo' and 'hi'
#include <unordered_map>
#include <string>
#include <stdint.h>
#include <iostream>

int main(){
    typedef std::unordered_map<uint16_t, std::pair<uint16_t, std::string>> map_type;
    map_type m;
    std::string key = "hello", value = "world";
    uint32_t hash = std::hash<std::string>()(key);
    uint16_t lo = hash & 0xFFFF, hi = hash >> 16; // make a nice function for this
    m.insert(std::make_pair(lo, std::make_pair(hi, value))); // and this
    auto it = m.find(lo); // and this
    std::cout << "hash: " << hash << '\n'
              << "lo: " << it->first << '\n'
              << "hi: " << it->second.first << '\n'
              << "lo | (hi << 16): " << (it->first | (uint32_t(it->second.first) << 16)) << '\n'
              << "value: " << it->second.second << '\n';
}

Live demo on Ideone.

Output:

hash: 1335831723
lo: 11435
hi: 20383
lo | (hi << 16): 1335831723
value: world

My question is: how can one store only the Value of a Key-Value pair, in a location based on std::hash? This would be useful if one uses a perfect hash function, i.e. one for which different Keys map to different hash indices (so there is never collision resolution required).

A perfect hash function is not sufficient. Not only do you have to guarantee that there are no hash collisions, you also have to ensure that there are no bucket collisions. Heck, you even have to ensure that the number of buckets never changes, since your data structure cannot discover the hash of a key.

许可以下： CC-BY-SA 和归因

不隶属于 StackOverflow