Unordered (hash) map from bitset to bitset on boost

https://stackoverflow.com/questions/3896357

29-09-2019
|

Pergunta

I want to use a cache, implemented by boost's unordered_map, from a dynamic_bitset to a dynamic_bitset. The problem, of course, is that there is no default hash function from the bitset. It doesn't seem to be like a conceptual problem, but I don't know how to work out the technicalities. How should I do that?

Solução

I found an unexpected solution. It turns out boost has an option to #define BOOST_DYNAMIC_BITSET_DONT_USE_FRIENDS. When this is defined, private members including m_bits become public (I think it's there to deal with old compilers or something).

So now I can use @KennyTM's answer, changed a bit:

namespace boost {
    template <typename B, typename A>
    std::size_t hash_value(const boost::dynamic_bitset<B, A>& bs) {            
        return boost::hash_value(bs.m_bits);
    }
}

Outras dicas

There's to_block_range function that copies out the words that the bitset consists of into some buffer. To avoid actual copying, you could define your own "output iterator" that just processes individual words and computes hash from them. Re. how to compute hash: see e.g. the FNV hash function.

Unfortunately, the design of dynamic_bitset is IMHO, braindead because it does not give you direct access to the underlying buffer (not even as const).

It is a feature request.

One could implement a not-so-efficient unique hash by converting the bitset to a vector temporary:

namespace boost {
    template <typename B, typename A>
    std::size_t hash_value(const boost::dynamic_bitset<B, A>& bs) {
        std::vector<B, A> v;
        boost::to_block_range(bs, std::back_inserter(v));
        return boost::hash_value(v);
    }
}

We can't directly calculate the hash because the underlying data in dynamic_bitset is private (m_bits)

But we can easily finesse past (subvert!) the c++ access specification system without either

hacking at the code or
pretending your compiler is non-conforming (BOOST_DYNAMIC_BITSET_DONT_USE_FRIENDS)

The key is the template function to_block_range which is a friend to dynamic_bitset. Specialisations of this function, therefore, also have access to its private data (i.e. m_bits).

The resulting code couldn't be simpler

namespace boost {


// specialise dynamic bitset for size_t& to return the hash of the underlying data
template <>
inline void
to_block_range(const dynamic_bitset<>& b, size_t& hash_result)
{
    hash_result = boost::hash_value(bs.m_bits);
}

std::size_t hash_value(const boost::dynamic_bitset<B, A>& bs) 
{            
    size_t hash_result;
    to_block_range(bs, hash_result);
    return hash_result;
}
}

the proposed solution generates the same hash in the following situation.

#define BOOST_DYNAMIC_BITSET_DONT_USE_FRIENDS

namespace boost {
    template <typename B, typename A>
    std::size_t hash_value(const boost::dynamic_bitset<B, A>& bs) {            
        return boost::hash_value(bs.m_bits);
    }
}

boost::dynamic_biset<> test(1,false);

auto hash1 = boost::hash_value(test);

test.push_back(false);

auto hash2 = boost::hash_value(test);

// keep continue...

test.push_back(false);

auto hash31 = boost::hash_value(test);

// magically all hash1 to hash31 are the same!

the proposed solution is sometimes improper for hash map.

I read the source code of dynamic_bitset why this happened and realized that dynamic_bitset stores one bit per value as same as vector<bool>. For example, you call dynamic_bitset<> test(1, false), then dynamic_bitset initially allocates 4 bytes with all zero and it holds the size of bits (in this case, size is 1). Note that if the size of bits becomes greater than 32, then it allocates 4 bytes again and push it back into dynamic_bitsets<>::m_bits (so m_bits is a vector of 4 byte-blocks).

If I call test.push_back(x), it sets the second bit to x and increases the size of bits to 2. If x is false, then m_bits[0] does not change at all! In order to correctly compute hash, we need to take m_num_bits in hash computation.

Then, the question is how?

1: Use boost::hash_combine This approach is simple and straight forward. I did not check this compile or not.

namespace boost {
    template <typename B, typename A>
    std::size_t hash_value(const boost::dynamic_bitset<B, A>& bs) { 
        size_t tmp = 0;
        boost::hash_combine(tmp,bs.m_num_bits);           
        boost::hash_combine(tmp,bs.m_bits);
        return tmp;
    }
}

2: flip m_num_bits % bits_per_block th bit. flip a bit based on bit size. I believe this approach is faster than 1.

namespace boost {
    template <typename B, typename A>
    std::size_t hash_value(const boost::dynamic_bitset<B, A>& bs) {
        // you may need more sophisticated bit shift approach.
        auto bit = 1u << (bs.m_num_bits % bs.bits_per_block);
        auto return_val = boost::hash_value(bs.m_bits);

       // sorry this was wrong
       //return (return_val & bit) ? return_val | bit : return_val & (~bit);
       return (return_val & bit) ? return_val & (~bit) : return_val | bit;
    }
}

Licenciado em: CC-BY-SA com atribuição

Não afiliado a StackOverflow