Hashing floating point can be tricky. In particular, your hash
routine calculates the hash as a floating point value, then
converts it to an unsigned integral type. This has serious
problems if the vertices can be small: if all of the vertices
are in the range [0...1.0), for example, your hash function
will never return anything greater than 13 as an unsigned
integer, which means that there will be at most 13 different
hash codes.
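Just to make that concrete, here's a sketch of the failure mode;
the coefficients (3, 5, 7) are made up, not the ones from your
actual hash, but the effect is the same:

#include <cstddef>
#include <iostream>

//  Hypothetical stand-in for a hash computed in floating point
//  and then converted: a weighted sum of the coordinates.
std::size_t
badHash( float x, float y, float z )
{
    double h = x * 3.0 + y * 5.0 + z * 7.0;  //  below 15.0 for coordinates in [0...1.0)
    return static_cast<std::size_t>( h );    //  truncation: only a handful of distinct codes
}

int
main()
{
    std::cout << badHash( 0.1f, 0.2f, 0.3f ) << '\n';     //  3
    std::cout << badHash( 0.11f, 0.21f, 0.31f ) << '\n';  //  3 again: a collision
}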
The usual way to hash floating point is to hash the binary
image, checking for the special cases first. (0.0 and -0.0
have different binary images, but must hash the same. And it's
an open question what you do with NaNs.) For float this is
particularly simple, since it usually has the same size as
int, and you can reinterpret_cast:
#include <cassert>
#include <cmath>
#include <cstddef>

size_t
hash( float f )
{
    assert( ! std::isnan( f ) );    //  what to do with a NaN is an open question
    return f == 0.0 ? 0 : reinterpret_cast<unsigned&>( f );
}
I know, formally, this is undefined behavior. But if float and
int have the same size, and unsigned has no trapping
representations (the case on most general purpose machines
today), then a compiler which gets this wrong is being
intentionally obtuse.
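If you want to stay formally within the rules, you can copy the
bytes instead of reinterpreting the lvalue; the following is
just a sketch (hashSafe is a made-up name), and any decent
compiler turns the memcpy into the same single move:

#include <cstddef>
#include <cstring>

std::size_t
hashSafe( float f )
{
    static_assert( sizeof( float ) == sizeof( unsigned ),
                   "assumes float and unsigned have the same size" );
    if ( f == 0.0f ) {
        return 0;               //  so that 0.0 and -0.0 hash the same
    }
    unsigned bits;
    std::memcpy( &bits, &f, sizeof bits );  //  well defined, unlike the cast
    return bits;
}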
You then use any combining algorithm to merge the three results;
the one you use is as good as any other in this case (though
it's not a good generic algorithm).
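For what it's worth, something along the lines of
boost::hash_combine makes a reasonable generic combiner; a
sketch (combine and hashVertex are made-up names, and hash is
the function above):

#include <cstddef>

std::size_t
combine( std::size_t seed, std::size_t value )
{
    //  The magic constant and the shifts come from boost::hash_combine.
    return seed ^ ( value + 0x9e3779b9 + ( seed << 6 ) + ( seed >> 2 ) );
}

std::size_t
hashVertex( float x, float y, float z )
{
    std::size_t h = hash( x );
    h = combine( h, hash( y ) );
    h = combine( h, hash( z ) );
    return h;
}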
I might add that while some of the comments insist on profiling
(and this is generally good advice), if you're taking 15 minutes
for 3 million values, the problem can really only be a poor hash
function, which results in lots of collisions; nothing else will
cause performance that bad. And unless you're familiar with the
internal implementation of std::unordered_set, the usual
profiler output will probably not give you much information.
On the other hand, std::unordered_set does have functions like
bucket_count and bucket_size, which allow analysing the quality
of the hash function. In your case, if you cannot create an
unordered_set with 3 million entries, your first step should be
to create a much smaller one, and use these functions to
evaluate the quality of your hash code.
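Something along these lines would do the trick (reportBuckets is
a made-up name; the set is whatever small instance you've
built with your own element type and hasher):

#include <cstddef>
#include <iostream>

template <typename Set>
void
reportBuckets( Set const& set )
{
    std::size_t empty = 0;
    std::size_t largest = 0;
    for ( std::size_t i = 0; i != set.bucket_count(); ++ i ) {
        if ( set.bucket_size( i ) == 0 ) {
            ++ empty;
        } else if ( set.bucket_size( i ) > largest ) {
            largest = set.bucket_size( i );
        }
    }
    std::cout << "buckets: "       << set.bucket_count()
              << ", load factor: " << set.load_factor()
              << ", empty: "       << empty
              << ", largest: "     << largest << '\n';
    //  With a decent hash, the largest bucket stays small; a few
    //  huge buckets and many empty ones mean the hash is poor.
}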