We have a given 3D mesh and we are trying to eliminate duplicate vertices. For this we are using a self-defined struct containing the coordinates of a vertex and the corresponding normal.

    struct vertice
    {
        float p1,p2,p3,n1,n2,n3;

        bool operator == (const vertice& vert) const
        {
            return (p1 == vert.p1 && p2 == vert.p2 && p3 == vert.p3);
        }
    };

After filling the vertex with data, it is added to an unordered_set to remove the duplicates.

    struct hashVertice
    {
        size_t operator () (const vertice& vert) const
        {
            return(7*vert.p1 + 13*vert.p2 + 11*vert.p3);
        }
    };

    std::unordered_set<vertice,hashVertice> verticesSet;

    vertice vert;
    unsigned int i = 0;

    while(i < scene->mMeshes[0]->mNumVertices){

        vert.p1 = (float)scene->mMeshes[0]->mVertices[i].x;
        vert.p2 = (float)scene->mMeshes[0]->mVertices[i].y;
        vert.p3 = (float)scene->mMeshes[0]->mVertices[i].z;

        vert.n1 = (float)scene->mMeshes[0]->mNormals[i].x;
        vert.n2 = (float)scene->mMeshes[0]->mNormals[i].y;
        vert.n3 = (float)scene->mMeshes[0]->mNormals[i].z;

        verticesSet.insert(vert);

        i = i + 1;
    }

We discovered that this is too slow for data sets of around 3,000,000 vertices: even after 15 minutes the program hadn't finished. Is there a bottleneck we don't see, or is another data structure better suited for such a task?


Solution

What happens if you just remove verticesSet.insert(vert); from the loop?

If it speeds up dramatically (as I expect it would), your bottleneck is in the guts of the std::unordered_set, which is a hash table, and the main potential performance problem with hash tables is excessive hash collisions.

In your current implementation, if p1, p2 and p3 are small, the number of distinct hash codes will be small (since you "collapse" float to integer) and there will be lots of collisions.

If the above assumptions turn out to be true, I'd try implementing the hash function differently (e.g. multiplying by much larger coefficients), for example along the lines sketched below.
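
A sketch of one such variant (untested against your data; it assumes C++11 and reuses the vertice struct from the question, with std::hash<float> doing the float-to-integer step and a few large primes standing in for 7, 13 and 11):

    #include <cstddef>
    #include <functional>

    // Sketch: let std::hash<float> map each coordinate to an integer first,
    // then spread the results with large prime multipliers.
    struct hashVertice
    {
        size_t operator () (const vertice& vert) const
        {
            std::hash<float> hf;
            return 73856093u * hf(vert.p1)
                 ^ 19349663u * hf(vert.p2)
                 ^ 83492791u * hf(vert.p3);
        }
    };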


Other than that, profile your code, as others have already suggested.

Other tips

Hashing floating point can be tricky. In particular, your hash routine calculates the hash as a floating-point value, then converts it to an unsigned integral type. This has serious problems if the coordinates can be small: if all of the vertices are in the range [0...1.0), for example, the floating-point result is always less than 7 + 13 + 11 = 31, so after truncation to an unsigned integer there will be at most 31 different hash codes. (For instance, p1 = p2 = p3 = 0.5 gives 7*0.5 + 13*0.5 + 11*0.5 = 15.5, which truncates to 15.)

The usual way to hash floating point is to hash the binary image, checking for the special cases first. (0.0 and -0.0 have different binary images, but must hash the same. And it's an open question what you do with NaNs.) For float this is particularly simple, since it usually has the same size as int, and you can reinterpret_cast:

    #include <cassert>
    #include <cmath>
    #include <cstddef>

    size_t
    hash( float f )
    {
        assert( !std::isnan( f ) );               // NaN handling left open
        return f == 0.0f ? 0 : reinterpret_cast<unsigned&>( f );
    }

I know, formally, this is undefined behavior. But if float and int have the same size, and unsigned has no trapping representations (the case on most general purpose machines today), then a compiler which gets this wrong is being intentionally obtuse.
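
If you'd rather stay within defined behavior, a memcpy of the bits into an unsigned gives the same result and is compiled to the same code by any reasonable optimizer; a minimal sketch:

    #include <cassert>
    #include <cmath>
    #include <cstddef>
    #include <cstring>

    // Sketch: copy the bit pattern instead of reinterpret_cast-ing it.
    size_t
    hash( float f )
    {
        assert( !std::isnan( f ) );
        if ( f == 0.0f )                          // +0.0 and -0.0 must hash alike
            return 0;
        unsigned bits;
        static_assert( sizeof bits == sizeof f, "unexpected float size" );
        std::memcpy( &bits, &f, sizeof bits );
        return bits;
    }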

You then use any combining algorithm to merge the three results; the one you use is as good as any other in this case (although it's not a good generic combining algorithm).
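
For illustration, a Boost-style hash_combine is one common way to merge them (a sketch; the constant and shifts are the usual boost::hash_combine values, nothing specific to this problem):

    #include <cstddef>

    // Mix a new hash value into a running seed (boost::hash_combine style).
    inline void hash_combine( size_t& seed, size_t value )
    {
        seed ^= value + 0x9e3779b9 + ( seed << 6 ) + ( seed >> 2 );
    }

    struct hashVertice
    {
        size_t operator () ( const vertice& vert ) const
        {
            size_t seed = 0;
            hash_combine( seed, hash( vert.p1 ) );   // hash() as sketched above
            hash_combine( seed, hash( vert.p2 ) );
            hash_combine( seed, hash( vert.p3 ) );
            return seed;
        }
    };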

I might add that while some of the comments insist on profiling (and this is generally good advice), if you're taking 15 minutes for 3 million values, the problem can really only be a poor hash function which results in lots of collisions. Nothing else will cause performance that bad. And unless you're familiar with the internal implementation of std::unordered_set, the usual profiler output will probably not give you much information.

On the other hand, std::unordered_set does have functions like bucket_count and bucket_size, which allow analysing the quality of the hash function. In your case, if you cannot create an unordered_set with 3 million entries, your first step should be to create a much smaller one and use these functions to evaluate the quality of your hash code.
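
For example, a diagnostic along these lines (a sketch; dumpBucketStats is just an illustrative helper name) will show how unevenly the entries are distributed over the buckets:

    #include <algorithm>
    #include <cstddef>
    #include <iostream>
    #include <unordered_set>

    // Sketch: print how evenly the entries are spread over the buckets
    // of a (small) test set.
    void dumpBucketStats( const std::unordered_set<vertice, hashVertice>& s )
    {
        size_t empty = 0;
        size_t largest = 0;
        for ( size_t b = 0; b != s.bucket_count(); ++b ) {
            empty += s.bucket_size( b ) == 0;
            largest = std::max( largest, s.bucket_size( b ) );
        }
        std::cout << "size: " << s.size()
                  << ", buckets: " << s.bucket_count()
                  << ", empty buckets: " << empty
                  << ", largest bucket: " << largest << '\n';
    }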

If there is a bottleneck, you are definitely not seeing it, because you don't include any kind of timing measurements.

Measure the timing of your algorithm, either with a profiler or just manually. This will let you find the bottleneck - if there is one.

This is the correct way to proceed. Expecting yourself, or StackOverflow users, to spot bottlenecks by eyeballing the code instead of actually measuring time in your program is, in my experience, the most common cause of failed attempts at optimization.
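
For the manual route, a small helper around <chrono> is usually enough; a sketch (timeIt is just an illustrative name, assuming C++11):

    #include <chrono>
    #include <iostream>
    #include <utility>

    // Sketch: run an arbitrary piece of work and print the elapsed milliseconds.
    template <typename F>
    void timeIt( const char* label, F&& work )
    {
        auto start = std::chrono::steady_clock::now();
        std::forward<F>( work )();
        auto stop = std::chrono::steady_clock::now();
        std::cout << label << ": "
                  << std::chrono::duration_cast<std::chrono::milliseconds>( stop - start ).count()
                  << " ms\n";
    }

    // Usage, for example:
    //     timeIt( "fill verticesSet", [&]{ /* the while loop from the question */ } );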
