For an application similar to yours, I've had success with an open-addressing scheme.
A chained hash table keeps a separate list of the entries that map to each bucket. Collisions make the lists grow, and the list nodes are distinct heap objects with poor cache locality.
An open-addressed table stores everything in one flat array: on a collision, the entry goes into another slot of the same array (for example the next free one, with linear probing). That's good for the CPU caches.
For extra performance, consider a function closer to a perfect hash: one that avoids needlessly scrambling the keys and minimizes randomness. Instead, try to identify items that are accessed close together in time and map that temporal locality to spatial locality, so related entries land in nearby slots.
Such an optimized hash still needs to be uniform over its range, though. I used a pre-analysis step, randomly sampling the key domain to drive the construction of the hash function. But adding that kind of complexity should be driven by measurement: profile first and know exactly how much time you're losing to those cache misses before you pay for the fix.