Question

I'm not talking about distributed key/value systems, such as typically used with memcached, which use consistent hashing to make adding/removing nodes a relatively cheap procedure.

I'm talking about your standard in-memory hashtable like python's dict or perl's hash.

It would seem like the benefits of using consistent hashing would also apply to these standard data structures, by lowering the cost of resizing the hashtable. Real-time systems (and other latency-sensitive systems) would benefit from / require hashtables optimized for low-cost growth, even if overall throughput declines slightly.

Wikipedia alludes to "incremental resizing" but basically talks about a hot/cold replacement approach to resizing; there is a separate article about "extendible hashing" that uses a trie for bucket lookup to accomplish cheap rehashing.

Just curious if anyone's heard of in-core, single-node hashtables that use consistent hashing to lower growth cost. Or is this requirement better met using some other approach (à la the two Wikipedia approaches listed above)?

Or... is my whole question misguided? Do memory paging considerations make the complexity not worth it? That is, the extra indirection of consistent hashing lets you rehash only a fraction of the total keys, but perhaps that doesn't matter because you'll probably have to read from each existing page anyway; memory latency is your primary cost, and whether you rehash some or all of the keys matters little compared to the cost of the memory accesses. On the other hand, with consistent hashing all of your key remaps have the same destination page, so there will be less memory thrashing than if your keys could remap to any of the existing pages.

EDIT: added "data-structures" tag, clarified final sentence to say "page" instead of "bucket".


Solution

I haven't heard of this in the wild, but it may be a good idea if you choose the right consistent hash implementation. Specifically, Jump Consistent Hashing from Google (Lamping and Veach). First I'll go into why Jump, then I'll go into how it can be useful in a local data structure.

Jump Consistent Hashing

Jump Consistent Hashing (which I'll shorten to Jump) is great for this space for a few reasons. Jump assumes that nodes don't fail, which is great for local data structures because they, well, don't fail! This allows Jump to merely be a mapping to a range of numbers [0, numBuckets), requiring only 2-4 bytes of space.

Further, the implementation is simple and fast. And it is even faster if we remove the reference implementation's floating-point divides and replace them with half as many integer divides. (Which we can, by the way.)
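
For concreteness, here is the algorithm from the paper ("A Fast, Minimal Memory, Consistent Hash Algorithm", Lamping and Veach), ported from the paper's C++ to Java; this is the floating-point-divide version referred to above:

    // A Java port of the jump consistent hash from Lamping & Veach's paper.
    final class JumpHash {
        // Maps a 64-bit key hash to a bucket index in [0, numBuckets).
        static int bucket(long key, int numBuckets) {
            long b = -1, j = 0;
            while (j < numBuckets) {
                b = j;
                // 64-bit linear congruential step from the paper.
                key = key * 2862933555777941757L + 1;
                // Unsigned shift keeps the divisor positive despite Java's signed longs.
                j = (long) ((b + 1) * ((double) (1L << 31) / (double) ((key >>> 33) + 1)));
            }
            return (int) b;
        }
    }

Growing numBuckets by one moves, in expectation, only 1/(numBuckets + 1) of the keys, and every moved key lands in the new bucket.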

All this can be used for a variation on...

ConcurrentHashMap

But first, Java's Concurrent Hash Map at a high level.

Java's ConcurrentHashMap (in its classic, segmented form prior to Java 8) is parameterized by a number of buckets. This sharding factor is constant through the life of the map. Each of these buckets is itself a hash map with its own lock.

When inserting a key-value pair into the map, the key is hashed into one of the buckets. The lock for that bucket is taken, and the item is inserted into the bucket's hash map before the lock is released. While one thread is inserting into bucket x, another thread can insert concurrently into bucket y, but it will wait on the lock if it is also inserting into bucket x. Thus Java's ConcurrentHashMap has n-way concurrency, where n is the bucket parameter of the constructor.
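
To make that concrete, here is a minimal sketch of the segmented design just described. This is an illustrative StripedMap class of my own naming, not Java's actual ConcurrentHashMap internals:

    import java.util.HashMap;
    import java.util.Map;

    // Sketch of a segmented map: a fixed number of buckets, each an
    // independently locked HashMap. Threads touching different buckets
    // never contend with one another.
    final class StripedMap<K, V> {
        private final Map<K, V>[] buckets;
        private final Object[] locks;

        @SuppressWarnings("unchecked")
        StripedMap(int numBuckets) {
            buckets = new Map[numBuckets];
            locks = new Object[numBuckets];
            for (int i = 0; i < numBuckets; i++) {
                buckets[i] = new HashMap<>();
                locks[i] = new Object();
            }
        }

        public V put(K key, V value) {
            int b = Math.floorMod(key.hashCode(), buckets.length);
            synchronized (locks[b]) { // only blocks writers of the same bucket
                return buckets[b].put(key, value);
            }
        }

        public V get(K key) {
            int b = Math.floorMod(key.hashCode(), buckets.length);
            synchronized (locks[b]) {
                return buckets[b].get(key);
            }
        }
    }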

Just like any hash map, a bucket in ConcurrentHashMap can fill up and need to grow. Just like a regular hash map, it does this by doubling its size and rehashing everything in the bucket back into its bigger self. Except that 'its bigger self' is only the bucket's 'self'. If a bucket is a hot spot and gets more than its fair share of keys, it will grow disproportionately compared to the other buckets. And each time a bucket grows it takes longer and longer to rehash into itself. This last point is a problem not only for hot spots, but also whenever the hash table simply accumulates more keys.

Imagine if we could grow the number of buckets as the number of keys grows. With this we could dampen how much each individual bucket has to grow.

Enter consistent hashing, which allows us to add more buckets!

ConcurrentHashMap take 2: Consistent Hashing Style

We can get ConcurrentHashMap to grow its number of buckets in two easy steps.

First, replace the function that maps keys to buckets with the jump consistent hash function. So far everything should work the same.

Second, split off a new bucket when a bucket is filled; also grow the filled bucket. Actually, only split off a new bucket if the filled bucket becomes the largest in terms of capacity. That can be calculated without iterating the buckets.

With consistent hashing the split will only direct keys into the new bucket and not backwards into any of the old buckets.
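
A rough sketch of that split step, assuming the JumpHash.bucket function sketched earlier and a caller-supplied 64-bit hash of the key; locking and the "largest bucket" bookkeeping are omitted for clarity:

    import java.util.HashMap;
    import java.util.Iterator;
    import java.util.List;
    import java.util.Map;
    import java.util.function.ToLongFunction;

    final class Splitter {
        // Grow the bucket list from n to n + 1. With jump consistent hashing
        // each key either keeps its old bucket index or moves to the new
        // index n, so keys only ever move into the fresh bucket.
        static <K, V> void split(List<Map<K, V>> buckets, ToLongFunction<K> longHash) {
            int n = buckets.size();
            Map<K, V> fresh = new HashMap<>();
            buckets.add(fresh); // the new bucket gets index n
            for (int i = 0; i < n; i++) {
                Iterator<Map.Entry<K, V>> it = buckets.get(i).entrySet().iterator();
                while (it.hasNext()) {
                    Map.Entry<K, V> e = it.next();
                    // Rehash with n + 1 buckets: the result is either i or n.
                    if (JumpHash.bucket(longHash.applyAsLong(e.getKey()), n + 1) == n) {
                        fresh.put(e.getKey(), e.getValue());
                        it.remove();
                    }
                }
            }
        }
    }

Every key is rehashed with n + 1 buckets, but jump's guarantee means the old buckets never exchange keys with each other; they only donate a fraction of their keys to the new bucket.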

End notes

I'm sure there can be improvements on this scheme. To wit, splitting off a bucket requires a full table scan to move keys into the new bucket. This is surely no worse than a vanilla hash map, and likely better, but it is at a disadvantage compared to the ConcurrentHashMap implementation, which likely doesn't have to do a full scan.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow