Question

I'm looking into using a consistent hash algorithm in some java code I'm writing. The guava Hashing library has a consistentHash(HashCode, int) method, but the documentation is rather lacking. My initial hope was that I could just use consistentHash() for simple session affinity to efficiently distribute load across a set of backend servers.

Does anyone have a real-world example of how to use this method? In particular I'm concerned with managing the removal of a bucket from the target range.

For example:

@Test
public void testConsistentHash() {
    List<String> servers = Lists.newArrayList("server1", "server2", "server3", "server4", "server5");

    int bucket = Hashing.consistentHash(Hashing.md5().hashString("someId", StandardCharsets.UTF_8), servers.size());
    System.out.println("First time routed to: " + servers.get(bucket));

    // one of the back end servers is removed from the (middle of the) pool
    servers.remove(1);

    bucket = Hashing.consistentHash(Hashing.md5().hashString("someId", StandardCharsets.UTF_8), servers.size());
    System.out.println("Second time routed to: " + servers.get(bucket));
}

Leads to the output:

First time routed to: server4
Second time routed to: server5

What I want is for that identifier ("someId") to map to the same server after removal of a server earlier in the list. So in the sample above, after removal I guess I'd want bucket 0 to map to "server1", bucket 1 to map to "server3", bucket 2 to map to "server4" and bucket 3 to map to "server5".

Am I supposed to maintain a separate (more complicated than a list) data structure to manage bucket removal and addition? I guess I had envisioned perhaps a more complicated Hashing API that would manage the remapping after adding and removal of particular buckets for me.

Note: I know the sample code uses a small input and bucket set. I tried this with thousands of inputs across 100 buckets and the result is the same: inputs that map to buckets 0-98 stay put when I shrink the bucket count to 99, and bucket 99's inputs get distributed across the remaining 99 buckets.
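
A sketch of that check (the ids here are made up; assumes Guava on the classpath):

import com.google.common.hash.HashCode;
import com.google.common.hash.Hashing;
import java.nio.charset.StandardCharsets;

public class ShrinkCheck {
    public static void main(String[] args) {
        int moved = 0;
        for (int i = 0; i < 10_000; i++) {
            HashCode hash = Hashing.md5().hashString("id-" + i, StandardCharsets.UTF_8);
            int before = Hashing.consistentHash(hash, 100); // 100 buckets
            int after = Hashing.consistentHash(hash, 99);   // shrink to 99
            if (before != after) {
                moved++;
                // every id that moves was previously in the removed bucket
                assert before == 99;
            }
        }
        // roughly 1 in 100 ids should move, all of them out of bucket 99
        System.out.println("moved " + moved + " of 10000 ids");
    }
}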


Solution

I'm afraid that no data structure can do it really right with the current consistentHash. Because the method accepts only the list size, nothing but appending to and removing from the end can be supported. Currently, the best solution probably consists of replacing

servers.remove(n)

by

servers.set(n, servers.get(servers.size() - 1));
servers.remove(servers.size() - 1);

This way you effectively swap the failed server with the very last one. This looks bad, since it makes the assignments to both swapped servers wrong, but the problem is only half as bad as it sounds, since one of them has already failed anyway. And it works out: after the subsequent removal of the last list element, everything is fine except for the assignments to the failed server and to the previously last server.

So twice as many assignments as needed change. Not optimal, but hopefully usable?
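
A minimal sketch of that swap-and-remove strategy, wrapped in a router (the class and method names here are mine, not Guava's):

import com.google.common.hash.HashCode;
import com.google.common.hash.Hashing;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class SwapRemoveRouter {
    private final List<String> servers;

    public SwapRemoveRouter(List<String> initialServers) {
        this.servers = new ArrayList<>(initialServers);
    }

    // Routes an id to a server via Guava's consistentHash.
    public String pick(String id) {
        HashCode hash = Hashing.md5().hashString(id, StandardCharsets.UTF_8);
        return servers.get(Hashing.consistentHash(hash, servers.size()));
    }

    // Removes a failed server by swapping it with the last one and
    // dropping the tail, so only the last bucket gets redistributed.
    public void removeServer(String failed) {
        int n = servers.indexOf(failed);
        if (n < 0) {
            return; // unknown server, nothing to do
        }
        servers.set(n, servers.get(servers.size() - 1));
        servers.remove(servers.size() - 1);
    }
}

Ids that hashed to the failed server's bucket now land on the previously last server, ids in the dropped last bucket are redistributed by consistentHash, and everything else keeps its server.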

Other tips

I don't think there's a good way to do this at the moment. consistentHash in its current form is useful only in simple cases -- basically, where you have a knob to increase or decrease the number of servers... but always by adding and removing at the end.

There's some work underway to add a class like this:

public final class WeightedConsistentHash<B, I> {
  /** Initially, all buckets have weight zero. */
  public static <B, I> WeightedConsistentHash<B, I> create(
      Funnel<B> bucketFunnel, Funnel<I> inputFunnel);

  /**
   * Sets the weight of bucket "bucketId" to "weight".
   * Requires "weight" >= 0.0.
   */
  public void setBucketWeight(B bucketId, double weight);

  /**
   * Returns the bucket id that "input" maps to.
   * Requires that at least one bucket has a non-zero weight.
   */
  public B hash(I input);
}

Then you would write:

WeightedConsistentHash<String, String> serverChooser =
    WeightedConsistentHash.create(stringFunnel(), stringFunnel());
serverChooser.setBucketWeight("server1", 1);
serverChooser.setBucketWeight("server2", 1);
// etc.

System.out.println("First time routed to: " + serverChooser.hash("someId"));

// one of the back end servers is removed from the (middle of the) pool
serverChooser.setBucketWeight("server2", 0);

System.out.println("Second time routed to: " + serverChooser.hash("someId"));

And you should get the same server each time. Does that API look suitable?
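
For what it's worth, nothing below is the actual Guava work, just a sketch of how an API with that shape could behave, using rendezvous (highest-random-weight) hashing instead of a ring; all names are made up:

import com.google.common.hash.Hashing;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public final class WeightedRendezvousHash {
    private final Map<String, Double> weights = new HashMap<>();

    // Weight 0 drops the bucket, mirroring setBucketWeight(id, 0).
    public void setBucketWeight(String bucketId, double weight) {
        if (weight <= 0.0) {
            weights.remove(bucketId);
        } else {
            weights.put(bucketId, weight);
        }
    }

    // Returns the bucket whose (bucket, input) score is highest.
    public String hash(String input) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> e : weights.entrySet()) {
            long h = Hashing.murmur3_128()
                .hashString(e.getKey() + "\0" + input, StandardCharsets.UTF_8)
                .asLong();
            // map the hash to a uniform double in (0, 1)
            double u = ((h >>> 11) + 0.5) / (double) (1L << 53);
            // standard weighted rendezvous score: -w / ln(u)
            double score = -e.getValue() / Math.log(u);
            if (score > bestScore) {
                bestScore = score;
                best = e.getKey();
            }
        }
        return best;
    }
}

Because buckets are identified by id rather than by list position, setting a bucket's weight to zero only remaps the inputs that were on that bucket; the order in which servers were added never matters.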

The guava API does not have any knowledge of your server list. It can only guarantee this:

int bucket1 = Hashing.consistentHash(Hashing.md5().hashString("server1", StandardCharsets.UTF_8), N);
int bucket2 = Hashing.consistentHash(Hashing.md5().hashString("server1", StandardCharsets.UTF_8), N - 1);

// this assertion holds if and only if bucket1 != N - 1
assertThat(bucket1, is(equalTo(bucket2)));

You need to manage the mapping from buckets to your server list yourself.

The answer proposed in the question is the correct one:

Am I supposed to maintain a separate (more complicated than a list) data structure to manage bucket removal and addition?

Guava is hashing into a ring with ordinal numbers. The mapping from those ordinal numbers to the server ids has to be maintained externally:

  • Given N servers initially - one can choose some arbitrary mapping from each ordinal number 0..N-1 to server ids A..K (0->A, 1->B, .., N-1->K). A reverse mapping from the server id to its ordinal number is also required (A->0, B->1, ..).

  • On the deletion of a server - the ordinal number space shrinks by one. All the ordinal numbers starting with the one for the deleted server need to be remapped to the next server (shift by one).

    So for example, after the initial mapping, say server C (corresponding to ordinal number 2) got deleted. Now the new mappings become: (0->A, 1->B, 2->D, 3->E, .., N-2->K)

  • On the addition of a server L (say going from N to N+1 servers) - a new mapping can be added from N->L.

What we are doing here is mimicking how nodes would move in a ring as they are added and deleted. While the ordering of the nodes remains the same, their ordinal numbers (on which Guava operates) can change as nodes come and go. A sketch of this bookkeeping follows.
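
Class and method names below are made up; a single list serves as the ordinal-to-server mapping, with remove(Object) standing in for the reverse lookup (a companion Map would make that lookup O(1)):

import com.google.common.hash.Hashing;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class OrdinalServerMap {
    // index in this list == the ordinal number Guava hashes into
    private final List<String> ordinalToServer = new ArrayList<>();

    // Addition (N -> N+1 servers): the new server takes ordinal N.
    public void addServer(String serverId) {
        ordinalToServer.add(serverId);
    }

    // Deletion: the ordinal space shrinks by one, and every server
    // after the deleted one shifts down, e.g. (0->A, 1->B, 2->D, ...).
    public void deleteServer(String serverId) {
        ordinalToServer.remove(serverId);
    }

    public String route(String input) {
        int ordinal = Hashing.consistentHash(
            Hashing.md5().hashString(input, StandardCharsets.UTF_8),
            ordinalToServer.size());
        return ordinalToServer.get(ordinal);
    }
}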

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow