Question

I have been using the following code for a couple of years now to handle lookups on our database tables. We currently shard our tables across 6 hosts. The lookup code is:

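$db_servers = Array(
    'db-1','db-2','db-3'
    //you get the idea
);
$db_server_count = count($db_servers); //number of hosts to spread tables across
$full_table = 'mydatabase.mytable'; //just an example...obviously
$hash = sprintf('%u', crc32($full_table)); //unsigned CRC32 of the table name
$host = $db_servers[($hash % $db_server_count)];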

This "algorithm" has the benefits of being fast and pretty random. However, whenever I add a new node to the cluster, rebalancing takes a fair amount of time because an unnecessarily large number of tables have to be moved to different hosts. It's not a huge issue, as I was able to build the rebalancing script so that there is no downtime while the rebalancing occurs; there is just a small performance penalty until it completes.
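To put a rough number on that rebalancing cost, here is a minimal sketch that counts how many tables land on a different host when a modulo-based lookup grows from 6 to 7 hosts. The helper `modulo_host()` and the generated table names are made up for illustration, and the code assumes 64-bit PHP, where `crc32()` returns a non-negative integer.

```php
<?php
// Hypothetical helper mirroring the lookup above: CRC32 of the key,
// modulo the number of hosts, gives the index of the host to use.
function modulo_host(string $key, int $hostCount): int
{
    return crc32($key) % $hostCount;
}

$total = 10000; // number of made-up table names to try
$moved = 0;     // tables whose host index changes after adding a host

for ($i = 0; $i < $total; $i++) {
    $table = "mydatabase.table_$i";
    if (modulo_host($table, 6) !== modulo_host($table, 7)) {
        $moved++;
    }
}

printf("%.1f%% of tables change host going from 6 to 7 hosts\n", 100 * $moved / $total);
```

For a well-spread hash, a key keeps its index only when `hash % 42` is below 6, so about 6 out of 7 tables are expected to move. An ideal scheme would move only about 1 in 7, the share that belongs on the new host.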

Are there other algorithms for this kind of consistent hashing that avoid copious amounts of rebalancing when new hosts are added? I'm continuing to research this topic, but thought Stack Overflow would have some clever solutions that people have seen work well in production.


Solution

Ok, I found a PHP class called Flexihash that handles this nicely. Here's a blog post about it: http://paul.annesley.cc/2008/04/flexihash-consistent-hashing-php/

Additionally, you can take a look at the Github repo here: https://github.com/pda/flexihash

Here's how my code looks now, for anyone who stumbles upon this thread later.

$db_servers = Array(
    'db-1','db-2','db-3'
    //you get the idea
);
$full_table = 'mydatabase.mytable'; //just an example...obviously
$Flexihash = new Flexihash(null, 8);
//I played around with different replica counts and settled on 8
$Flexihash->addTargets($db_servers);
$host = $Flexihash->lookup($full_table);
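
For anyone curious about the general idea behind a consistent-hashing library, below is a minimal hash-ring sketch. It is not Flexihash's actual implementation: the class `SimpleHashRing`, its method names, the default of 64 points per host, and the generated table names are all assumptions made for illustration (PHP 7.1+, 64-bit). Each host is hashed onto the ring at several points, and a table goes to the first point at or after its own hash, wrapping around at the end.

```php
<?php
// Hypothetical minimal hash ring for illustration; NOT Flexihash's actual code.
final class SimpleHashRing
{
    /** @var array<int, string> ring position => target name */
    private $ring = [];
    /** @var int[] ring positions in ascending order */
    private $positions = [];
    /** @var int number of virtual points placed per target */
    private $replicas;

    public function __construct(int $replicas = 64)
    {
        $this->replicas = $replicas;
    }

    // Place $replicas points for this target on the ring.
    public function addTarget(string $target): void
    {
        for ($i = 0; $i < $this->replicas; $i++) {
            $this->ring[crc32($target . '#' . $i)] = $target;
        }
        ksort($this->ring);
        $this->positions = array_keys($this->ring);
    }

    // Return the target owning the first point at or after the resource's hash.
    public function lookup(string $resource): string
    {
        if ($this->positions === []) {
            throw new RuntimeException('No targets on the ring');
        }
        $hash = crc32($resource);
        foreach ($this->positions as $position) { // linear scan keeps the sketch short
            if ($position >= $hash) {
                return $this->ring[$position];
            }
        }
        return $this->ring[$this->positions[0]]; // wrap around to the start
    }
}

$ring = new SimpleHashRing();
foreach (['db-1', 'db-2', 'db-3', 'db-4', 'db-5', 'db-6'] as $host) {
    $ring->addTarget($host);
}

// Record where each made-up table lives with 6 hosts.
$before = [];
for ($i = 0; $i < 2000; $i++) {
    $table = "mydatabase.table_$i";
    $before[$table] = $ring->lookup($table);
}

$ring->addTarget('db-7');

$moved = 0;          // tables whose host changed
$movedElsewhere = 0; // tables that moved to a host other than db-7
foreach ($before as $table => $oldHost) {
    $newHost = $ring->lookup($table);
    if ($newHost !== $oldHost) {
        $moved++;
        if ($newHost !== 'db-7') {
            $movedElsewhere++;
        }
    }
}

printf("%d of %d tables moved; %d of them went somewhere other than db-7\n",
    $moved, count($before), $movedElsewhere);
```

The property that matters is that adding a host only inserts new points on the ring, so any table that moves can only move to the new host; every other table stays where it was. More points per host generally evens out the distribution at the cost of a larger ring.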

Licensed under: CC-BY-SA with attribution