Question

A big data job is split into X partitions, which are stored in a database. A status for each partition is also stored in the database and is used to ensure that each partition is processed only once, by a single server.

I've got X servers, each with a unique id (int). Each server polls the database for the next Y partitions (it pre-reads and buffers them, then loops over and processes the pre-read partitions, and continues until no partitions remain).

I can see in the log that I get many clashes, e.g. multiple servers trying to process the same partition and failing when they try to take ownership (because it has already been taken by another server).

All these failures are a waste of time, network round trips and compute power.
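To make the clash concrete, the ownership step boils down to a conditional update along these lines; the table and column names below are illustrative, not my actual schema. Only one server's update affects a row, the others see an update count of 0 and have wasted a round trip:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class PartitionClaim {

    /**
     * Tries to claim a partition for this server. Returns true only if the
     * status flipped from PENDING to IN_PROGRESS in this call, i.e. no other
     * server got there first.
     */
    public static boolean tryClaim(Connection db, String partitionId, int serverId) throws SQLException {
        String sql = "UPDATE partition SET status = 'IN_PROGRESS', owner = ? "
                   + "WHERE id = ? AND status = 'PENDING'";
        try (PreparedStatement ps = db.prepareStatement(sql)) {
            ps.setInt(1, serverId);
            ps.setString(2, partitionId);
            return ps.executeUpdate() == 1; // 0 rows updated means another server won the race
        }
    }
}
```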

I'm looking for ideas on how to split the partitions among the servers when reading them.

Each partition has the following attributes:

  • Id - string[13]
  • Sequence - long (incrementing counter)
  • Create time - timestamp

Any ideas on how best to implement a non-clashing read algorithm?

Keep in mind:

  • The number of partitions is unknown
  • The number of servers is known, but may increase/decrease
  • I can modify/add attributes to the partition if they help minimize clashes
  • A partition should not have affinity to a particular server; any server should be able to process any partition

My idea: I've been playing around with using the server id to offset each server's read, e.g. server 1 reads records 0-1000, server 2 reads records 1001-2000, and so forth. However, too many issues occur: there might not be enough partitions to divide among X servers, and servers started at different times may still end up reading the same partitions even with an offset based on their server id.


Solution

Hash the partition id and take it modulo the number of servers; only claim a partition when the result equals your server id. Look at other servers' partitions only when there is no work left for yours. You will still get some collisions, but only once you've run out of your own work.
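A minimal sketch of that ownership check in Java, assuming server ids run from 0 to serverCount-1 and that the partition's 13-character Id string is available when polling (how you apply the filter in the query itself is up to your database layer):

```java
public final class PartitionOwnership {

    /** Returns true if this server "owns" the partition under hash-mod assignment. */
    public static boolean isMine(String partitionId, int serverId, int serverCount) {
        // Math.floorMod keeps the bucket non-negative even for negative hashCode values.
        int bucket = Math.floorMod(partitionId.hashCode(), serverCount);
        return bucket == serverId; // assumes server ids are 0..serverCount-1
    }

    public static void main(String[] args) {
        // Example: 4 servers, this instance has id 2.
        System.out.println(isMine("ABC1234567890", 2, 4));
    }
}
```

Note that if the number of servers changes, the bucket assignment changes too, which is acceptable here because any server may process any partition; the hash only reduces contention, the status column still guarantees single processing.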

Other tips

I've achieved this sort of distributed workload by using JMS like so:

  1. A primary program reads the source data, composes individual "messages" of work, and posts them to a JMS queue
  2. Worker servers read messages from the queue one by one (this can be wrapped in transaction handling if desired) until the queue is empty.

There's no need to partition the work, since any work unit can be handled by any server; just compose the work units at the desired granularity and let the JMS server handle distribution to your worker servers.

Note: The "J" in JMS stands for Java, but really there are JMS clients for many languages.
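As a rough illustration, a worker could look something like the sketch below. It assumes ActiveMQ as the JMS provider; the broker URL, queue name, and message contents are made up for the example:

```java
import javax.jms.*;
import org.apache.activemq.ActiveMQConnectionFactory;

public class PartitionWorker {

    public static void main(String[] args) throws JMSException {
        // Hypothetical broker URL and queue name -- adjust for your environment.
        ConnectionFactory factory = new ActiveMQConnectionFactory("tcp://localhost:61616");
        Connection connection = factory.createConnection();
        try {
            connection.start();
            // CLIENT_ACKNOWLEDGE: a message is removed from the queue only after
            // the worker confirms it has processed the work unit.
            Session session = connection.createSession(false, Session.CLIENT_ACKNOWLEDGE);
            Queue queue = session.createQueue("partitions.work");
            MessageConsumer consumer = session.createConsumer(queue);

            Message message;
            // receive(timeout) returns null if nothing arrives within the timeout,
            // which this sketch treats as "queue drained".
            while ((message = consumer.receive(5000)) != null) {
                String partitionId = ((TextMessage) message).getText();
                processPartition(partitionId);
                message.acknowledge(); // mark this work unit as done
            }
        } finally {
            connection.close();
        }
    }

    private static void processPartition(String partitionId) {
        // Placeholder for the real per-partition processing.
        System.out.println("Processing partition " + partitionId);
    }
}
```

The queue itself serializes hand-out of work, so two workers never receive the same message and there is no clash to detect or retry.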

Licensed under: CC-BY-SA with attribution