Pregunta

I've been working on a project of mine using Akka to create a real-time processing system which takes in the Twitter stream (for now) and uses actors to process said messages in various ways. I've been reading about similar architectures that others have built using Akka and this particular blog post caught my eye:

http://blog.goconspire.com/post/64901258135/akka-at-conspire-part-5-the-importance-of-pulling

Here they explain different issues that arise when pushing work (ie. messages) to actors vs. having the actors pull work. To paraphrase the article, by pushing messages there is no built-in way to know which units of work were received by which worker, nor can that be reliably tracked. In addition, if a worker suddenly receives a large number of messages where each message is quite large you might end up overwhelmed and the machine could run out of memory. Or, if the processing is CPU intensive you could render your node unresponsive due to CPU thrashing. Furthermore, if the jvm crashes, you will lose all the messages that the actor(s) had in its mailbox.

Pulling messages largely eliminates these problems. Since a specific actor must pull work from a coordinator, the coordinator always knows which unit of work each worker has; if a worker dies, the coordinator knows which unit of work to re-process. Messages also don’t sit in the workers’ mailboxes (since it's pulling a single message and processing it before pulling another one) so the loss of those mailboxes if the actor crashes isn't an issue. Furthermore, since each worker will only request more work once it completes its current task, there are no concerns about a worker receiving or starting more work than it can handle concurrently. Obviously there are also issues with this solution like what happens when the coordinator itself crashes but for now let's assume this is a non-issue. More about this pulling pattern can also be found at the "Let It Crash" website which the blog references:

http://letitcrash.com/post/29044669086/balancing-workload-across-nodes-with-akka-2

This got me thinking about a possible alternative to doing this pulling pattern which is to do pushing but with durable mailboxes. An example I was thinking of was implementing a mailbox that used RabbitMQ (other data stores like Redis, MongoDB, Kafka, etc would also work here) and then having each router of actors (all of which would be used for the same purpose) share the same message queue (or the same DB/collection/etc...depending on the data store used). In other words each router would have its own queue in RabbitMQ serving as a mailbox. This way, if one of the routees goes down, those that are still up can simply keep retrieving from RabbitMQ without too much worry that the queue will overflow since they are no longer using typical in-memory mailboxes. Also since their mailbox isn't implemented in-memory, if a routee crashes, the most messages that it could lose would just be the single one it was processing before the crash. If the whole router goes down then you could expect RabbitMQ (or whatever data store is being used) to handle an increased load until the router is able to recover and start processing messages again.

In terms of durable mailboxes, it seems that back in version 2.0, Akka was gravitating towards supporting these more actively since they had implemented a few that could work with MongoDB, ZooKeeper, etc. However, it seems that for whatever reason they abandoned the idea at some point since the latest version (2.3.2 as of the writing of this post) makes no mention of them. You're still able to implement your own mailbox by implementing the MessageQueue interface which gives you methods like enqueue(), dequeue(), etc... so making one that works with RabbitMQ, MongoDB, Redis, etc wouldn't seem to be a problem.

Anyways, just wanted to get your guys' and gals' thoughts on this. Does this seem like a viable alternative to doing pulling?

¿Fue útil?

Solución

This question also spawned a rather long and informative thread on akka-user. In summary it is best to explicitly manage the work items to be processed by a (persistent) actor from which a variable number of worker actors pull new jobs, since that allows better resource management and explicit control over what gets processed and how retries are handled.

Licenciado bajo: CC-BY-SA con atribución
No afiliado a StackOverflow
scroll top