MQ to process, aggregate and publish data asynchronously

Question 1

After implementing it, I feel like answering my own question can be good for people that will come and visit StackOverflow in the future.

In the end, I went with Redis. It is really fast, and scalable. And I like its flexibility a lot: it is much more flexible than message queues. Am I asserting that Redis is better at MQs than the various MQs out there? Well, in my specific case I believe so. The point is: if something is not offered out-of-the-box, you can build it (usually, using MULTI - but you can even use LUA for more advance customization!).

For example, I followed this good answer to implement a "persistent", recoverable pub/sub (i.e. a pub/sub that allows clients to die and reconnect without losing messages).

This helped me with both my scalability and my "reliability" requirements: I decided to keep every piece in the pipeline independent (a deamon for now), but add a monitor which examines lists/queues on Redis; if something is not consumed (or consumed too slowly), the monitor spawns a new consumer. I am also thinking to be truly "elastic", and add the ability for consumers to kill themselves when there is no work to be done.

Another example: execution of scheduled activities. I am following this approach, which seems quite popular, for now. But I am eager to try keyspace notifications, to see if a combination of expiring keys and notifications can be a superior approach.

Finally, as a library to access Redis, my choice went to Jedis: it is popular, supported, and provides a nice interface to implement pub/sub as listeners. It is not the best approach (idiomatic) with Scala, but it works well.

Question 2

The queue task descriptions partially sound like things systems based on "enterprise integration patterns" like Apache Camel do.

A delayed message can be expressed by constants

from("seda:b").delay(1000).to("mock:result");

or variables, for example a message header value

from("seda:a").delay().header("MyDelay").to("mock:result");

Question 3

1> I suggest using a message queue, choose the queue depending on your requirements, but for most cases any one would do, I suggest you choose a queue based on protocol JMS (active mq) or AMQP (rabbit mq) and write a simple wrapper over it or use the ones provided by spring- > spring-jms or spring-amqp

2> You can write queue consumers such that they notify your system that a new message arrives for example in rabbit you can implement the MessageListener interface

 public class MyListener implements MessageListener {
     @Override
public void onMessage(Message message) {
     /* Handle the message */        

    }
}

3> If you use async consumers like in <2> you can get rid of all polling and cron jobs

4> Depends on your requirements -> If you have millions of events/messages passing through your queue then running the queue middle-ware on a centralized server makes sense.

5> If resource consumption is not an issue then keeping your consumers/subscribers running all the while is the easiest way to go. if these consumers are distributed then you can orchestrate them using a service like zookeeper

6> Scalability -> Most queuing systems provide for easy distribution of messages, so provided that your consumers are stateless, then scaling is possible just by adding new consumers and some configuration.

MQ to process, aggregate and publish data asynchronously

The question(s)