Reliable network all-to-all communication: Message Bus or Publish/Subscribe or Multicast/PGM?

Question

In ZeroMQ, pub/sub is a socket pattern rather than a transport. It can run over multicast (PGM or encapsulated PGM) as well as over TCP, IPC, or in-process transports, and the choice of transport is transparent to a ZeroMQ client: you change the endpoint string, not the code. ZeroMQ works hard to hide implementation details from the client, which is one of the reasons it works so well. But don't let the simplicity fool you. ZeroMQ is an expertly engineered messaging library that is powerful and flexible, and it handles very large message volumes efficiently.

ZeroMQ is used in some impressively large deployments. At my workplace we use it to handle over 200k messages per second. We have found that it scales effortlessly: the library is well engineered and optimised (no memory leaks, good performance), and it has worked very well regardless of what we have thrown at it.

In ZeroMQ, publish/subscribe is done on user-defined topics, which dictate what data is sent to which connected clients. A topic is simply a prefix of the message: a subscription to "A" matches every message whose first bytes are "A". So if I have 10 clients connected to my publish socket, 9 subscribed to topic "A" and 1 subscribed to topic "B", and I send a message with the topic "B", then only the client subscribed to "B" will be sent the message. ZeroMQ filters pub/sub messages both at the point of transmission (to avoid wasting bandwidth) and at the point of receipt (to avoid possible race conditions when a topic is unsubscribed). A client can also subscribe to more than one topic.
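The topic behaviour described above can be sketched with pyzmq. This is only an illustration under assumptions of mine: the endpoint name `inproc://topics`, the function name `demo_topic_filter`, and the short sleep that lets subscriptions settle are not from the original setup, and the in-process transport is used just to keep the example self-contained.

```python
import time
import zmq


def demo_topic_filter():
    """Publish one message on topic "B"; return what an "A" and a "B" subscriber see."""
    ctx = zmq.Context()

    pub = ctx.socket(zmq.PUB)
    pub.bind("inproc://topics")  # hypothetical endpoint name

    sub_a = ctx.socket(zmq.SUB)
    sub_a.setsockopt(zmq.SUBSCRIBE, b"A")  # prefix filter: messages starting with "A"
    sub_a.connect("inproc://topics")

    sub_b = ctx.socket(zmq.SUB)
    sub_b.setsockopt(zmq.SUBSCRIBE, b"B")  # prefix filter: messages starting with "B"
    sub_b.connect("inproc://topics")

    # Give the subscriptions a moment to reach the publisher (assumed delay).
    time.sleep(0.2)

    # The first frame carries the topic, the second the payload.
    pub.send_multipart([b"B", b"only for B subscribers"])

    # Wait up to 500 ms on each subscriber; None means nothing arrived.
    got_a = sub_a.recv_multipart() if sub_a.poll(500) else None
    got_b = sub_b.recv_multipart() if sub_b.poll(500) else None

    for sock in (pub, sub_a, sub_b):
        sock.close(linger=0)
    ctx.term()
    return got_a, got_b


if __name__ == "__main__":
    print(demo_topic_filter())
```

A client that needs several topics can call `setsockopt(zmq.SUBSCRIBE, ...)` once per prefix on the same socket.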

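To implement the mesh messaging system you describe, I recommend creating two sockets on each node in the cluster: a SUB socket for receiving messages from all the other nodes, and a PUB socket for sending messages to all the other nodes. Note that a SUB socket receives nothing until it has subscribed to something. If you do not need topics, subscribe with an empty prefix (a zero-length string) and that client will receive all messages. There is no "*" wildcard; a subscription to "*" would only match messages that start with a literal asterisk.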
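Here is a minimal sketch of that two-socket layout in pyzmq, with three nodes living in one process so it can be run as-is. The node names, the `inproc://` endpoints, and the helpers `build_mesh`, `drain`, and `demo_mesh` are hypothetical; in a real cluster each node would bind its PUB socket on its own host (for example a `tcp://` or `epgm://` endpoint) and connect its SUB socket to every peer.

```python
import time
import zmq


def build_mesh(ctx, names):
    """Give each node a bound PUB socket and a SUB socket connected to every peer."""
    nodes = {}
    for name in names:
        pub = ctx.socket(zmq.PUB)
        pub.bind(f"inproc://{name}")        # hypothetical endpoint per node
        sub = ctx.socket(zmq.SUB)
        sub.setsockopt(zmq.SUBSCRIBE, b"")  # empty prefix: receive everything
        nodes[name] = (pub, sub)
    for name, (_, sub) in nodes.items():
        for peer in names:
            if peer != name:                # a node does not listen to itself
                sub.connect(f"inproc://{peer}")
    return nodes


def drain(sub, timeout_ms=500):
    """Collect messages until the socket stays quiet for timeout_ms."""
    received = []
    while sub.poll(timeout_ms):
        received.append(sub.recv())
    return received


def demo_mesh():
    ctx = zmq.Context()
    names = ["n1", "n2", "n3"]
    nodes = build_mesh(ctx, names)
    time.sleep(0.2)  # assumed settling time for connections and subscriptions

    # Every node broadcasts one message to all of its peers.
    for name in names:
        nodes[name][0].send(f"hello from {name}".encode())

    result = {name: sorted(drain(sub)) for name, (_, sub) in nodes.items()}

    for pub, sub in nodes.values():
        pub.close(linger=0)
        sub.close(linger=0)
    ctx.term()
    return result


if __name__ == "__main__":
    print(demo_mesh())
```

One SUB socket can connect to many publishers, which is why a single receiving socket per node is enough. Keep in mind that plain pub/sub drops messages for subscribers that are late to connect or too slow to keep up, so any reliability guarantee has to be layered on top.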