Question

I'm trying to get some feedback on recommendations for a service 'roster' in my specific application. I have a server app that maintains persistent socket connections with clients. I want to develop the server further to support distributed instances. Server "A" would need to be able to broadcast data to the other online server instances, and the same goes for every other active instance.

Options I am trying to research:

  1. Redis / ZooKeeper / Doozer - Each server instance would register itself with the configuration server, and all connected servers would receive configuration updates as the roster changes. What then?
    1. Maintain end-to-end connections with each server instance and iterate over the list for each outgoing message (see the fan-out sketch after this list)?
    2. Some custom UDP multicast, but I would need to roll my own reliability on top of it.
  2. Custom message broker - A service that maintains a registry as each server connects and registers with it. It keeps a connection with each server to accept data and re-broadcast it to the other servers.
  3. Some reliable UDP multicast transport where each server instance just broadcasts directly and no roster is maintained.
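For what it's worth, here is roughly what I picture for option 1.1 in Go: a roster of peer connections that every outgoing message is fanned out over. The names (Roster, Broadcast, etc.) are made up for illustration; the real version would add and remove peers based on updates from the configuration service.

```go
// Sketch of option 1.1: keep a TCP connection to every known peer
// and fan each outgoing message out over that roster.
package roster

import (
	"net"
	"sync"
)

type Roster struct {
	mu    sync.Mutex
	peers map[string]net.Conn // keyed by peer address
}

func New() *Roster {
	return &Roster{peers: make(map[string]net.Conn)}
}

// Add registers a live connection to a peer instance, e.g. after a
// configuration update from Redis/ZooKeeper names a new server.
func (r *Roster) Add(addr string, c net.Conn) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.peers[addr] = c
}

// Broadcast writes msg to every peer; peers that fail are dropped
// and would be re-added on the next configuration update.
func (r *Roster) Broadcast(msg []byte) {
	r.mu.Lock()
	defer r.mu.Unlock()
	for addr, c := range r.peers {
		if _, err := c.Write(msg); err != nil {
			c.Close()
			delete(r.peers, addr)
		}
	}
}
```

The obvious worry is that every instance then holds N-1 connections and writes each message N-1 times, which is part of why I'm looking at multicast.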

Here are my concerns:

  • I would love to avoid relying on external apps like ZooKeeper or Doozer, but I would obviously use them if that's the best solution.
  • With a custom message broker, I wouldn't want it to become a throughput bottleneck, which might mean also running multiple message brokers behind a load balancer when scaling.
  • Multicast doesn't require any external processes if I manage to roll my own, but otherwise I would probably have to use something like ZMQ, which again puts me in a situation of dependencies.

I realize that I am also talking about message delivery, but it goes hand in hand with whichever solution I choose. By the way, my server is written in Go. Any ideas on the best way to keep this scalable?

* EDIT of goal *

What I am really asking is what is the best way to implement broadcasting data between instances of a distributed server given the following:

  1. Each server instance maintains persistent TCP socket connections with its remote clients and passes messages between them.
  2. Messages need to be broadcast to the other running instances so they can be delivered to the relevant client connections.
  3. Low latency is important because the messaging can be high speed.
  4. Sequencing and reliability are important.

* Updated Question Summary *

If you have multiple servers / multiple endpoints that need to pub/sub between each other, what is a recommended mode of communication between them? One or more message brokers that republish messages to a roster of discovered servers? Reliable multicast directly from each server? How do you connect multiple endpoints in a distributed system while keeping latency low, speed high, and delivery reliable?

Solution

Assuming all of your client-facing endpoints are on the same LAN (which they can be for the first reasonable step in scaling), reliable UDP multicast would allow you to send published messages directly from the publishing endpoint to any of the endpoints that have clients subscribed to the channel. This also satisfies the low-latency requirement much better than proxying data through a persistent storage layer.

Multicast groups

  • A central database (say, Redis) could track a map of multicast groups (IP:PORT) <--> channels.
  • When an endpoint receives a new client with a new channel to subscribe to, it can ask the database for the channel's multicast address and join the multicast group (see the join sketch after this list).
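A rough sketch of that lookup-and-join step in Go, assuming the channel -> "IP:PORT" mapping lives in the central database; lookupGroup is a stand-in for whatever Redis call you use, and the group address shown is hypothetical.

```go
// Sketch: resolve a channel's multicast group and join it.
package subscribe

import "net"

// lookupGroup is a placeholder for the channel -> "IP:PORT" map kept
// in the central database (e.g. a Redis GET); the address returned
// here is purely hypothetical.
func lookupGroup(channel string) (string, error) {
	return "239.1.2.3:9999", nil
}

// JoinChannel finds the channel's multicast group and joins it,
// returning a UDP connection that receives the group's traffic.
func JoinChannel(channel string) (*net.UDPConn, error) {
	group, err := lookupGroup(channel)
	if err != nil {
		return nil, err
	}
	gaddr, err := net.ResolveUDPAddr("udp4", group)
	if err != nil {
		return nil, err
	}
	// A nil interface lets the OS choose; pass the LAN-facing
	// *net.Interface explicitly on a multi-homed host.
	return net.ListenMulticastUDP("udp4", nil, gaddr)
}
```

net.ListenMulticastUDP handles the group membership for you, so the endpoint only has to keep the returned connection open for as long as it has subscribers on that channel.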

Reliable UDP multicast

  • When an endpoint receives a published message for a channel, it sends the message to that channel's multicast socket.
  • Message packets will contain ordered identifiers per server per multicast group. If an endpoint receives a message without receiving the previous message from a server, it will send a "not acknowledged" message for any messages it missed back to the publishing server.
  • The publishing server tracks a list of recent messages, and resends NAK'd messages.
  • To handle the edge case of a server sending only one message that fails to reach another server, servers can send a packet count to the multicast group over the lifetime of their NAK queue ("I've sent 24 messages"), giving other servers a chance to NAK messages they never received. A sketch of the sequencing and retransmit bookkeeping follows this list.
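Here is a minimal sketch of the publishing side of that scheme in Go. The 16-byte header (8-byte server id, 8-byte sequence number) and the type names are assumptions for illustration, not a wire format you have to adopt; receivers would track the last sequence number seen per server and send a NAK when they spot a gap.

```go
// Sketch of the publishing side: per-server sequence numbers plus a
// retransmit buffer for answering NAKs.
package reliable

import (
	"encoding/binary"
	"net"
	"sync"
)

type Publisher struct {
	mu     sync.Mutex
	id     uint64            // this server's identifier
	seq    uint64            // last sequence number sent to this group
	conn   *net.UDPConn      // unconnected UDP socket (net.ListenUDP)
	group  *net.UDPAddr      // the channel's multicast group address
	recent map[uint64][]byte // recent packets, kept for NAK replies
}

// Publish stamps the payload with the next sequence number, remembers
// the packet for possible retransmission, and sends it to the group.
func (p *Publisher) Publish(payload []byte) error {
	p.mu.Lock()
	p.seq++
	pkt := make([]byte, 16+len(payload))
	binary.BigEndian.PutUint64(pkt[0:8], p.id)
	binary.BigEndian.PutUint64(pkt[8:16], p.seq)
	copy(pkt[16:], payload)
	p.recent[p.seq] = pkt // a real version would evict old entries
	p.mu.Unlock()

	_, err := p.conn.WriteToUDP(pkt, p.group)
	return err
}

// Resend answers a NAK by re-sending a missed sequence number
// directly to the endpoint that asked for it.
func (p *Publisher) Resend(seq uint64, to *net.UDPAddr) error {
	p.mu.Lock()
	pkt, ok := p.recent[seq]
	p.mu.Unlock()
	if !ok {
		return nil // already outside the retransmit window
	}
	_, err := p.conn.WriteToUDP(pkt, to)
	return err
}
```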

You might want to just implement PGM (Pragmatic General Multicast), which is essentially this NAK-based scheme specified as a protocol.

Persistent storage

If you do end up storing data long-term, storage services can join the multicast groups just like endpoints... but store the messages in a database instead of sending them to clients.
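Such a storage service could be as simple as a read loop on the joined multicast connection; store below is a placeholder for whatever database write you end up with.

```go
// Sketch: a storage service consumes the same multicast traffic and
// persists it instead of forwarding it to clients.
package archiver

import "net"

// store is a placeholder for the actual database write.
func store(pkt []byte) error { return nil }

// Archive reads packets from an already-joined multicast connection
// (see the JoinChannel sketch above) and hands them to storage.
func Archive(conn *net.UDPConn) error {
	buf := make([]byte, 64*1024)
	for {
		n, _, err := conn.ReadFromUDP(buf)
		if err != nil {
			return err
		}
		pkt := make([]byte, n)
		copy(pkt, buf[:n])
		if err := store(pkt); err != nil {
			return err
		}
	}
}
```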

Licensed under: CC-BY-SA with attribution