Question

We need a way to exchange messages between servers located in multiple data centers. The messages are not critical; we just need to be able to send a message from any server in any data center to any other server in any data center.

We are thinking of using Kafka as the message broker for this use case, but we are not sure whether it is a good option.

As we understand it, Kafka won't work well if a single cluster is stretched across multiple data centers (because it depends on ZooKeeper, which is really a single-data-center solution). So we are considering running a separate cluster in each data center. Each server would then have to connect to the Kafka clusters in all data centers, so that any server can send a message to any other server in any data center.

So each server will have n connections, and the total number of connections will be n*m, where n is the total number of data centers and m is the total number of servers.
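For example, with n = 3 data centers and 3 servers per data center (m = 9 servers in total), that is 9 × 3 = 27 connections, compared with only 9 if each server talked to its local cluster alone.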

Is it possible to reduce the total number of connections to m, i.e. so that every server connects only to its local Kafka cluster?

Solution

We decided to use an architecture similar to the one proposed by Todd Palino (Staff Site Reliability Engineer at LinkedIn), i.e. the architecture they use at LinkedIn.

We decided to use a unique topic for each server. The topic name contains a data center identifier and a server identifier (within that data center).
So, for example, if we have 3 data centers with 3 nodes in each data center, we get the following topics:
DC1_1, DC1_2, DC1_3;
DC2_1, DC2_2, DC2_3;
DC3_1, DC3_2, DC3_3;
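
As a minimal sketch of this naming convention (the helper function below is purely illustrative and not part of our actual code):

```python
# Build the per-server topic name from a data center id and a server id.
# The "DC{dc}_{server}" format mirrors the example topics above;
# the helper itself is hypothetical.
def topic_for(dc: int, server: int) -> str:
    return f"DC{dc}_{server}"

# All topics for 3 data centers with 3 servers each:
topics = [topic_for(dc, srv) for dc in range(1, 4) for srv in range(1, 4)]
print(topics)
# ['DC1_1', 'DC1_2', 'DC1_3', 'DC2_1', ..., 'DC3_3']
```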

The same topics have to be created in every cluster (in our case, 9 topics per cluster). MirrorMaker runs in each data center: it consumes the data needed by the local cluster from the remote clusters and produces that data into the local cluster.
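
Roughly, the MirrorMaker setup in DC1 could look like the sketch below, with one instance per remote cluster; the host names, file names, and group id are placeholders, and the exact flags depend on your Kafka version:

```
# consumer-dc2.properties: points at the DC2 cluster
#   bootstrap.servers=kafka-dc2:9092
#   group.id=mm-dc2-to-dc1
# producer-dc1.properties: points at the local DC1 cluster
#   bootstrap.servers=kafka-dc1:9092

# Mirror only the topics that must be consumed in DC1 (DC1_1, DC1_2, DC1_3):
bin/kafka-mirror-maker.sh \
  --consumer.config consumer-dc2.properties \
  --producer.config producer-dc1.properties \
  --whitelist "DC1_.*"

# A second instance does the same for the DC3 cluster.
```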

So our local clusters are also aggregate clusters, and we produce to them.
In Todd Palino's presentation he says something like "NEVER produce to aggregate clusters". His reasoning, however, was that data you produce there won't be replicated to the other aggregate clusters. In our case we do not replicate local data (which is consumed by local consumers) to any other cluster anyway; only data destined for the local cluster is mirrored from the external clusters into it. So, in our situation, producing into the aggregate clusters is OK.

The above example can be represented by the following architecture:

[Diagram: cross-data-center communication between servers using Kafka]

As you can see, the API servers (our producers and consumers) produce to and consume from the local Kafka cluster only. Moreover, each consumer consumes only from the topic whose identifier matches its own (i.e. the API server DC1_1 consumes only from the topic DC1_1). Each MirrorMaker instance consumes only the topics that must be consumed in its local data center (i.e. in DC1 the mirrored topics are DC1_1, DC1_2, DC1_3). Those local topics are not replicated by the other MirrorMaker instances, so we can both publish to and consume from them in our local cluster. The storage needed for the external topics can be relatively small (it depends on your load).
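
For illustration, the produce/consume pattern on an API server could look like the following sketch, assuming the kafka-python client; the broker address, group id, and payloads are placeholders:

```python
from kafka import KafkaProducer, KafkaConsumer

LOCAL_BOOTSTRAP = "kafka-dc1:9092"   # this server's local cluster (placeholder address)
MY_TOPIC = "DC1_1"                   # topic matching this server's identifier

# Send a message to server DC2_3: produce to the topic "DC2_3" on the LOCAL cluster;
# MirrorMaker running in DC2 will mirror DC2_* topics into DC2's cluster,
# where the target server reads them.
producer = KafkaProducer(bootstrap_servers=LOCAL_BOOTSTRAP)
producer.send("DC2_3", b"hello from DC1_1")
producer.flush()

# Receive messages addressed to this server: consume only from our own topic
# on the local cluster.
consumer = KafkaConsumer(MY_TOPIC, bootstrap_servers=LOCAL_BOOTSTRAP,
                         group_id="dc1_1-consumer")
for record in consumer:
    print(record.value)
```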
