Is Kafka needed in a realtime chat application?

https://softwareengineering.stackexchange.com/questions/422177

22-03-2021

Question

I'm developing a realtime chat application with an Angular frontend and Java backend. I've found a couple of examples that resemble what I am trying to achieve, such as:

It seems common to include Kafka as a message broker, but I am trying to understand why we would want it. In my messaging application I don't foresee the server publishing anything to the users on its own. It will always be an end user on the client side publishing something to a websocket endpoint, the server storing that information in the database and then forwarding the message to the correct recipient(s), like so:

Now let's remove the Kafka section from that diagram and persist the message directly in the database. What do I lose, apart from the message being stored in the database synchronously instead of asynchronously? I can simply put the message on /queue/message before persisting it, so that there are no latency issues between the users. In the examples I've seen, persistence was not really part of the flow, so I figured there may be another reason for using Kafka.
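The "deliver first, persist afterwards" idea from the question can be sketched roughly as follows. This is a minimal illustration only: ChatRelay, onMessage, and the two in-memory lists are hypothetical stand-ins for the websocket send to /queue/message and the database write, not part of any framework.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical relay: hands the message to the recipient immediately and
// writes it to storage on a background thread.
class ChatRelay {
    final List<String> delivered = new CopyOnWriteArrayList<>(); // stand-in for /queue/message
    final List<String> persisted = new CopyOnWriteArrayList<>(); // stand-in for the database
    private final ExecutorService dbWriter = Executors.newSingleThreadExecutor();

    // Deliver synchronously, persist asynchronously.
    Future<?> onMessage(String message) {
        delivered.add(message);
        return dbWriter.submit(() -> { persisted.add(message); });
    }

    // Stop accepting work and wait for pending writes; returns false on timeout.
    boolean close() {
        dbWriter.shutdown();
        try {
            return dbWriter.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

One thing this sketch makes visible: if the process dies between the delivery and the background write, the recipient has seen a message that was never stored. A durable log in front of the database is one way to close that gap.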

Solution

People think of Kafka as a message broker, but it's also a sort of database that stores and retrieves messages in order, and tracks every consumer's place in that list. This is an extremely nice fit for storing something like chat messages, especially if the websocket box on your diagram is actually multiple websockets on multiple machines, where the bottom putMessage arrow in that box goes through Kafka as well.
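To make "an ordered list plus every consumer's place in it" concrete, here is a small in-memory sketch. It assumes a single log and is not Kafka's API; ChatLog, append, and poll are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for one ordered log: an append-only list
// plus a remembered read position (offset) per consumer.
class ChatLog {
    private final List<String> messages = new ArrayList<>();
    private final Map<String, Integer> offsets = new HashMap<>();

    // Append a message and return its offset (its position in the log).
    synchronized int append(String message) {
        messages.add(message);
        return messages.size() - 1;
    }

    // Return everything this consumer has not seen yet, then advance its offset.
    synchronized List<String> poll(String consumerId) {
        int from = offsets.getOrDefault(consumerId, 0);
        List<String> unseen = new ArrayList<>(messages.subList(from, messages.size()));
        offsets.put(consumerId, messages.size());
        return unseen;
    }
}
```

Each consumer reads at its own pace: a client that reconnects after a while simply polls again and receives whatever was appended since its last offset, which is the "new messages since the last time I polled" query without a database round trip.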

In a typical Kafka application, the DB in your diagram would be more for aggregate purposes like search or analytics, not for queries like "get me the new messages since the last time I polled."

Other tips

If you have 1 server doing everything, there's really no reason to use Kafka at all.

If you distribute, which you probably have to if you want to scale to millions of users, other aspects come in. The node a sender is connected to may not be the node the recipient is connected to. In this case, you'll have to route messages between nodes and/or have each node poll the database to look for new messages.
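The routing problem can be pictured with a small sketch. Everything here is an assumption made for illustration: Cluster is a hypothetical shared registry that remembers which node each user is connected to, and route forwards a message to that node's inbox regardless of which node received it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical registry of which user is connected to which node.
class Cluster {
    private final Map<String, String> userToNode = new HashMap<>();
    private final Map<String, List<String>> inboxByNode = new HashMap<>();

    // Record that a user's websocket lives on the given node.
    void connect(String user, String node) {
        userToNode.put(user, node);
        inboxByNode.computeIfAbsent(node, n -> new ArrayList<>());
    }

    // Forward to the recipient's node; returns that node, or null if offline.
    String route(String recipient, String message) {
        String node = userToNode.get(recipient);
        if (node == null) {
            return null;
        }
        inboxByNode.get(node).add(recipient + ": " + message);
        return node;
    }

    // Messages waiting on a node for its connected users.
    List<String> inbox(String node) {
        return inboxByNode.getOrDefault(node, List.of());
    }
}
```

In a real deployment this registry and the forwarding step have to survive node crashes and reconnects, which is the part a broker takes off your hands.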

So in this case, Kafka solves 3 problems for you:

It can route really well and easily (if you know what you're doing)
It will handle nodes crashing / coming online well
You can poll Kafka directly, which is intended, instead of polling a database, which may or may not work well.
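One way to picture the routing point is key-based partitioning: records that share a key (say, a conversation ID) land on the same partition, so their relative order is preserved and one consumer sees all of them. The sketch below is a simplified stand-in, not Kafka's partitioner: Kafka's default hashes the serialized key with murmur2, while this hypothetical KeyPartitioner uses String.hashCode.

```java
// Hypothetical, simplified key-to-partition mapping for illustration.
class KeyPartitioner {
    // Same key always maps to the same partition in [0, numPartitions).
    static int partitionFor(String key, int numPartitions) {
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(key.hashCode(), numPartitions);
    }
}
```

With something like KeyPartitioner.partitionFor("conversation-42", 6), every message in that conversation goes to one partition, and adding consumers spreads the partitions across nodes.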

Additionally, if you're really scaling:

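Kafka doesn't wait on disk I/O for each message: it appends to a sequential log and, by default, leaves flushing to the operating system. Conventional databases are usually limited by IOPS (I/O operations per second), because every transaction commit has to reach the disk. This can be really slow: on the order of hundreds of messages per second (per disk) vs. millions in Kafka.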
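The gap comes down to simple arithmetic: if every message needs its own disk flush, throughput is capped by the flush rate, whereas batching many messages per flush multiplies that cap. The sketch below is a back-of-the-envelope model only; FlushMath and the figures in the usage note are hypothetical, not measurements.

```java
// Hypothetical back-of-the-envelope model, not a benchmark.
class FlushMath {
    // Upper bound on messages per second when each flush carries batchSize messages.
    static long maxMessagesPerSecond(long flushesPerSecond, long batchSize) {
        return flushesPerSecond * batchSize;
    }
}
```

For example, assuming a disk that sustains 200 flushes per second, one message per commit caps out at 200 messages per second, while 5,000 messages per flush raises the ceiling to 1,000,000.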

That's just a couple of things; there could be more...

Licensed under: CC-BY-SA with attribution

Not affiliated with softwareengineering.stackexchange