Question

If I'm running a Kafka cluster with more partitions than my lone consumer group has consumers, are there any guarantees on the ordering of messages, or on the timely delivery of messages, across partitions?

Simple example:
2 Partitions, 1 Consumer
The Producers are controlling Partition assignment via a key.
Message 1 comes in and goes to Partition A
Message 2 comes in and goes to Partition B
Message 3 comes in and goes to Partition A

I know Message 1 will be consumed before Message 3, because they are in the same partition. But what about Message 2? Will it be consumed before Message 3 or after? Or could it vary? Could it possibly be consumed before Message 1?

Moreover, what if new Messages continue to come in for Partition A and the production is faster than consumption? Will Message 2 sit in Partition B indefinitely? When will it be consumed? Are there any guarantees that the messages will not sit there forever?

More generally: If a consumer is assigned to multiple partitions, how and when does that consumer swap between those partitions?


Solution

Ordering guarantees

Kafka provides ordering guarantees only within a partition. In your example, Message 2 might be consumed before Message 1, between Message 1 and Message 3, or after Message 3. That depends only on the performance of the consumer. More information on this is available in the documentation: https://kafka.apache.org/documentation.html#introduction ('Consumers' and 'Guarantees' topics).
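To make the guarantee concrete, here is a minimal sketch in plain Python (no Kafka client involved) that enumerates every consumption order compatible with per-partition ordering for the example from the question. The lists and the function name possible_consumption_orders are illustrative assumptions, not part of any Kafka API.

```python
from itertools import combinations

# Toy model: a partition is just an ordered list of messages.
partition_a = ["Message 1", "Message 3"]
partition_b = ["Message 2"]


def possible_consumption_orders(a, b):
    """Return every interleaving of a and b that keeps each list's own order."""
    total = len(a) + len(b)
    orders = []
    # Choose which positions in the merged sequence are taken by messages from a.
    for a_positions in combinations(range(total), len(a)):
        chosen = set(a_positions)
        a_iter, b_iter = iter(a), iter(b)
        orders.append(
            [next(a_iter) if i in chosen else next(b_iter) for i in range(total)]
        )
    return orders


orders = possible_consumption_orders(partition_a, partition_b)
for order in orders:
    print(order)
```

Every printed order has Message 1 ahead of Message 3, while Message 2 shows up in each of the three possible positions.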

Slow consumption

The Kafka broker is not aware of the consumers. It stores messages in log segments until the corresponding log segment is deleted. Consumers may attach to the broker at any moment and start consuming from the oldest retained log segment. Message retention is controlled by two configuration properties: log.retention.hours (a time limit) and log.retention.bytes (a size limit), with possible per-topic overrides. More on this in the documentation: https://kafka.apache.org/documentation.html#brokerconfigs.

Answering your question: if the consumer falls behind the producer, it has some time to catch up (1 week by default). If it doesn't, some unconsumed messages will be deleted permanently.
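As a rough back-of-the-envelope illustration of "some time to catch up", the sketch below estimates when a lagging consumer would first lose data. It is a deliberately simplified model and not Kafka's actual deletion logic: it assumes constant produce and consume rates on a single partition, a consumer that starts with no lag, purely time-based retention, and it ignores the fact that deletion happens per log segment. The function name and the rates are hypothetical.

```python
def seconds_until_data_loss(produce_rate, consume_rate, retention_seconds):
    """Production time (in seconds) of the first message that expires unread.

    Simplified model (an assumption): constant rates in messages/second,
    time-based retention only, segment granularity ignored.
    Returns None if the consumer keeps up with the producer.
    """
    if consume_rate >= produce_rate:
        return None  # no growing lag, so nothing expires before it is read
    if consume_rate <= 0:
        return 0.0  # nothing is ever consumed, so even the first message is lost
    # A message produced at time t is reached by the consumer at
    # t * produce_rate / consume_rate, so its age at that moment is
    # t * (produce_rate / consume_rate - 1). It is lost once that age
    # exceeds the retention period.
    return retention_seconds / (produce_rate / consume_rate - 1)


ONE_WEEK = 7 * 24 * 3600  # 168 hours, the default mentioned above

loss_at = seconds_until_data_loss(
    produce_rate=100, consume_rate=80, retention_seconds=ONE_WEEK
)
print(loss_at / 86400, "days")
```

With these made-up numbers (100 messages/s produced, 80 messages/s consumed), messages produced about 28 days after the consumer started lagging would be the first to be deleted before being read.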

Consuming multiple partitions

The high-level consumer creates several KafkaStream objects, each providing data from one or more partitions. It's up to you how to consume these streams: in separate threads, round robin, etc. It's also possible to read the timestamps of messages and merge the streams into a single stream, restoring message order.
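The timestamp-based merge mentioned above can be sketched with the standard library alone. This is only an illustration of the idea: plain lists stand in for the per-partition streams, the record layout (timestamp_ms, partition, payload) is a hypothetical one, and the approach assumes timestamps are non-decreasing within each stream and comparable across partitions.

```python
import heapq

# Hypothetical records: (timestamp_ms, partition, payload).
# Each list is already in order, as a single partition would be.
stream_a = [(1000, "A", "Message 1"), (3000, "A", "Message 3")]
stream_b = [(2000, "B", "Message 2")]


def merge_by_timestamp(*streams):
    """Lazily merge already-sorted streams into one timestamp-ordered stream."""
    return heapq.merge(*streams, key=lambda record: record[0])


merged = [payload for _, _, payload in merge_by_timestamp(stream_a, stream_b)]
print(merged)
```

Because heapq.merge is lazy, it only ever looks at the head of each stream, which is what makes the same idea workable for unbounded streams. The catch is that a merge has to wait for the slowest stream before it can emit the next record.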

Licensed under: CC-BY-SA with attribution