Question

In a high-volume, real-time Java web app, I'm sending messages to Apache Kafka. Currently I send to a single topic, but in the future I might need to send messages to multiple topics.

In that case, I'm not sure whether to create one producer per topic or to use a single producer for all my topics.

Here is my code:

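Properties props = new Properties();
// ZooKeeper connection string: a comma-separated list of host:port pairs
props.put("zk.connect", "<zk-ip1>:<2181>,<zk-ip2>:<2181>,<zk-ip3>:<2181>");
props.put("zk.connectiontimeout.ms", "1000000");
// Send asynchronously so the producer can batch messages in the background
props.put("producer.type", "async");

// A single producer instance, created once
Producer<String, Message> producer = new kafka.javaapi.producer.Producer<String, Message>(new ProducerConfig(props));

// Each ProducerData names its own target topic
ProducerData<String, Message> producerData1 = new ProducerData<String, Message>("someTopic1", messageTosend);
ProducerData<String, Message> producerData2 = new ProducerData<String, Message>("someTopic2", messageTosend);

// The same producer sends to both topics
producer.send(producerData1);
producer.send(producerData2);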

As you can see, once the producer has been created I can use it to send data to different topics. What is the best practice here? If my app sends to multiple topics (each topic gets different data), can or should I use a single producer, or should I create multiple producers? When, generally speaking, should I use more than a single producer?


Solution

In general, a single producer for all topics will be more network efficient.

If the Kafka client sees more than one topic+partition on the same Kafka node, it can send the messages for those topic+partitions in a single request. Kafka is optimized for message batches, so this is efficient.
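To make the batching argument concrete, here is a minimal sketch in plain Java with no Kafka client involved. It models how a client could group pending topic+partitions by the node that leads them, so that one request per node carries data for several topics. The `BatchingSketch` class, the `leaderOf` map, and the node names are hypothetical and only illustrate the idea; the real client's internals differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model for illustration; not Kafka's actual implementation.
class BatchingSketch {

    // Groups pending topic+partitions by their leader node: one request per node.
    static Map<String, List<String>> groupByNode(List<String> pendingTopicPartitions,
                                                 Map<String, String> leaderOf) {
        Map<String, List<String>> requests = new TreeMap<>();
        for (String topicPartition : pendingTopicPartitions) {
            String node = leaderOf.get(topicPartition);
            requests.computeIfAbsent(node, n -> new ArrayList<>()).add(topicPartition);
        }
        return requests;
    }

    public static void main(String[] args) {
        // Assumed cluster layout: two topics share node-1, one partition lives on node-2.
        Map<String, String> leaderOf = new HashMap<>();
        leaderOf.put("someTopic1-0", "node-1");
        leaderOf.put("someTopic2-0", "node-1");
        leaderOf.put("someTopic2-1", "node-2");

        List<String> pending = Arrays.asList("someTopic1-0", "someTopic2-0", "someTopic2-1");
        // Three topic+partitions collapse into two requests, one per node.
        System.out.println(groupByNode(pending, leaderOf));
    }
}
```

With one producer per topic, each producer would only see its own topic's partitions, so the two partitions on `node-1` could not share a request.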

In addition, your web servers only need to maintain at most one TCP connection to each Kafka node, instead of one connection per producer per node.

For more info on Kafka's design: https://kafka.apache.org/documentation.html#design

As you mention in the comments, lock contention may become a limiting factor; YMMV.

OTHER TIPS

In the Kafka Producers chapter of Kafka: The Definitive Guide, the authors say:

You will probably want to start with one producer and one thread. If you need better throughput, you can add more threads that use the same producer. Once this ceases to increase throughput, you can add more producers to the application to achieve even higher throughput.
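The first two steps of that progression (one producer, then more threads sharing it) can be sketched in plain Java. `SharedSender` below is a hypothetical stand-in for a thread-safe producer object and only counts sends; the thread count, topic names, and message contents are assumptions for illustration, not taken from the book.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a thread-safe producer; it only counts sends.
class SharedSender {
    private final AtomicLong sent = new AtomicLong();

    void send(String topic, String message) {
        sent.incrementAndGet(); // a real producer would buffer and transmit here
    }

    long sentCount() {
        return sent.get();
    }
}

class SharedProducerSketch {

    // Starts `threads` workers that all send through the same sender instance.
    static long run(int threads, int messagesPerThread) {
        SharedSender sender = new SharedSender(); // created once, shared by all workers
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final String topic = "someTopic" + (t % 2 + 1); // workers may target different topics
            pool.submit(() -> {
                for (int i = 0; i < messagesPerThread; i++) {
                    sender.send(topic, "message-" + i);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sender.sentCount();
    }

    public static void main(String[] args) {
        // Total number of messages that went through the one shared sender.
        System.out.println(run(4, 1000));
    }
}
```

Only when adding threads to `run` stops helping would you move to the third step and create a second sender instance.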

So there might actually be benefits in having multiple producers.

We have verified in practice that a single producer per topic is optimal. However, multiple producers are useful if you run into the long fat network problem (a link with both high bandwidth and high latency), in which case you need multiple connections to fully utilize the network.

Batching and pipelining over a single TCP connection (which is what Kafka uses) will not by themselves scale to large batches when the destination host is far away, unless you tune TCP to use large window sizes. This is the case in which you might experiment with more producers.
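The window-size limit can be put into numbers. A TCP connection can have at most one window of unacknowledged data in flight per round trip, so its throughput is bounded by window size divided by round-trip time. The sketch below works this out; the 64 KiB window, 100 ms round trip, and 1 Gbit/s link are illustrative assumptions, not measurements from the answer above.

```java
// Back-of-the-envelope arithmetic; all input values are illustrative assumptions.
class LongFatNetworkSketch {

    // Upper bound on one TCP connection's throughput: one window per round trip.
    static double maxBytesPerSecond(long windowBytes, double rttSeconds) {
        return windowBytes / rttSeconds;
    }

    // Connections needed to fill the link if each one is limited by its window.
    static long connectionsToFillLink(double linkBytesPerSecond, long windowBytes, double rttSeconds) {
        return (long) Math.ceil(linkBytesPerSecond / maxBytesPerSecond(windowBytes, rttSeconds));
    }

    public static void main(String[] args) {
        long window = 64 * 1024;             // 64 KiB window (assumed, no window scaling)
        double rtt = 0.1;                    // 100 ms round trip (assumed)
        double link = 1_000_000_000 / 8.0;   // 1 Gbit/s link, in bytes per second (assumed)

        System.out.println(maxBytesPerSecond(window, rtt));           // per-connection ceiling
        System.out.println(connectionsToFillLink(link, window, rtt)); // connections to saturate the link
    }
}
```

Under these assumptions a single connection tops out at roughly 655 KB/s, and it would take about 191 such connections to saturate the link. Larger windows reduce that number, which is why TCP tuning is the alternative to adding producers.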

In Kafka 0.8.2.0 and above, if you use the same Kafka producer for multiple topics, the default partitioner's round-robin assignment will fail.
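The answer above does not say why round-robin assignment fails. One plausible mechanism, stated here as an assumption and not as a confirmed account of Kafka's code, is a single counter shared by all topics: when sends to two topics alternate, each topic only ever sees every other counter value. The sketch below is a hypothetical single-threaded model (`RoundRobinSketch` is not a Kafka class) that compares a shared counter with per-topic counters.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of a round-robin partitioner; not Kafka's actual code.
class RoundRobinSketch {
    private final AtomicInteger sharedCounter = new AtomicInteger();
    private final Map<String, AtomicInteger> perTopicCounters = new HashMap<>();

    // One counter for all topics: interleaved topics disturb each other's rotation.
    int sharedCounterPartition(String topic, int numPartitions) {
        return sharedCounter.getAndIncrement() % numPartitions;
    }

    // One counter per topic: each topic rotates through its partitions independently.
    int perTopicPartition(String topic, int numPartitions) {
        return perTopicCounters.computeIfAbsent(topic, t -> new AtomicInteger())
                .getAndIncrement() % numPartitions;
    }

    // Alternates sends between two 2-partition topics and counts, per partition,
    // how many of someTopic1's messages landed there.
    static Map<Integer, Integer> partitionsUsedByTopic1(boolean perTopic, int sends) {
        RoundRobinSketch partitioner = new RoundRobinSketch();
        Map<Integer, Integer> hits = new TreeMap<>();
        for (int i = 0; i < sends; i++) {
            String topic = (i % 2 == 0) ? "someTopic1" : "someTopic2";
            int partition = perTopic
                    ? partitioner.perTopicPartition(topic, 2)
                    : partitioner.sharedCounterPartition(topic, 2);
            if (topic.equals("someTopic1")) {
                hits.merge(partition, 1, Integer::sum);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(partitionsUsedByTopic1(false, 8)); // prints {0=4}
        System.out.println(partitionsUsedByTopic1(true, 8));  // prints {0=2, 1=2}
    }
}
```

In this model the shared counter sends every `someTopic1` message to partition 0, while per-topic counters spread them evenly. If you depend on even distribution across partitions, it is worth checking how the partitioner in your client version behaves with multiple topics.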

Licensed under: CC-BY-SA with attribution