I came across a discussion in storm-user that covers something similar.
Read Relationship between Spout parallelism and number of kafka partitions.
Two things to note when using the Kafka spout for Storm:
- The maximum parallelism you can have on a KafkaSpout is the number of partitions.
- You can split the load across multiple Kafka topics and run a separate spout instance for each, i.e. each spout handles a separate topic.
So if Kafka is configured with 1 partition per host and there are 2 hosts, then even if we set the spout parallelism to 10, the maximum value that is respected will only be 2, which is the total number of partitions.
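The arithmetic behind that limit can be written out as a small sketch. The class and method names here (SpoutParallelism, totalPartitions, effectiveParallelism) are made up for illustration and are not part of Storm or the Kafka spout; the sketch also assumes every host serves the same number of partitions.

```java
// Hypothetical helper for illustration only; not part of Storm or storm-kafka.
final class SpoutParallelism {

    /** Total partitions, assuming every host serves the same number of partitions. */
    static int totalPartitions(int hosts, int partitionsPerHost) {
        return hosts * partitionsPerHost;
    }

    /** Number of spout instances that can actually be given a partition to read. */
    static int effectiveParallelism(int parallelismHint, int hosts, int partitionsPerHost) {
        return Math.min(parallelismHint, totalPartitions(hosts, partitionsPerHost));
    }

    public static void main(String[] args) {
        // 2 hosts, 1 partition per host, parallelism hint 10
        System.out.println(effectiveParallelism(10, 2, 1)); // prints 2
        // 1 host, 4 partitions per host, parallelism hint 4
        System.out.println(effectiveParallelism(4, 1, 4)); // prints 4
    }
}
```

With 2 hosts and 1 partition each, any hint above 2 gains nothing; the extra spout instances have no partition to read from.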
How to specify the number of partitions in the Kafka spout?
List<HostPort> hosts = new ArrayList<HostPort>();
hosts.add(new HostPort("localhost",9092));
SpoutConfig objConfig=new SpoutConfig(new KafkaConfig.StaticHosts(hosts, 4), "spoutCaliber", "/kafkastorm", "discovery");
As you can see, brokers are added using hosts.add,
and the number of partitions per host is specified as 4 in the new KafkaConfig.StaticHosts(hosts, 4)
call.
How to specify the parallelism hint in the Kafka spout?
builder.setSpout("spout", spout,4);
You specify it when adding your spout to the topology using the setSpout
method. Here 4 is the parallelism hint.
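To see how the hint and the partition count interact, here is a minimal sketch that deals partitions out to spout tasks round-robin. The round-robin rule is an assumption made for illustration and is not taken from the spout's source; PartitionAssignment and assign are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; the round-robin rule is an assumption, not the spout's actual code.
final class PartitionAssignment {

    /** Returns, for each task index, the list of partition ids that task would read. */
    static List<List<Integer>> assign(int partitions, int tasks) {
        if (tasks <= 0) {
            throw new IllegalArgumentException("tasks must be positive");
        }
        List<List<Integer>> perTask = new ArrayList<>();
        for (int t = 0; t < tasks; t++) {
            perTask.add(new ArrayList<>());
        }
        // Deal partitions to tasks one at a time, wrapping around.
        for (int p = 0; p < partitions; p++) {
            perTask.get(p % tasks).add(p);
        }
        return perTask;
    }

    public static void main(String[] args) {
        System.out.println(assign(4, 4)); // prints [[0], [1], [2], [3]]
        System.out.println(assign(4, 2)); // prints [[0, 2], [1, 3]]
        System.out.println(assign(2, 4)); // prints [[0], [1], [], []]
    }
}
```

In the last case the hint (4) exceeds the partition count (2), so two tasks end up with nothing to read. Matching the hint to the partition count, as in the setSpout call with 4, gives each task exactly one partition under this assumption.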
More links that might help
Understanding-the-parallelism-of-a-Storm-topology
what-is-the-task-in-twitter-storm-parallelism
Disclaimer: I am new to both Storm and Java, so please edit or add to this wherever it is required.