Question

I am using the Kafka spout to consume messages. If I change the topology and upload it again, will it resume from the old messages or start from new ones? The Kafka spout lets us specify a timestamp from which to consume, but how will I know the timestamp?


Solution 2

If you are using KafkaSpout ensure the following:

  1. In your SpoutConfig, "id" and "zkRoot" must NOT change when you redeploy a new version of the topology. Storm uses zkRoot and id to store the topic offsets in ZooKeeper.
  2. KafkaConfig.forceFromStart is set to false.
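To see why changing id or zkRoot loses your position, here is a minimal Java sketch of how a per-partition offset path could be derived from those two settings. The layout zkRoot/id/partition_N is an assumption based on older storm-kafka releases (check the source of the version you run), and the zkRoot and id values below are hypothetical.

```java
/**
 * Simplified illustration (assumption): the spout commits each partition's offset
 * under a ZooKeeper node whose path is built from zkRoot and id. The exact layout
 * mirrors older storm-kafka releases and may differ in your version.
 */
final class OffsetPath {
    private OffsetPath() {}

    /** Builds the node path where the offset for one Kafka partition is committed. */
    static String committedPath(String zkRoot, String id, int partition) {
        return zkRoot + "/" + id + "/partition_" + partition;
    }

    public static void main(String[] args) {
        // Hypothetical values: the second deployment changed its id.
        String before = committedPath("/kafka-spout", "my-topology-reader", 0);
        String after = committedPath("/kafka-spout", "my-topology-reader-v2", 0);
        System.out.println(before); // /kafka-spout/my-topology-reader/partition_0
        System.out.println(after);  // /kafka-spout/my-topology-reader-v2/partition_0
        // Different path: the new deployment finds no stored offset for partition 0.
        System.out.println(before.equals(after)); // false
    }
}
```

Keeping id and zkRoot stable across deployments means the new topology looks up the same nodes and therefore finds the offsets the old one committed.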

KafkaSpout stores the offsets in ZooKeeper. Be very careful during redeployment: if forceFromStart is set to true in the KafkaConfig of the KafkaSpout (which may be the case when you first deploy the topology), the spout will ignore the offsets stored in ZooKeeper. Make sure it is set to false.

Consider writing your topology so that the KafkaConfig.forceFromStart value is read from a properties file when your Topology’s main() method executes. This will allow your administrators to control whether the Kafka messages are replayed or not.
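A minimal sketch of that idea using only java.util.Properties. The property keys "kafka.spout.forceFromStart" and "kafka.spout.startOffsetTime", and the defaults chosen here, are hypothetical; the final step of copying the values onto the spout config is shown only as a comment, since the exact field names depend on your storm-kafka version.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

final class SpoutSettings {
    private SpoutSettings() {}

    /**
     * Reads the replay switch (hypothetical key). Defaults to false so that a
     * redeployment resumes from the ZooKeeper offsets unless an operator opts in.
     */
    static boolean readForceFromStart(Properties props) {
        return Boolean.parseBoolean(props.getProperty("kafka.spout.forceFromStart", "false"));
    }

    /** Reads the start position (hypothetical key): -2 = earliest offset, -1 = latest offset. */
    static long readStartOffsetTime(Properties props) {
        return Long.parseLong(props.getProperty("kafka.spout.startOffsetTime", "-2").trim());
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        // In a real topology you would load a file whose path is passed to main(),
        // e.g. props.load(new FileReader(args[0])). A string keeps this sketch self-contained.
        props.load(new StringReader("kafka.spout.forceFromStart=true\nkafka.spout.startOffsetTime=-1\n"));
        boolean force = readForceFromStart(props);
        long start = readStartOffsetTime(props);
        // Then, in storm-kafka versions exposing these public fields:
        // spoutConfig.forceFromStart = force; spoutConfig.startOffsetTime = start;
        System.out.println(force + " " + start); // true -1
    }
}
```

Defaulting forceFromStart to false is the safer choice: forgetting to set the property leads to a resume rather than an unintended replay.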

OTHER TIPS

spoutConfig.forceStartOffsetTime(-1);

When given a timestamp, the spout chooses the latest offset written around that time and starts consuming from there. You can force the spout to always start from the latest offset by passing -1, and from the earliest offset by passing -2.
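The following toy model shows how a start-offset "time" value might be turned into a concrete offset. The values -1 and -2 mirror Kafka's LatestTime and EarliestTime sentinels; any other value is treated as a timestamp in milliseconds. The segment index is hypothetical and simplified: older Kafka brokers resolved timestamps only coarsely (roughly at log-segment granularity), which is why the result is "around" the requested time rather than exact.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Toy model (not the real broker logic) of resolving startOffsetTime to an offset. */
final class StartOffsetModel {
    static final long LATEST = -1L;   // mirrors Kafka's LatestTime sentinel
    static final long EARLIEST = -2L; // mirrors Kafka's EarliestTime sentinel

    // Hypothetical index: segment start time (ms) -> first offset in that segment.
    private final NavigableMap<Long, Long> segmentStartTimeToOffset = new TreeMap<>();
    private final long logEndOffset;

    StartOffsetModel(Map<Long, Long> segments, long logEndOffset) {
        this.segmentStartTimeToOffset.putAll(segments);
        this.logEndOffset = logEndOffset;
    }

    long resolve(long startOffsetTime) {
        if (startOffsetTime == LATEST) {
            return logEndOffset; // tip of the topic: only messages produced from now on
        }
        long earliest = segmentStartTimeToOffset.isEmpty()
                ? logEndOffset
                : segmentStartTimeToOffset.firstEntry().getValue();
        if (startOffsetTime == EARLIEST) {
            return earliest;
        }
        // Timestamp: start at the last segment that began at or before that time.
        Map.Entry<Long, Long> e = segmentStartTimeToOffset.floorEntry(startOffsetTime);
        return e == null ? earliest : e.getValue();
    }

    public static void main(String[] args) {
        StartOffsetModel log = new StartOffsetModel(Map.of(1_000L, 0L, 2_000L, 500L, 3_000L, 900L), 1_200L);
        System.out.println(log.resolve(EARLIEST)); // 0
        System.out.println(log.resolve(LATEST));   // 1200
        System.out.println(log.resolve(2_500L));   // 500
    }
}
```

In practice this also answers "how will I know the timestamp": you usually don't need one, because -1 and -2 cover the common "newest" and "from the beginning" cases.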


Basically the sequence of events will be:

  1. The first time, start the topology reading from the beginning of the topic, using the following properties:

    forceFromStart = true
    startOffsetTime = -2

These properties force the spout to start from the beginning of the topic. Set both of them: forceFromStart tells Storm to ignore the ZooKeeper offset and instead use the startOffsetTime value to decide where to start reading.

From then on, your topology runs and ZooKeeper maintains the offset. If a worker dies, the supervisor restarts it and it resumes reading from the offset stored in ZooKeeper.

  2. If you now want to restart the topology and continue reading from where it left off before shutdown, set the following property and restart the topology:

    forceFromStart = false
    

With this property, you tell Storm not to read the startOffsetTime value and to use instead the ZooKeeper offset that was maintained before you shut down the topology.

From now on, every time you restart the topology, it will resume from where it left off.

  3. If you want to restart the topology and read from the head (the newest end) of the topic, set the following properties and restart the topology:

    forceFromStart = true
    startOffsetTime = -1

With these properties, you tell Storm to ignore the ZooKeeper offset and start from the latest offset, i.e. the tip of the topic.
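The three scenarios can be summarised as a small decision function. This is a simplified model, not storm-kafka's actual code, and the method and parameter names are hypothetical. One assumption, based on older storm-kafka releases, is flagged loudly: forceFromStart is modelled as taking effect only when a new topology instance is submitted, which is what lets a worker restarted by the supervisor resume from the ZooKeeper offset even while forceFromStart is still true.

```java
import java.util.OptionalLong;

/** Simplified model of where the spout starts reading one partition. */
final class StartDecision {
    private StartDecision() {}

    /**
     * @param forceFromStart       the KafkaConfig flag
     * @param newTopologyInstance  ASSUMPTION: forceFromStart only applies to a freshly submitted topology
     * @param zkOffset             offset previously committed to ZooKeeper, if any
     * @param offsetForStartTime   offset that startOffsetTime (-2, -1 or a timestamp) resolves to
     */
    static long startingOffset(boolean forceFromStart, boolean newTopologyInstance,
                               OptionalLong zkOffset, long offsetForStartTime) {
        if (forceFromStart && newTopologyInstance) {
            return offsetForStartTime;   // ignore ZooKeeper, use startOffsetTime
        }
        if (zkOffset.isPresent()) {
            return zkOffset.getAsLong(); // resume where the previous run left off
        }
        // No stored offset yet. The real spout's behaviour here depends on the version;
        // this model simply falls back to the startOffsetTime position.
        return offsetForStartTime;
    }

    public static void main(String[] args) {
        long earliest = 0L, latest = 1_200L;
        // Step 1: first deployment, forceFromStart = true, startOffsetTime = -2.
        System.out.println(startingOffset(true, true, OptionalLong.empty(), earliest));   // 0
        // Worker dies and is restarted by the supervisor (same topology instance).
        System.out.println(startingOffset(true, false, OptionalLong.of(750L), earliest)); // 750
        // Step 2: redeploy with forceFromStart = false: resume from ZooKeeper.
        System.out.println(startingOffset(false, true, OptionalLong.of(900L), earliest)); // 900
        // Step 3: redeploy with forceFromStart = true, startOffsetTime = -1.
        System.out.println(startingOffset(true, true, OptionalLong.of(900L), latest));    // 1200
    }
}
```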

Licensed under: CC-BY-SA with attribution