Question

This is a question I came up with after reading: What is the "task" in Storm parallelism

If I need to keep some information in the bolt's internal state, for example, in the classical word counting use case, keeping the count of each word seen in the bolt in a hashmap. After executing "rebalance" command, the task of the bolt many be moved to another executor, which may be in another JVM or even another machine. Will the bolt's internal state (the word count hashmap in this example) be transferred to the new environment (instance/JVM/machine)?

Of course putting the word count hashmap in a central place such as Zookeeper won't have this problem. But for performance sake, it seem we need to keep things in memory sometimes.

Was it helpful?

Solution 2

In storm we have the following mapping

storm          |     real world
+++++++++++++++++++++
Worker       |    Java Process
Executor    |    Thread
Task          |    Running Object Methods (execute and so on)

Since there is no shared memory between processes (and machines too) in storm, you don't have shared values such as counters spread out in all bolts in your application. but you can have shared values in executors (threads) in ONE worker.
To dealing with shared values in all executors, you must use other tools such as distributed caches (Memcached, guava) or databases.
In storm i think it uses zookeeper to restore states after re-balance.

OTHER TIPS

Once you run a rebalance the following will happen

  1. It will first deactivate the current topology
  2. It will then distribute the workers evenly within the cluster
  3. The topology will then return to its previous state of activation

Here is a comment by Nathan Marz which should help clear your doubts.

Rebalance is equivalent to those workers being killed and being created from scratch on another machine. If you want "state" to be maintained, I suggest you use something like Trident and keep your state synced on a DFS

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top