The implicit meaning of nextTuple()/ack()/fail()
methods are called on the same thread is, the task (background Java thread), running at machine 'A', which emits the tuple is the same task, running at 'A' on which the ack()/fail() is called depending on the success/failure of processing (processed by Bolt running at 'B'or 'C') the tuple in the topology.
As long as the messageId is not null and Bolt tasks are calling the ack(tuple) in the execute() method, Storm framework keeps track of tuple traversal within the topology and call the ack()/fail() of tuple's owning task.
Here is the brief introduction on how the background task thread works before answering your question. The background task thread has in-memory structure/buffer for the emitted tuple and few other in-memory structures for status/pending tuples etc. The buffer gets filling up as the Spout/Bolt starts emitting the data and this buffer getting freed up as and when the tuples are processed i.e after calling ack()/fail(). Essentially, the background thread calls nextTuple()
when the buffer is free and background thread stops calling the nextTuple()
once the buffer is full. In simple words, emit() method either in the open()/nextTuple()/close()
, fills the background thread buffer and ack()/fail()
frees up the buffer.
With the above explanation, the background thread is unaware of the new/incoming data. It's up to the logic within the nextTuple() to read the data from source(Twitter/JMS providers/ESB/AMQP compliant servers/RDBMS) and emit the data. So, depending on the background thread's buffer size, Storm calls nextTuple() as explained above.
For other question, it should be ok to sleep for short duration if it's required. Please note, the nextTuple()
need not emit the value, it can return with nothing.