Вопрос

The ISpout.nextTuple() javadoc specifies that nextTuple(), ack(...) and fail(...) are called on the same thread.

However, the actual collector upon which emit(...) is called is supplied earlier, as a parameter on open(..., collector).

Question is whether a background thread that sees some new data must always enqueue the data for nextTuple() to dequeue and emit. What would happen if the background thread emits the data immediately? Is that supported? If that is allowed, what's the recommended way to implement the "sleep for a short amount of time" in nextTuple()?

Это было полезно?

Решение

The implicit meaning of nextTuple()/ack()/fail() methods are called on the same thread is, the task (background Java thread), running at machine 'A', which emits the tuple is the same task, running at 'A' on which the ack()/fail() is called depending on the success/failure of processing (processed by Bolt running at 'B'or 'C') the tuple in the topology.

As long as the messageId is not null and Bolt tasks are calling the ack(tuple) in the execute() method, Storm framework keeps track of tuple traversal within the topology and call the ack()/fail() of tuple's owning task.

Here is the brief introduction on how the background task thread works before answering your question. The background task thread has in-memory structure/buffer for the emitted tuple and few other in-memory structures for status/pending tuples etc. The buffer gets filling up as the Spout/Bolt starts emitting the data and this buffer getting freed up as and when the tuples are processed i.e after calling ack()/fail(). Essentially, the background thread calls nextTuple() when the buffer is free and background thread stops calling the nextTuple() once the buffer is full. In simple words, emit() method either in the open()/nextTuple()/close(), fills the background thread buffer and ack()/fail() frees up the buffer.

With the above explanation, the background thread is unaware of the new/incoming data. It's up to the logic within the nextTuple() to read the data from source(Twitter/JMS providers/ESB/AMQP compliant servers/RDBMS) and emit the data. So, depending on the background thread's buffer size, Storm calls nextTuple() as explained above.

For other question, it should be ok to sleep for short duration if it's required. Please note, the nextTuple() need not emit the value, it can return with nothing.

Другие советы

It is my understanding that you shouldn't emit data unless requested by Storm by calling your nextTuple() method. Consequently, your background thread must enqueue new data, so that it is emitted when requested. Your nextTuple() method should sleep briefly only if there are no tuples to emit when the method is called.

Лицензировано под: CC-BY-SA с атрибуция
Не связан с StackOverflow
scroll top