Question

The Apache Flume User Guide says the spooling directory source may duplicate events under certain circumstances. Here is the line from the docs: "Despite the reliability guarantees of this source, there are still cases in which events may be duplicated if certain downstream failures occur."

What are those cases? In particular, if we are using a durable channel such as the file channel, I don't see any reason for duplicate events to occur.

Solution

You described the file channel as durable. Durable is not the same as exactly-once delivery: it means stored events survive a crash or restart, not that an event is never sent twice.

Flume's guarantee is at-least-once delivery. If a batch fails, or the sender cannot confirm that it succeeded, the whole batch is resent. This can lead to duplicate events.

Example: node 1 is sending events to node 2. All the events are sent and node 2 acknowledges receipt. However, network conditions are such that the acknowledgement is lost. Node 2 has stored the batch, but node 1 never sees the acknowledgement and resends it. Thus, duplicate events.
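
The lost-acknowledgement scenario above can be sketched as a small simulation. This is only an illustration of at-least-once retry in general, not Flume code: the `Receiver` class, `send_with_retry` function, and `ack_lost_on_attempts` parameter are hypothetical names made up for this example.

```python
class Receiver:
    """Stands in for node 2: durably stores every batch it receives."""

    def __init__(self):
        self.stored = []

    def receive(self, batch):
        self.stored.extend(batch)  # the batch is committed on node 2
        return True                # node 2 sends an acknowledgement


def send_with_retry(receiver, batch, ack_lost_on_attempts, max_attempts=5):
    """Stands in for node 1: resend the batch until an ack arrives.

    ack_lost_on_attempts is a set of 1-based attempt numbers on which
    the (simulated) network drops the acknowledgement.
    Returns the number of attempts used.
    """
    for attempt in range(1, max_attempts + 1):
        ack = receiver.receive(batch)
        if attempt in ack_lost_on_attempts:
            ack = False  # ack was sent but never reached node 1
        if ack:
            return attempt
    raise RuntimeError("no acknowledgement after max_attempts")


node2 = Receiver()
attempts = send_with_retry(node2, ["e1", "e2"], ack_lost_on_attempts={1})
print(attempts)      # 2
print(node2.stored)  # ['e1', 'e2', 'e1', 'e2']
```

Node 1 cannot tell "batch lost" apart from "ack lost", so resending is the only safe choice. A durable channel on either side does not change that; removing the duplicates has to be done downstream, for example by keying on a unique event ID.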

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow