Question

I'm working with Storm and it is fine for a lot of use cases. Recently I had a look at Trident, which is a high-level abstraction of Storm. It supports exactly-once processing and makes stateful processing easier.

But now I'm wondering: why can't I always use Trident instead of Storm?

What I read so far:

  • Trident processes messages in batches, so latency could be higher.
  • Trident is not yet able to process loops in topologies.

Are there any other disadvantages when using Trident instead of Storm? Because right now, I think the disadvantages I listed above are marginal.

What use cases cannot be implemented with Trident?


Aftermath:

Since I asked the question, my company decided to go for Trident first. We will only use pure Storm when there are performance problems. Sadly, this wasn't an active decision; it just became the default behavior (I wasn't around at that time).

Their assumption was that in most use cases we need state or exactly-once processing, or will need it in the near future. I understand their reasoning, because moving from Storm to Trident or back isn't an easy transformation. In my personal opinion, though, the concept of stream processing without state wasn't understood by everyone, and that was the main reason Trident was chosen.


Solution

To answer your question: when shouldn't you use Trident? Whenever you can afford not to.

Trident adds complexity to a Storm topology, lowers performance and generates state. Ask yourself the question: do you need the "exactly once" processing semantics of Trident, or can you live with the "at least once" processing semantics of Storm? For exactly once, use Trident; otherwise don't.

I would also just like to highlight the fact that Storm guarantees that all messages will be processed. Some messages might just be processed more than once.
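To make the difference concrete, here is a minimal sketch in plain Python. It is a conceptual simulation only and uses no Storm or Trident APIs; the function names and the sample data are made up for illustration. It assumes batches arrive in order and that a replayed batch carries the same transaction id and the same contents, which is the idea behind storing a transaction id next to the state.

```python
# Conceptual simulation only: no Storm or Trident APIs are used here.

def count_at_least_once(deliveries):
    """Apply every delivery, including replays, to a counter."""
    total = 0
    for _txid, batch in deliveries:
        total += len(batch)
    return total


def count_exactly_once(deliveries):
    """Keep the last applied transaction id next to the value and
    skip a batch whose id has already been applied."""
    state = {"txid": None, "count": 0}
    for txid, batch in deliveries:
        if state["txid"] == txid:
            continue  # replay of the batch that was already applied
        state = {"txid": txid, "count": state["count"] + len(batch)}
    return state["count"]


# Hypothetical input: batch 2 is delivered twice, as if its ack was lost.
deliveries = [
    (1, ["a", "b"]),
    (2, ["c"]),
    (2, ["c"]),
    (3, ["d", "e"]),
]

at_least_once_total = count_at_least_once(deliveries)  # counts the replay too
exactly_once_total = count_exactly_once(deliveries)    # ignores the replay
```

With at-least-once delivery nothing is lost, but the replayed batch is counted twice. If your downstream logic is idempotent anyway, or a small overcount is acceptable, you don't need the extra bookkeeping.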

Other tips

If the lowest possible latency is your goal and you don't need exactly-once processing, then using Storm is better than Trident.
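A rough way to see why batching costs latency is the toy model below, again in plain Python without any Storm APIs. It assumes, purely for illustration, that a batch closes at fixed intervals and that a tuple is only processed once its batch closes; the arrival times and the 500 ms interval are hypothetical values, not measurements.

```python
def micro_batch_waits(arrivals_ms, batch_interval_ms):
    """Return how long each tuple waits until its batch window closes."""
    waits = []
    for t in arrivals_ms:
        # Round the arrival time up to the next multiple of the interval.
        batch_close = -(-t // batch_interval_ms) * batch_interval_ms
        waits.append(batch_close - t)
    return waits


# Hypothetical arrival times in milliseconds.
arrivals_ms = [100, 400, 500, 900, 1200]

# Tuple-at-a-time processing adds no waiting in this model.
per_tuple_waits = [0 for _ in arrivals_ms]

# Micro-batching: each tuple waits for the end of its window.
batch_waits = micro_batch_waits(arrivals_ms, batch_interval_ms=500)
mean_batch_wait = sum(batch_waits) / len(batch_waits)
```

In this model the added wait is anywhere between zero and one full interval per tuple, which is the overhead you avoid by processing tuples one at a time.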

Trident is a high-level abstraction for doing realtime computing on top of Twitter Storm, available in Storm 0.8.x. Storm is a stateless stream processing framework, and Trident provides stateful stream processing.

Chris, since both of them are open source technologies, Trident is just one implementation of a particular scenario on top of Storm, and of course this comes with a performance overhead. If Trident cannot meet your requirements, you can create your own state implementation on top of Storm. Over time, Trident has also given rise to higher-level projects such as Trident-ML.

Assume we want to filter tuples and then add a field to each tuple. With plain Storm we would usually use two bolts, one for filtering and one for adding the field, so each tuple has to be sent on to the second bolt, for example using a global grouping. Network bandwidth may then become the bottleneck.

With Trident we can do both steps on a single machine, so no regrouping is needed in this case. Use cases like this, in addition to the "exactly once" versus "at least once" question, can help decide which one to use.
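The filter-plus-enrich case can be sketched like this. It is a plain Python simulation, not Storm code: the `keep` and `add_field` functions, the `doubled` field and the use of `pickle` as a stand-in for a network hop between two bolts are all assumptions made for illustration. In a real cluster two bolts may also end up in the same worker, so treat the transfer count as a worst case.

```python
import pickle


def keep(tup):
    """Filter step: keep tuples with a positive value."""
    return tup["value"] > 0


def add_field(tup):
    """Enrichment step: add a derived field to the tuple."""
    return {**tup, "doubled": tup["value"] * 2}


def two_stage(tuples):
    """Filter and enrichment as separate stages; every surviving tuple
    is serialised and deserialised between them."""
    transfers = 0
    out = []
    for tup in tuples:
        if not keep(tup):
            continue
        wire = pickle.dumps(tup)  # stands in for the hop to the next bolt
        transfers += 1
        out.append(add_field(pickle.loads(wire)))
    return out, transfers


def fused(tuples):
    """Both steps applied in the same place, with no hop in between."""
    return [add_field(tup) for tup in tuples if keep(tup)], 0


# Hypothetical input tuples.
tuples = [{"value": 3}, {"value": -1}, {"value": 5}]

staged_result, staged_transfers = two_stage(tuples)
fused_result, fused_transfers = fused(tuples)
```

Both versions produce the same output; only the number of transfers differs. Note that you can get the fused behaviour in plain Storm as well, by doing both steps inside one bolt.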

Trident is a kind of logical grouping.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow