Question

Why should someone use a stream processing engine like Apache Spark, Flink, or Hadoop instead of just a normal backend worker that processes a task and returns the result as soon as it's done?

The credit card fraud checking example is often given when these solutions are discussed, so what is the problem with just writing a program, deploying it as a backend service that does this check for us, and returning the result?

Solution

Spark and similar engines are built with processing massive amounts of data in mind. They enable you to run your computation on dozens or hundreds of machines in parallel. This is not easily implemented from scratch.

So, for data that can be processed on a single machine, you are probably better off writing the code entirely yourself. But when the data grows beyond what a single machine can handle, due to the amount of memory and CPU power required, Spark or another such engine is a much better choice. It leaves you to write your processing logic while it handles the "logistics" of spreading the load between machines and combining the partial results into a single result.
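To make that division of labor concrete, here is a minimal PySpark sketch (assuming pyspark is installed; the file name, column names, and the spending threshold are all hypothetical). You write only the per-card aggregation logic; Spark partitions the input, runs that logic on each partition in parallel, potentially across many machines, and merges the partial results:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("fraud-demo").getOrCreate()

    # Read a potentially huge dataset; Spark splits it into partitions
    # that can be spread across the machines of a cluster.
    tx = spark.read.csv("transactions.csv", header=True, inferSchema=True)

    # Hypothetical rule: flag cards whose total spend exceeds a threshold.
    # Only this logic is yours; Spark distributes the work and combines
    # the per-partition partial aggregates into one result.
    flagged = (tx.groupBy("card_id")
                 .agg(F.sum("amount").alias("total"),
                      F.count("*").alias("n_tx"))
                 .filter(F.col("total") > 10000))

    flagged.show()
    spark.stop()

The same code runs unchanged on a laptop or on a hundred-node cluster; only the cluster configuration differs, which is exactly the "logistics" the engine takes off your hands.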

Licensed under: CC-BY-SA with attribution