Question

I'm trying to get past all the hipster, pie-in-the-sky buzzwording and address a very simple, fundamental question:

What is a streaming application?

According to the Kafka site

"Kafka is used for building real-time data pipelines and streaming apps"

Streaming apps...hmmm. OK, so what is a "streaming app"?! According to Quora, a Java stream is:

[A sequence] of bytes that you can read from (InputStream and its subclasses) or write to (OutputStream and its subclasses)...

That definition doesn't seem to fit. From what I can gather from various articles, a "streaming app" appears to be just an app that is constantly being fed data. But doesn't that definition also apply to:

  • A RESTful HTTP service, whose web clients are constantly sending it data all day long (and also, querying it for data)
  • A standard message broker (AMQP, etc) whose clients are constantly reading/writing to its queues all day long
  • Any TCP-based network server, whose TCP clients are constantly reading/writing data to it all day long (including MMO game servers)
  • ?!?!

So I ask, because someone, somewhere really needs to bring clarity to this: "Is a streaming app just trendy, hipster buzzword banter, or is there a distinctive definition for a streaming app that sets it apart from all my examples above?"

Solution

A streaming app is an app that consumes a stream of data.

A stream of data is transmitted data formatted so that it is useful even when incomplete. Because a consumer does not need the complete transmission to make use of the data, consumers can join and leave at any time. Transmission can also be continuous, though it may start and stop on demand. This models how broadcast radio and television work.

This contrasts with file transfers that may be meaningless to consume until the transfer has been completed.
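To make the "useful even when incomplete" point concrete, here is a minimal Python sketch. The names `running_mean` and `sensor_feed` are made up for illustration, and the feed is a stand-in for a real source such as a socket or a broker subscription. The consumer gets a usable result after every element and never needs the stream to end.

```python
from typing import Iterable, Iterator


def running_mean(readings: Iterable[float]) -> Iterator[float]:
    """Yield the mean of all readings seen so far, one result per reading."""
    total = 0.0
    count = 0
    for value in readings:
        total += value
        count += 1
        yield total / count


def sensor_feed() -> Iterator[float]:
    """Hypothetical unbounded source that produces 1.0, 2.0, 3.0, ... forever."""
    value = 0.0
    while True:
        value += 1.0
        yield value


# The consumer can stop whenever it likes; here it leaves after three elements.
means = running_mean(sensor_feed())
first_three = [next(means) for _ in range(3)]
print(first_three)  # [1.0, 1.5, 2.0]
```

A file-style consumer would instead have to wait for the whole input before computing anything, which is impossible here because the feed never finishes.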

Java streams allow consumption of partial data but do nothing to transfer data over a network on their own.

And, as with any popular buzzword, money is being spent to make it seem like more than it is.

Other answers

In a Kafka streaming application, the producer:

  • Publishes messages into the void.
  • Cannot expect an immediate reply from an intended destination (that would be RPC).
  • May not even have an intended destination (think logs).
  • May be given assurances that the messages are durably stored by Kafka and will not be lost.
  • May be given assurances that the messages are stored in order.
  • Has no idea whether zero, one, or many processes are reading its messages at any given point in time, or how far behind consumers are.
  • Has no idea how many times, or by how many parties, those messages will be consumed.

The consumer:

  • Acts on messages it pulls off a remote stream/log/queue/whatever you want to call it.
  • Is not trying to communicate back to the same function that sent the message.
  • Does not necessarily care how long ago messages were produced.
  • Does not necessarily care who else has read them, or whether they are "done" (though these semantics can be overlaid with partition and offset management, e.g. in Samza).

The HTTP service's clients cannot send it data unless the HTTP service is alive and returning 200 OK while the clients are sending. The server is not allowed to "fall behind" by more than the HTTP timeout without causing errors, or replay requests from an hour ago.

Messages typically disappear from a message broker once consumed. A Kafka consumer is allowed to "rewind" and "seek" around the past, subject to retention periods. New streaming applications may appear and consume messages from weeks ago, in the same way that they'd consume messages from seconds ago.
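The difference between a consume-and-delete queue and a replayable log can be sketched in a few lines of Python. This is a toy in-memory model, not the Kafka client API: the `Log` and `LogConsumer` classes are hypothetical, and retention periods and partitions are deliberately left out.

```python
from collections import deque


class Log:
    """Toy append-only log; Kafka-like in spirit only."""

    def __init__(self):
        self._records = []

    def append(self, record):
        """Store a record and return its offset. The producer learns nothing else."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset):
        """Return records from offset onward without removing them."""
        return self._records[offset:]


class LogConsumer:
    """Each consumer owns its position; the log does not track its readers."""

    def __init__(self, log, offset=0):
        self.log = log
        self.offset = offset

    def poll(self):
        records = self.log.read(self.offset)
        self.offset += len(records)
        return records

    def seek(self, offset):
        self.offset = offset


log = Log()
for event in ["a", "b", "c"]:
    log.append(event)

early = LogConsumer(log)
first_pass = early.poll()   # reads a, b, c
early.seek(1)               # rewind
replayed = early.poll()     # rereads b, c

late = LogConsumer(log)     # joins afterwards and still sees everything
late_pass = late.poll()

# A destructive queue, by contrast, has nothing left for a newcomer.
queue = deque(["a", "b", "c"])
taken = [queue.popleft() for _ in range(3)]
left_for_newcomer = list(queue)
```

Note that `append` never consults the consumers, and consumers never write back to the producer. That one-way decoupling is the property the producer and consumer lists above describe.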

The TCP network server must be alive and ACKing to receive data, similar to the HTTP service.

I first heard the term "streaming" in the late 1990s, when I was on a team that was helping to develop a "Streaming Video" delivery application. Rather than transferring the entire video file from a server to a local client that had a player installed, the video content came from the server in a stream. The client side could consume any subset of the total video and play it. As long as the stream could carry data fast enough to keep some amount of buffer on the client, the content could be played reliably, without skips or gaps in the video or audio presented to the user. This was my first exposure to streaming, which is now the norm for pretty much all multimedia content delivery.

A streaming application is where the application is streamed to the user instead of being installed ahead of time on a computer. Application streaming is a method of delivering virtualized applications. The streaming should be transparent to the user. The client gets sufficient information from the server to trigger the application, which is generally as low as 10 percent of the application. Then the rest is streamed to the client in the background, even when the end user is performing other tasks. Application streaming makes use of the Real Time Streaming Protocol (RTSP). It is commonly used along with desktop virtualization.

There is a lot of information on the Internet. For example, Wikipedia: https://en.wikipedia.org/wiki/Application_streaming

Licensed under: CC-BY-SA with attribution