Question

I want to serve some streaming data over HTTP. Additionally, I'd like to compress the data to save bandwidth. I can trivially compress each streaming response separately, but in my case, the data stream is largely the same for every single client (fanout), so it seems like a waste of CPU time to compress the same data for each connection. My plan is to compress each chunk of streaming data preemptively, so clients who connect at any time can start reading the next chunk (this comes at the cost of reduced compression efficiency, but as long as the individual chunks of data are big enough, this should be fine).

Compliant HTTP clients are apparently supposed to be able to accept multiple gzipped files in a Content-Encoding: gzip response, but the answers to this question indicate web browsers do not. However, from my understanding of DEFLATE/zlib, you can instead flush with Z_FULL_FLUSH, which emits the bytes 0x0000FFFF and resets the compressor state. That should have the same effect: individually decompressible chunks.

I've set up a simple POC in node.js that streams a message as a Server-Sent Events stream, but I can't get a web browser to read the data; it will open the connection, but will never deliver the data to the page. I'm using Z_NO_COMPRESSION for simplicity.

var http, zlib, gzip, numClients;
http = require('http');
zlib = require('zlib');
gzip = zlib.createDeflateRaw({
  flush: zlib.Z_SYNC_FLUSH,
  level: zlib.Z_NO_COMPRESSION
});
numClients = 0;
setInterval(function(){
  if (numClients > 0) {
    gzip.write("data: hi\n\n");
  }
}, 1000);
http.createServer(function(req, res){
  res.socket.setTimeout(Infinity);
  res.writeHead(200, {
    'Content-Type': 'text/event-stream',
    'Content-Encoding': 'deflate',
    'Transfer-Encoding': 'identity',
    'Access-Control-Allow-Origin': '*'
  });

  res.write('\x78\x01'); // write zlib magic number
  gzip.pipe(res);
  numClients++;
  res.on('close', function(){
    numClients--;
    gzip.unpipe(res);
  });
  res.on('error', function(){
    numClients--;
    gzip.unpipe(res);
  });
}).listen(8080);
numClients++;
gzip.pipe(process.stdout);

And a simple client:

<!DOCTYPE html5>
<html lang=en>
<meta charset=utf-8>
<title>hi</title>

<script>
var es = new EventSource("http://localhost:8080/");
es.addEventListener('data', console.log);
es.addEventListener('open', console.log);
es.addEventListener('error', console.log);
</script>

The bytes look like this (curl -N localhost:8080 piped through xxd):

0000000: 7801 000a 00f5 ff64 6174 613a 2068 690a  x......data: hi.
0000010: 0a00 0000 ffff 000a 00f5 ff64 6174 613a  ...........data:
0000020: 2068 690a 0a00 0000 ffff 000a 00f5 ff64   hi............d
0000030: 6174 613a 2068 690a 0a00 0000 ffff 000a  ata: hi.........
0000040: 00f5 ff64 6174 613a 2068 690a 0a00 0000  ...data: hi.....
0000050: ffff 000a 00f5 ff64 6174 613a 2068 690a  .......data: hi.
0000060: 0a00 0000 ffff 000a 00f5 ff64 6174 613a  ...........data:

Do I need to add extra framing for a DEFLATE decompressor to detect the flush points?

EDIT: I added the zlib magic number to make the http stream a valid DEFLATE stream, but the web browser still won't flush the blocks. However, piping the gzip stream back through zlib.createInflate() works fine if the magic number is added. I also know that the http stream isn't being buffered because curl -N localhost:8080 will display the raw bytes.


Solution 2

The problem wasn't actually with DEFLATE, zlib, or any actual framing, but with the EventSource client. Embarrassing.

The Server-Sent Events specification defines three events on the EventSource object: open, error, and message. The message event fires for every message on the stream that does not carry an event field. Any other event name passed to addEventListener only matches messages that the server tags with that event type, using the event syntax, e.g.

event: <event-name>
data: ...

In other words, changing this line in client.html:

es.addEventListener('data', console.log);

to

es.addEventListener('message', console.log);

will cause the web browser to correctly log all the "hi" messages from the server as they are flushed.
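To make the distinction concrete, here is a minimal sketch of how the two kinds of frames look on the wire. The helper formatSseFrame and the event name tick are made up for illustration; they are not part of the question's code.

```javascript
// Hypothetical helper: builds one Server-Sent Events frame.
// Without eventName the browser dispatches it as a 'message' event;
// with eventName it only reaches listeners registered for that name.
function formatSseFrame(data, eventName) {
  const lines = [];
  if (eventName) {
    lines.push(`event: ${eventName}`);
  }
  // Each line of the payload needs its own "data:" prefix.
  for (const line of String(data).split('\n')) {
    lines.push(`data: ${line}`);
  }
  // A blank line terminates the frame.
  return lines.join('\n') + '\n\n';
}

const unnamedFrame = formatSseFrame('hi');        // what the question's server sends
const namedFrame = formatSseFrame('hi', 'tick');  // 'tick' is an illustrative name

// Matching client-side listeners (browser code, shown as comments):
//   es.addEventListener('message', console.log); // receives unnamedFrame
//   es.addEventListener('tick', console.log);    // receives namedFrame
```

With the server in the question, only the first form is ever sent, which is why a listener registered for 'data' never fires.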

Re: compression, the code in the question does generate a valid DEFLATE stream, so it's all good on that part.
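One caveat if you turn real compression on: with Z_SYNC_FLUSH the compressor may still refer back to data sent before the flush, which a late-joining client never saw, so Z_FULL_FLUSH is the safer choice for fanout. The following is a rough sketch of that idea, not the code from the question. It assumes each message is compressed once with deflateRawSync and finishFlush set to Z_FULL_FLUSH; compressSegment is a hypothetical helper name.

```javascript
const zlib = require('zlib');

// Hypothetical helper: compress one message exactly once. Ending with
// Z_FULL_FLUSH (instead of the default Z_FINISH) leaves the segment on a
// byte boundary without a final block, so more segments can follow.
function compressSegment(text) {
  return zlib.deflateRawSync(Buffer.from(text), {
    finishFlush: zlib.constants.Z_FULL_FLUSH,
  });
}

// Same two header bytes the question writes per connection.
const ZLIB_HEADER = Buffer.from([0x78, 0x01]);

const messages = ['data: one\n\n', 'data: two\n\n', 'data: three\n\n'];
const segments = messages.map(compressSegment);

// A client that connects after the first message receives the header
// followed by the later, already-compressed segments.
const lateJoinerBytes = Buffer.concat([ZLIB_HEADER, ...segments.slice(1)]);

// The stream has no final block or Adler-32 yet, so ask inflate to
// return whatever it can decode rather than insisting on a stream end.
const decoded = zlib.inflateSync(lateJoinerBytes, {
  finishFlush: zlib.constants.Z_SYNC_FLUSH,
}).toString();

console.log(JSON.stringify(decoded));
```

Each segment is compressed a single time and can be handed to any number of connections, at the cost of losing the shared history between segments.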

I think it could also be adapted to a gzip stream with some header changes, but the main thing gzip adds over raw DEFLATE, apart from its header, is the integrity check at the end of the stream, as mentioned in Mark Adler's answer. In my case, I don't have a stream end, so I can never send a checksum anyway, negating any benefits of gzip.

OTHER TIPS

The zlib stream needs to be terminated by a last block and an integrity check. The last block is signified by the "last bit" set at the start of the block. In this case since the next-to-last block is a stored block, putting you at a byte boundary, the last block can be 01 00 00 ff ff.

Then you need an Adler-32 check on the uncompressed data as the last four bytes of the stream, following the last block.
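As a sketch of what that termination looks like, the snippet below rebuilds the bytes of a single message from the question's hex dump, appends the final block and the Adler-32, and checks that Node's zlib accepts the result as a complete stream. The adler32 function here is a straightforward hand-written version for illustration.

```javascript
const zlib = require('zlib');

// Adler-32 as defined for the zlib format: two running sums modulo 65521.
function adler32(buf) {
  let a = 1;
  let b = 0;
  for (const byte of buf) {
    a = (a + byte) % 65521;
    b = (b + a) % 65521;
  }
  return ((b << 16) | a) >>> 0;
}

const payload = Buffer.from('data: hi\n\n');

// One message exactly as it appears in the question's hex dump.
const streamSoFar = Buffer.concat([
  Buffer.from([0x78, 0x01]),                   // zlib header
  Buffer.from([0x00, 0x0a, 0x00, 0xf5, 0xff]), // stored block, length 10
  payload,
  Buffer.from([0x00, 0x00, 0x00, 0xff, 0xff]), // empty stored block from the flush
]);

// Final empty stored block: the "last" bit is set in the first byte.
const finalBlock = Buffer.from([0x01, 0x00, 0x00, 0xff, 0xff]);

// Adler-32 of the uncompressed data, stored big-endian.
const trailer = Buffer.alloc(4);
trailer.writeUInt32BE(adler32(payload), 0);

const complete = Buffer.concat([streamSoFar, finalBlock, trailer]);
const inflated = zlib.inflateSync(complete).toString();

console.log(JSON.stringify(inflated));
```

For a stream that never ends, as in the question, there is no point at which these trailing bytes can be sent, so the decoder simply keeps consuming blocks.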

Licensed under: CC-BY-SA with attribution