Question

I cannot seem to get a definitive answer to the following question, either from Googling or from reading the HTTP/1.1 spec:

When 'chunked' transfer encoding is used, why does the server need to write out BOTH the chunk size in bytes AND a CRLF after the chunk data? Doesn't this make sending binary data "CRLF-unclean" and the method a bit redundant? What if the data has a 0x0D followed by 0x0A in it somewhere (i.e. these bytes are actually part of the data)? Is the client expected to adhere to the chunk size explicitly provided at the head of the chunk, or to choke on the first CRLF it encounters in the data? My understanding so far is to take the chunk size provided by the server, proceed to the next line, read exactly that number of bytes from the following data (CRLF or no CRLF inside), skip the CRLF that follows the data, and repeat the procedure until there are no more chunks. Am I right? What is the point of the CRLF after each data chunk then? Readability?

Solution

A chunked consumer does not scan the chunk data for a CRLF pair. It reads the chunk size from the size line, reads exactly that number of bytes, and then reads two more bytes to confirm that they are CR and LF. If they're not, the message body is ill-formed: either the size was specified improperly or the data was otherwise corrupted.
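
The procedure above can be sketched in Python. This is only an illustration, not a production parser: `decode_chunked` is a hypothetical helper name, the input is assumed to be a binary file-like object, chunk extensions are discarded, and trailer lines are skipped rather than parsed.

```python
import io


def decode_chunked(stream):
    """Decode a chunked message body from a binary file-like object.

    Hypothetical helper for illustration: chunk extensions are ignored
    and trailer lines are skipped.
    """
    body = bytearray()
    while True:
        # The size line is the only place where we look for a line ending.
        size_line = stream.readline()
        if not size_line.endswith(b"\r\n"):
            raise ValueError("chunk-size line not terminated by CRLF")
        # Drop any chunk-extension after ';' and parse the hexadecimal size.
        size = int(size_line[:-2].split(b";", 1)[0].strip(), 16)
        if size == 0:
            break  # last-chunk
        # Read exactly `size` bytes; CR or LF inside the data is just data.
        data = stream.read(size)
        if len(data) != size:
            raise ValueError("truncated chunk")
        body += data
        # The two bytes after the data must be CR LF, otherwise the
        # declared size and the actual data disagree.
        if stream.read(2) != b"\r\n":
            raise ValueError("chunk data not followed by CRLF")
    # After the last chunk: skip optional trailer lines up to the empty line.
    while True:
        line = stream.readline()
        if line in (b"\r\n", b""):
            break
    return bytes(body)


# The first chunk carries 5 bytes that happen to include a CRLF pair.
wire = b"5\r\nab\r\nc\r\n3\r\nxyz\r\n0\r\n\r\n"
print(decode_chunked(io.BytesIO(wire)))  # b'ab\r\ncxyz'
```

Because the data is read by count, the embedded CRLF in the first chunk comes through untouched. A size that disagrees with the data (for example `b"3\r\nabcd\r\n0\r\n\r\n"`) is rejected at the CRLF check.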

The trailing CRLF is a belt-and-suspenders assurance (per RFC 2616 section 3.6.1, Chunked Transfer Coding), but it also serves to maintain the consistent rule that fields start at the beginning of the line.

OTHER TIPS

The CRLF after each chunk is probably just for better readability, since the chunk size at the beginning of each chunk makes it unnecessary. But the CRLF after the “chunk header” is necessary, as there may be additional information (a chunk extension) after the chunk size (see Chunked Transfer Coding, RFC 2616 section 3.6.1):

      chunk          = chunk-size [ chunk-extension ] CRLF
                       chunk-data CRLF
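
To see the grammar from the sending side, here is a minimal Python sketch of an encoder. The name `encode_chunked` and its `extension` parameter are made up for this example; the extension bytes are assumed to be already well-formed (e.g. `b";name=val"`), and no trailer is emitted.

```python
def encode_chunked(pieces, extension=b""):
    """Encode an iterable of byte strings as a chunked message body.

    Hypothetical helper for illustration: `extension` is appended verbatim
    to every chunk-size line and is assumed to be valid.
    """
    out = bytearray()
    for piece in pieces:
        if not piece:
            continue  # a zero-length chunk would signal the end of the body
        # chunk-size (hex) [ chunk-extension ] CRLF
        out += b"%x" % len(piece) + extension + b"\r\n"
        # chunk-data CRLF; the data is written as-is, embedded CRLFs included
        out += piece + b"\r\n"
    # last-chunk ("0" CRLF) followed by the empty line that ends the body
    out += b"0\r\n\r\n"
    return bytes(out)


print(encode_chunked([b"ab\r\nc", b"xyz"]))
# b'5\r\nab\r\nc\r\n3\r\nxyz\r\n0\r\n\r\n'
```

The size line is the only variable-length, line-oriented part of a chunk, which is why its CRLF is needed to mark where the size (and any extension) ends.
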
Licensed under: CC-BY-SA with attribution