Question

I am currently implementing a png reader, but I am a confused about the bit order of the data and the format in general.

I don't have access to libpng, zlib or similar.
Please note that any bitstrings I write are most-significant bit first.

According to RFC1951, "Data elements are packed [...] starting with the least significant bit"

The first byte of my sample image is: 11101101
To read the header, I reverse the bit order and get: 10110111

The first bit is saying that this is last block, which makes sense. The next 2 bits are "01", does this mean static or dynamic huffman encoding? The RFC only mentions them as bits, but not in which order they are or their numerical value.

Assuming dynamic huffman encoding, the header is followed by the 2 huffman alphabets. These are, however, also encoded. 0-15 serve as literals, 16 repeats the code before by (3 + following 2 bits) times. Am I correct in assuming that 17 and 18 repeat the literal 0?

Further questions are:

  • Did I understand all of this correctly?
  • How do I know the alphabet encoding ends and the next starts?
  • In which bit order are the alphabets and the actual compressed data in comparison to the header bits?

I think I don't understand most of chapter 3.2.7...

Was it helpful?

Solution

Don't reverse the bits except for Huffman codes, and only then if your decoding approach requires it. zlib's inflate never reverses the bits. puff reserves bits only to decode Huffman codes. You will get confused if you try to interpret all of the bits reversed, and you will have to un-reverse offset bits.

Just read the bits from the bottom.

In your example, 11101101, the bottom bit is 1 indicating that this first block is also the last block. The next two bits (not reversed) 10 says that it is a dynamic block.

You can use puff.c, in the contrib/puff directory in the zlib distribution, as a supplement to RFC 1951. puff.c was written to be a simple and clear implementation of inflate to provide an unambiguous definition of the deflate format.

OTHER TIPS

Dynamic huffman.

Mark Adler's answer (read from the bottom) is the right way to think about it when coding, although it can be confusing when crossing byte boundries--make sure to keep more than one byte loaded to avoid any confusion.

I found it wasn't a useful way when figuring stuff out bit-by-bit by hand, and reversing multiple times worked better for me, as you're doing.

If you can install software on another computer, I recommend infgen, a debugging tool Mark wrote for zlib/gzip streams and files, as being useful to check your understanding of the standard. The documentation is in a fairly large comment in the .c file, not in the README.

I also wrote up some worked examples myself. I realize I should probably copy them to stack overflow to make them self-contained, but it really runs to like 5 pages of fine detail and doesn't directly address your question.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top