TIFF plot generation and compression: R vs. GIMP vs. IrfanView vs. Photoshop file sizes

Question

Apparently the TIFF LZW compressor used by R is not making use of an important option (the TIFF predictor) which is leading to an extremely large file. Data compression works best when it can recognize symmetries/redundancies in the data. In this case, the image data is composed of 24-bit (3-byte) pixels containing red, green and blue 8-bit values. Standard LZW compression looks at a stream of bytes for repeating patterns. If it looks at the color image simply as a stream of bytes, it will see repeating patterns of 3-bytes instead of repeating patterns of constant color. Enabling the TIFF predictor on the data causes a differencing filter to store the delta of each pixel with its neighbor. If the neighboring pixels are the same color, it will store 0's. A long string of 0's compresses much better than repeating patterns of non-zeros which are at least 3 bytes long.

Here is an example of how it works on a 6 pixel line. When encoding, the predictor starts from the right edge and works left for each scan line:

Original data:
2A 50 40 2A 50 40 2A 50 40 2A 50 40 2A 50 40 2A 50 40 (6 pixels of the same color)

After horizontal differencing (TIFF predictor):
2A 50 40 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

The data is much more compressible after the predictor since long runs of the same value (0x00) are easier for LZW to compress.

Conclusion: This should be filed as a bug against the owner of the R compression code since using LZW on full color images without the predictor produces poor results. In the mean time, a workaround is needed to compress it more efficiently.