It was mentioned earlier. Right at the top of the document it says:
The format uses subresolution images, recursively embedded into the format itself, for storing statistical data about the images, such as the used entropy codes, spatial predictors, color space conversion, and color table.
These are arrays (or a vector in the case of the color table) of data where each element applies to a block of pixels in the actual image, e.g. a 16x16 block. These "subresolution images" are not themselves subsamples of the image being compressed.
The format description calls them images because they are stored exactly like the main image is in the format. The transforms are instructions to the decoder to apply to the decompressed main image data. The entropy image is used to decompress the main image, by virtue of providing the Huffman codes for each block.