Summarizing pngs, size optimization

https://stackoverflow.com/questions/9252151

28-04-2021
|

Question

I have about 1000 PNG images (tiles) of equal size (32px x 32px). They all look different, but some are very similar (they use identical colors). I have to summarize them in PNG images (blocks) with about 50 tiles each. The block size is dynamic, but should not be too small.

My goal is to minimize the size of the resulting blocks.

Wikipedia tells me that the PNG file size depends on the color depth per pixel.

My idea is to group the tiles so that each group has the least amount of colors. Also the color index needs to be stored, so saving each tile as a block would not be the optimal solution. Doing a brute-force run would require a very long time, so I was hoping for a good sketch of a grouping algorithm.

I'd assume that the file size makes a "jump" when the amount of colors goes over 1,2,4,8,16,32 and so on. So those could be thresholds to look out for.

I will now sketch an algorithm that does not produce the optimal solution

Define

Introduce a group-tile distance. The distance a group of tiles A has to a tile B, is the amount of different colors that are in B, but not in the group A.

Color(G) is the total amount of different colors in a group of tiles G.

Algorithm

1) Pick the first tile and put it into Group1.

2) Loop over all remaining tiles and put the tile T with the smallest group-tile distance d_T into Group1, if Colors(Group1) + d_T is smaller-equal than some threshold, e.g. 16. Repeat this step until no such tile is found.

3) Pick the next remaining tile and repeat the procedure.

Adjust threshold if too many or too few groups.

Unfortunately this will not necessarily result in the best possible result (there might be a larger group possible with the same threshold).

Could this algorithm be changed to return the optimal solution or should I choose a different approach?

Are there maybe any factors that affect the PNG size, which I did not consider?

Solution

This can be split into two problems:

choosing which images to put together (to get minimal palette)
choosing order they're in (to take advantage of PNG filters)

Palette

Work with histograms. You can posterize colors (truncate least significant bits) to get smaller histograms and bigger overlaps between histograms of different images.

Then group them, e.g. do pairwise grouping of least dissimilar histograms or use K-means clustering.

The dissimilarity would be measured by number of new histogram entries that would be added to group's total histogram (e.g. if group already uses blue and pink colors, then blue image costs 0, blue+red costs 1, yellow+green costs 2).

You can also weigh dissimilarity by number of pixels using given color (sqrt(num_pixels) works well for pngquant) and perhaps distance from nearest available color in the group's histogram.

Not just raw number of colors counts, but also how well those colors are chosen. Palette with lower mean square error will almost always give better compression (amount of visual information per bit), because colors won't be "spent" poorly on less important areas of the image.

Filters

For first approximation you could sort images by their smoothness. Filters are useful for gradients and usually don't help in noisy areas.

Then you can brute-force a bit more by randomly swapping images when it improves filtering heuristic, i.e:

Pick two images
Calculate heuristic for 32 rows of pixels in rows of those images
Keep swap if that improves the score, backtrack if it doesn't
Leave that running overnight

Focus on #1, as filters don't give as big savings as optimal palette.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow