Question

Specifically: I would like to compress a set of coordinates that map to the locations of 1's in a binary image, and then decode back to the original set. For instance, for a 16x16 image, the input might be something like the following:

[5, 4], [12, 5], [8, 7],....

I am not looking to recognize any spatial patterns, nor is this a time-series problem, because the input corresponds to just one static image. The trained autoencoder should be able to handle any array of arbitrary "coordinates", under the assumption that the data is scaled between 0 and 1, so the resolution (the actual range of numbers) is inconsequential. Is this doable? What would be a good way to train it?


Solution

This is doable, but the caveat is that a neural network might not give you a better result than a lossless compression algorithm (depending on what you want to use the compressed version for).

The most sensible way to do it, in my opinion, is to have a function at the start that converts the coordinates into an image, and a function at the end that converts the recovered image back into coordinates.
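As a minimal sketch of those two conversions (assuming NumPy, a 16x16 grid, and (row, col) coordinate tuples; the function names and the 0.5 threshold are my own choices, not anything prescribed above):

```python
import numpy as np

def coords_to_image(coords, size=16):
    """Build a binary image with 1s at the given (row, col) coordinates."""
    img = np.zeros((size, size), dtype=np.float32)
    for r, c in coords:
        img[r, c] = 1.0
    return img

def image_to_coords(img, threshold=0.5):
    """Recover the (row, col) coordinates of pixels at or above the threshold."""
    rows, cols = np.where(img >= threshold)
    return list(zip(rows.tolist(), cols.tolist()))
```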

The way you would train it is to gather a large number of training examples, pass them as 16*16 tensors containing 1s and 0s to your autoencoder network, then threshold the reconstructed image and extract the coordinates where the value is 1.
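A rough training sketch, assuming Keras/TensorFlow (the dense 256 -> 32 -> 256 architecture, the synthetic sparse training data, and all hyperparameters here are placeholders rather than recommendations):

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Placeholder training data: random sparse 16x16 binary images
# (roughly 5% of pixels set to 1), flattened to vectors of length 256.
x_train = (np.random.rand(10000, 16, 16) < 0.05).astype("float32").reshape(-1, 256)

# Small dense autoencoder: 256 -> 32 -> 256
autoencoder = keras.Sequential([
    keras.Input(shape=(256,)),
    layers.Dense(32, activation="relu"),       # bottleneck (the compressed code)
    layers.Dense(256, activation="sigmoid"),   # decoder back to pixel probabilities
])
autoencoder.compile(optimizer="adam", loss="binary_crossentropy")
autoencoder.fit(x_train, x_train, epochs=20, batch_size=64)

# Reconstruct one example, threshold it, and read off the coordinates of the 1s.
recon = autoencoder.predict(x_train[:1]).reshape(16, 16)
coords = np.argwhere(recon >= 0.5)
print(coords)
```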

There’s a good article/tutorial here to get you started.

If you’re determined not to explicitly give your network a 16*16 tensor, you could pass your coordinates as an ordered set of two-dimensional vectors and treat the problem as a sequence transduction task. The network would need to learn an embedding for each set of coordinates and then predict each embedding based on the surrounding embeddings. I expect you won’t get much joy from a transformer model, as they are notoriously hard to train, but a recurrent model with attention like the ones in the article I’ve linked to might give interesting results.
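For completeness, here is a minimal sequence-autoencoder sketch in Keras, with attention omitted for brevity; the maximum sequence length, latent size, zero-padding convention, and sorting of the coordinates are all assumptions of mine:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

MAX_LEN, LATENT = 20, 16  # assumed maximum number of points and latent size

def coords_to_sequence(coords, size=16, max_len=MAX_LEN):
    """Sort the (row, col) pairs, scale them to [0, 1], and zero-pad to max_len."""
    seq = np.zeros((max_len, 2), dtype="float32")
    for i, (r, c) in enumerate(sorted(coords)[:max_len]):
        seq[i] = (r / (size - 1), c / (size - 1))
    return seq

# The encoder LSTM compresses the whole sequence into one LATENT-dimensional
# vector; the decoder unrolls that vector back into a sequence of (row, col) pairs.
inputs = keras.Input(shape=(MAX_LEN, 2))
state = layers.LSTM(LATENT)(inputs)
repeated = layers.RepeatVector(MAX_LEN)(state)
decoded = layers.LSTM(LATENT, return_sequences=True)(repeated)
outputs = layers.TimeDistributed(layers.Dense(2, activation="sigmoid"))(decoded)

seq_autoencoder = keras.Model(inputs, outputs)
seq_autoencoder.compile(optimizer="adam", loss="mse")
```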

OTHER TIPS

If your images contain only a few 1's, you can achieve lossless compression without any deep learning at all by using a sparse matrix storage format such as Compressed Sparse Row (CSR).
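For example, with SciPy (using a 16x16 image built from the example coordinates in the question):

```python
import numpy as np
from scipy import sparse

# Binary image with 1s at the example coordinates from the question.
img = np.zeros((16, 16), dtype=np.uint8)
for r, c in [(5, 4), (12, 5), (8, 7)]:
    img[r, c] = 1

csr = sparse.csr_matrix(img)
print(csr.data, csr.indices, csr.indptr)  # only the nonzeros plus index arrays are stored

recovered = csr.toarray()                 # exact (lossless) reconstruction
assert np.array_equal(recovered, img)
```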

Licensed under: CC-BY-SA with attribution