I'm trying to understand what's possible with TensorFlow's output layer. Specifically, are outputs always a flat array?

Since a neuron (or 'unit', in TF) outputs just one number, and there is only one set of outputs, it seems the output must have a single dimension. With one-hot probabilities, this is easy to understand. But what about an image?

If my output is going to be a picture, can I have TF output a multi-dimensional array of pixels, e.g. [[r0, g0, b0], [r1, g1, b1], ...]? If so, how would that network be constructed? How would I define the output layer's dimensionality/shape?

The only param I know of that defines output shape is this, from tf.layers.dense, which seems inherently one-dimensional:

  • units (number) Positive integer, dimensionality of the output space.

Any help you can provide is greatly appreciated!

Reference: https://js.tensorflow.org/api/latest/#layers.dense

Solution

The output layer does not have to be 1D (excluding the batch dimension), but even if it is, that does not necessarily mean you cannot transform it into an n-dimensional space. Consider an autoencoder used to reconstruct an image:

  1. In the simplest case, we could flatten an image (e.g. 24 x 24 pixels), train a network to predict all 24 x 24 pixels, and output a 1D image. Those pixels can then be reshaped back into a 2D image (https://www.tensorflow.org/tutorials/generative/autoencoder). In other words, even if your network outputs a 1D shape, nothing prevents you from reshaping it back into a higher-dimensional space (see the first sketch after this list).

  2. We can achieve similar results to the point above by using an encoding network (convolutional + pooling layers) followed by a decoding network (transposed convolutional + upsampling layers). In this case you can generate a 2D image directly (https://www.tensorflow.org/tutorials/generative/cvae); see the second sketch below.
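
Here is a minimal TensorFlow.js sketch of point 1, assuming a 24 x 24 RGB target image (the layer sizes and activations are illustrative assumptions, not prescriptive). The dense output is still 1D; a reshape layer then turns it back into a [height, width, channels] image inside the model:

    import * as tf from '@tensorflow/tfjs';

    const model = tf.sequential();

    // Encoder: compress the flattened 24 x 24 x 3 input to a small latent vector.
    // (64 latent units is an arbitrary illustrative choice.)
    model.add(tf.layers.dense({inputShape: [24 * 24 * 3], units: 64, activation: 'relu'}));

    // Decoder: a plain 1D dense output, one unit per pixel channel...
    model.add(tf.layers.dense({units: 24 * 24 * 3, activation: 'sigmoid'}));

    // ...reshaped back into a 2D image (plus channels) inside the model itself.
    model.add(tf.layers.reshape({targetShape: [24, 24, 3]}));

    model.compile({optimizer: 'adam', loss: 'meanSquaredError'});
    model.summary(); // final output shape: [null, 24, 24, 3]

The reshape is essentially free; the loss simply sees the output as a [24, 24, 3] tensor.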
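
And a sketch of point 2, again with assumed toy dimensions: a small convolutional encoder followed by an upsampling + transposed-convolution decoder, so the model emits a 2D image directly without any flattening:

    import * as tf from '@tensorflow/tfjs';

    const model = tf.sequential();

    // Encoder: convolution + pooling shrink the 24 x 24 x 3 image to a 12 x 12 feature map.
    model.add(tf.layers.conv2d({inputShape: [24, 24, 3], filters: 16, kernelSize: 3, padding: 'same', activation: 'relu'}));
    model.add(tf.layers.maxPooling2d({poolSize: 2})); // -> [12, 12, 16]

    // Decoder: upsampling + transposed convolution grow it back to full resolution.
    model.add(tf.layers.upSampling2d({size: [2, 2]})); // -> [24, 24, 16]
    model.add(tf.layers.conv2dTranspose({filters: 3, kernelSize: 3, padding: 'same', activation: 'sigmoid'})); // -> [24, 24, 3]

    model.compile({optimizer: 'adam', loss: 'meanSquaredError'});
    model.summary(); // final output shape: [null, 24, 24, 3]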

You can also look at image segmentation networks for inspiration on how higher-dimensional outputs can be generated.
