I am writing a convolutional autoencoder for a 3D input with 4 channels.
The encoder part is composed of a number of blocks, where each block contains a 3D convolution layer, a ReLU activation layer, and an average-pooling layer. The last layer of the encoder is simply a 3D convolution without any activation or pooling.
The decoder is composed of transpose-convolution layers whose strides are chosen so that each of these layers increases the spatial shape by the same factor that an encoder block decreases it.
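For reference, a minimal sketch of what I mean (the channel counts, kernel sizes, and depth here are illustrative placeholders, not my exact values):

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    # Each block: Conv3d -> ReLU -> AvgPool3d; the last layer is a bare Conv3d.
    def __init__(self, channels=(4, 8, 16), latent_channels=32):
        super().__init__()
        blocks = []
        for c_in, c_out in zip(channels[:-1], channels[1:]):
            blocks += [nn.Conv3d(c_in, c_out, kernel_size=3, padding=1),
                       nn.ReLU(),
                       nn.AvgPool3d(2)]          # halves each spatial dimension
        blocks.append(nn.Conv3d(channels[-1], latent_channels,
                                kernel_size=3, padding=1))
        self.net = nn.Sequential(*blocks)

    def forward(self, x):
        return self.net(x)

class Decoder(nn.Module):
    # Stride-2 transpose convolutions, mirroring the encoder's pooling;
    # no activation after the last layer.
    def __init__(self, channels=(32, 16, 4)):
        super().__init__()
        pairs = list(zip(channels[:-1], channels[1:]))
        layers = []
        for i, (c_in, c_out) in enumerate(pairs):
            layers.append(nn.ConvTranspose3d(c_in, c_out,
                                             kernel_size=2, stride=2))
            if i < len(pairs) - 1:
                layers.append(nn.ReLU())
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)
```

With these sizes, an input of shape `(B, 4, D, H, W)` is pooled down twice in the encoder and upsampled back by the two stride-2 transpose convolutions, so the decoder output has the same shape as the input.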
For some data points, the 4 channels of the input may have very different scales. In these cases it is OK if the lower-scale channels are not reconstructed correctly by the AE, but the larger-scale channels, and the scale difference between the channels, should be reconstructed in a roughly correct manner.
With the exception of an all-zeros tensor, every input I feed to the model results in the same output (up to some numerical accuracy). This is true both for a “trained” model and for a newly initialized one.
I train the model using MSE loss and an SGD optimizer with momentum of 0.9. This is, however, irrelevant since the problem occurs even for an untrained model.
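Roughly, the training setup looks like this (again, the stand-in model, learning rate, and batch size here are illustrative, not my exact values):

```python
import torch
import torch.nn as nn

# Toy stand-in for the autoencoder described above:
# one encoder block, a bottleneck conv, and one stride-2 transpose conv.
model = nn.Sequential(
    nn.Conv3d(4, 8, kernel_size=3, padding=1), nn.ReLU(), nn.AvgPool3d(2),
    nn.Conv3d(8, 16, kernel_size=3, padding=1),          # bottleneck, no activation
    nn.ConvTranspose3d(16, 4, kernel_size=2, stride=2),  # decoder upsampling
)

optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)
loss_fn = nn.MSELoss()

x = torch.randn(8, 4, 8, 8, 8)  # dummy batch of 3D inputs with 4 channels
for _ in range(3):
    optimizer.zero_grad()
    loss = loss_fn(model(x), x)  # reconstruction loss against the input itself
    loss.backward()
    optimizer.step()
```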
I have seen that this is a common issue, but I have not yet found an answer to the problem.
If I try to overfit the model to a single such data point, it learns to reproduce it perfectly, but any other input fed to this overfit model results in that same output.
If I try to overfit the model to a few data points, the output for any input is some averaged version of all of the training examples.
Adding or removing layers from the model doesn’t change this behaviour, and neither does changing the dimensionality of the bottleneck.
The average-pooling layer is physics-inspired, but regardless, choosing max-pooling instead doesn’t change this behaviour.
If I add batch normalization layers, the model’s output does depend on the input, but it fails miserably on any of the channel-imbalanced examples.
Since my goal is to utilize the learned embedding and not the AE output, normalizing each channel, using batch norm, and keeping the normalization details around is not an acceptable approach.