I have the architecture below. It is pretty simple and gives me an average error (MAE) of around 0.052. That is not too bad, but some high-frequency information gets lost, and I need to eliminate that issue for several reasons.
The training and testing data consist of vectors of length 8, and I want to encode each of them into a single scalar.
The data is very diverse. Also, I don't want to apply a Fourier transform; I just want to use the raw audio data.
Do you have any ideas on how to encode the data more efficiently, i.e., any tricks to reduce the loss? Maybe this is mathematically impossible, but I can't stop thinking about possible solutions.
I'm happy to trade generation time for accuracy; my goal is to reduce the dimensionality as much as possible with negligible loss.
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder: compresses the 8-sample vector into a single scalar
        self.encoder = nn.Sequential(
            nn.Linear(8, 1),
        )
        # Decoder: reconstructs the 8-sample vector from the scalar
        self.decoder = nn.Sequential(
            nn.Linear(1, 8),
        )

    def forward(self, x):
        x = self.encoder(x)
        x = self.decoder(x)
        return x

    def encode(self, x):
        return self.encoder(x)

    def decode(self, x):
        return self.decoder(x)
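For reference, encoding and decoding look like this (the random batch is just a stand-in for my real audio vectors):

model = Autoencoder()
x = torch.randn(64, 8)   # dummy batch in place of real length-8 audio vectors
z = model.encode(x)      # (64, 1) latent scalars
x_hat = model.decode(z)  # (64, 8) reconstructions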
My training algorithm is fine, I think: MSE loss, lr=1e-3, batch_size=64, Adam optimizer, nothing too interesting (a minimal sketch follows).
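Here is roughly what that loop looks like; train_x is a placeholder tensor standing in for my real data:

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

model = Autoencoder()
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

train_x = torch.randn(10_000, 8)  # placeholder; the real data are length-8 audio vectors
loader = DataLoader(TensorDataset(train_x), batch_size=64, shuffle=True)

for epoch in range(100):
    for (batch,) in loader:
        optimizer.zero_grad()
        recon = model(batch)
        loss = criterion(recon, batch)  # reconstruction MSE
        loss.backward()
        optimizer.step()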
I tried adding more layers with ReLU activations, and I tried layer normalization and dropout (the first sketch below shows roughly what I mean).
I tried a transformer architecture; it gave me interesting results, but with higher loss and more noise (second sketch below).
I tried different learning rates and batch sizes.
I expected the loss to drop with more layers and regularization, but it actually increased.
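For concreteness, here is a deeper variant along the lines of what I tried; the hidden width (16) and the dropout rate (0.1) are illustrative, not the exact values I used:

class DeepAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        # Deeper encoder with ReLU, LayerNorm, and Dropout before the bottleneck
        self.encoder = nn.Sequential(
            nn.Linear(8, 16),
            nn.ReLU(),
            nn.LayerNorm(16),
            nn.Dropout(0.1),
            nn.Linear(16, 8),
            nn.ReLU(),
            nn.Linear(8, 1),
        )
        # Mirrored decoder expanding the scalar back to 8 samples
        self.decoder = nn.Sequential(
            nn.Linear(1, 8),
            nn.ReLU(),
            nn.Linear(8, 16),
            nn.ReLU(),
            nn.Linear(16, 8),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))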
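And the shape of the transformer attempt, treating each of the 8 samples as a token; d_model, nhead, and the layer count here are illustrative, not my exact settings:

class TransformerAutoencoder(nn.Module):
    def __init__(self, d_model=16, nhead=4):
        super().__init__()
        self.embed = nn.Linear(1, d_model)  # lift each scalar sample to a d_model token
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead, dim_feedforward=32, batch_first=True
        )
        self.encoder_tf = nn.TransformerEncoder(layer, num_layers=2)
        self.to_latent = nn.Linear(8 * d_model, 1)  # pool the 8 tokens into one scalar
        self.decoder = nn.Linear(1, 8)              # simple linear decoder back to 8 samples

    def forward(self, x):                     # x: (batch, 8)
        tokens = self.embed(x.unsqueeze(-1))  # (batch, 8, d_model)
        h = self.encoder_tf(tokens)           # (batch, 8, d_model)
        z = self.to_latent(h.flatten(1))      # (batch, 1) latent scalar
        return self.decoder(z)                # (batch, 8) reconstruction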