Here is my code for the VAE Loss:
import torch
import torch.nn as nn

def loss_function(x, x_hat, mean, logvar, beta=1.0):
    # reconstruction term: MSE averaged over every element in the batch
    criterion = nn.MSELoss(reduction="mean")
    reconstruction_loss = criterion(x_hat, x)
    # KL divergence of N(mean, exp(logvar)) from N(0, 1), also averaged over every element
    KLD = -0.5 * torch.mean(1 + logvar - mean.pow(2) - logvar.exp())
    print(f"KLD = {KLD}")
    return reconstruction_loss + KLD * beta
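For reference, this is roughly how I'd call it on dummy tensors (the shapes here are just placeholders matching my setup):

x      = torch.rand(64, 18)    # dummy inputs in [0, 1]
x_hat  = torch.rand(64, 18)    # dummy reconstructions
mean   = torch.randn(64, 10)   # dummy latent means (latent shape 10)
logvar = torch.randn(64, 10)   # dummy latent log-variances
loss = loss_function(x, x_hat, mean, logvar, beta=1.0)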
Here is my VAE (simplified):
class VariationalAutoEncoder(nn.Module):
    def __init__(self, latent_shape):
        super(VariationalAutoEncoder, self).__init__()
        # encoder
        self.encoder = nn.Sequential(
            nn.Linear(18, 5184),  # new
            nn.LeakyReLU(0.2),
            nn.Linear(5184, 128),  # new
        )
        # latent mean and log of the variance
        self.mean_layer = nn.Linear(128, latent_shape)
        self.logvar_layer = nn.Linear(128, latent_shape)
        # decoder
        self.decoder = nn.Sequential(
            nn.Linear(latent_shape, 128),
            nn.LeakyReLU(0.2),
            nn.Linear(128, 5184),
            nn.LeakyReLU(0.2),
            nn.Linear(5184, 18),  # new
            nn.Sigmoid(),
        )

    def encode(self, x):
        x = self.encoder(x)
        mean, logvar = self.mean_layer(x), self.logvar_layer(x)
        return mean, logvar

    def reparameterization(self, mean, logvar):
        std = torch.exp(0.5 * logvar)
        epsilon = torch.randn_like(std).to(device)
        z = mean + std * epsilon
        return z

    def decode(self, z):
        return self.decoder(z)

    def forward(self, x):
        mean, logvar = self.encode(x)
        z = self.reparameterization(mean, logvar)
        x_hat = self.decode(z)
        return x_hat, mean, logvar
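For completeness, this is roughly how I instantiate and run it (the batch below is just a dummy tensor to show the shapes; device is whichever of GPU/CPU is available):

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = VariationalAutoEncoder(latent_shape=10).to(device)

x = torch.rand(64, 18).to(device)   # dummy batch; 64 is an arbitrary batch size
x_hat, mean, logvar = model(x)
print(x_hat.shape, mean.shape, logvar.shape)
# torch.Size([64, 18]) torch.Size([64, 10]) torch.Size([64, 10])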
The KLD values drop to the order of 10^-7 even when I use a beta value of 10^16. I'm not sure why this happens. What should I do?
My hyperparameters right now are: Adam optimizer with weight decay 10 and lr 0.01, and a ReduceLROnPlateau scheduler with patience 3 and factor 0.5. I'm using gradient clipping with max_norm 0.5 and training for 25 epochs.
The latent shape is 10.
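Concretely, the training setup looks roughly like this (train_loader and beta are placeholders for my actual data loader and whichever beta value/schedule I'm testing):

optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=10)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.5, patience=3)

beta = 1.0  # placeholder; I've also tried annealed and very large values here
for epoch in range(25):
    for x in train_loader:                  # train_loader is a placeholder name
        x = x.to(device)
        optimizer.zero_grad()
        x_hat, mean, logvar = model(x)
        loss = loss_function(x, x_hat, mean, logvar, beta=beta)
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=0.5)
        optimizer.step()
    scheduler.step(loss)                    # stepping on the last training loss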
I appreciate your help!
I tried beta annealing and cyclical annealing with small values, and even a fixed beta of 10^16, but the KLD term still vanishes to around 10^-8. It's as if the model is behaving randomly on my task. I've also tried adding a convolution layer, since the input is an image of roughly 18 by 18, but it made no difference.
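For reference, by cyclical annealing I mean a schedule roughly like this (the cycle length and maximum beta are just placeholder numbers):

def cyclical_beta(step, cycle_len=1000, beta_max=1.0):
    # ramp beta linearly from 0 to beta_max over the first half of each cycle,
    # then hold it at beta_max for the second half
    t = (step % cycle_len) / cycle_len
    return beta_max * min(1.0, 2.0 * t)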