I was attempting to recreate this excerpt from the Neuromatch DL course, in which a multi-layer perceptron (in this case, with no hidden layers) is trained on the MNIST image dataset. I noticed that every time I re-ran the entire code, both the init loss and the end loss decreased, going from (2.488, 0.0965) to (2.38, 0.092) after three runs.
How is this happening? Since the MLP is being re-initialised on every run, shouldn't the weights revert to their default initialisation? Why does the model fit better each time?
Thanks again; I've attached the code below:
```python
# Imports added here for completeness; load_mnist_data and MLP are helper
# definitions from the Neuromatch notebook.
import numpy as np
import torch.nn.functional as F
import torch.optim as optim

# Load the dataset
train_set, test_set = load_mnist_data(change_tensors=True)

# Sample a random subset of 500 indices
subset_index = np.random.choice(len(train_set.data), 500)

# We will use these symbols to represent the training data and labels, to stay
# as close to the mathematical expressions as possible.
X, y = train_set.data[subset_index, :], train_set.targets[subset_index]

loss_fn = F.nll_loss
cell_verbose = True  # Toggle whether or not to print the loss

# This is where the actual model training and optimisation begins, I believe
partial_trained_model = MLP(in_dim=784, out_dim=10, hidden_dims=[])

if cell_verbose:
    print('Init loss', loss_fn(partial_trained_model(X), y).item())  # Should be roughly np.log(10), since there are 10 classes

# Invoke an optimizer using Adaptive gradient and Momentum (more about this in Section 7)
optimizer = optim.Adam(partial_trained_model.parameters(), lr=7e-4)

for i in range(200):
    loss = loss_fn(partial_trained_model(X), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

if cell_verbose:
    print('End loss', loss_fn(partial_trained_model(X), y).item())  # Should be less than 1e-2
```
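For reference, my understanding of the "Init loss" comment is that a freshly initialised classifier spreads its probability roughly uniformly over the 10 digit classes, so the expected negative log-likelihood is about -ln(1/10) = ln(10) ≈ 2.30, which is close to the values I see. A quick sanity check (just numpy, nothing model-specific):

```python
import numpy as np

# NLL of a classifier that assigns probability 1/10 to the correct class
print(-np.log(1 / 10))  # ~2.3026
```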
I also tried adding a hidden layer, and the same behaviour persisted.
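In case it is relevant, here is a minimal sketch of how I could pin down the randomness to check whether a fresh MLP really starts from the same point on every run. It assumes the standard numpy/torch seeding calls and the same MLP and train_set from the notebook; the seed value 0 is arbitrary:

```python
import numpy as np
import torch
import torch.nn.functional as F

# Fix both sources of randomness: the weight initialisation (torch)
# and the 500-sample subset selection (numpy).
torch.manual_seed(0)
np.random.seed(0)

model = MLP(in_dim=784, out_dim=10, hidden_dims=[])  # MLP is the course helper class
subset_index = np.random.choice(len(train_set.data), 500)
X, y = train_set.data[subset_index, :], train_set.targets[subset_index]

# With fixed seeds, this value should be identical on every run of the cell
print('Init loss', F.nll_loss(model(X), y).item())
```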
Thanks for the help!