I'm writing my very first neural network model in Python. The training data has 14 features, 4 classes and 1953 instances in total, and I train with mini-batches of 4 instances. However, the training error keeps increasing with each epoch, even when I adjust the learning rate. I've included the methods that I believe might cause the issue below. This is the graph I get for 60 training epochs with a learning rate of 0.001 (changing the learning rate doesn't change the overall shape of the graph):
[graph of training error over 60 training epochs]
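To make the shapes concrete, here is what one mini-batch looks like in my setup (the one-hot encoding and the orientation of y shown here are just an illustration, not my exact preprocessing code):

import numpy as np

# One mini-batch: 4 instances, 14 features, 4 classes
batch_X = np.random.rand(4, 14)       # (batch_size, n_features)
batch_y = np.eye(4)[[0, 2, 1, 3]].T   # one-hot labels, oriented (n_classes, batch_size) to match y_hat
# forward_propagation transposes X, so activations[0] ends up as (14, 4),
# i.e. (n_features, batch_size), and y_hat should come out as (n_classes, batch_size)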
My forward propagation function:
def forward_propagation(self, X, y):
    # Initialize activations for the input layer
    self.activations[0] = X.T
    # Forward propagation through each hidden layer
    for i in range(self.n_layers):
        # Calculate weighted inputs and apply the activation function
        self.weighted_inputs[i+1] = np.dot(self.weights[i].T, self.activations[i]) + self.biases[i].T
        #print("n°", i, " weighted inputs = ", self.weighted_inputs[i+1].shape, " weights = ", self.weights[i].T.shape)
        self.activations[i+1], self.df[i] = self.activation_function(self.weighted_inputs[i+1])
    # Forward propagation to the output layer with softmax activation
    self.weighted_inputs[-1] = np.dot(self.weights[-2].T, self.activations[-2]) + self.biases[-2].T
    self.activations[-1], self.df[-1] = self.softmax(self.weighted_inputs[-1])
    # Compute the error using the cross-entropy cost function
    y_hat = self.activations[-1]
    error = self.cross_entropy_cost(y_hat, y)
    # test:
    #print("y_hat, error : ", y_hat, error)
    #print("y_hat shape = ", y_hat.shape)
    return y_hat, error
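Since they are referenced above but not shown, here is a minimal sketch of what I assume softmax, cross_entropy_cost and activation_function compute (these bodies are simplified stand-ins, not my exact code; rows are classes, columns are samples):

def softmax(self, z):
    # Numerically stable softmax over the class axis
    z_shifted = z - np.max(z, axis=0, keepdims=True)
    exp_z = np.exp(z_shifted)
    a = exp_z / np.sum(exp_z, axis=0, keepdims=True)
    return a, None  # no separate derivative; it is folded into the output-layer delta

def cross_entropy_cost(self, y_hat, y):
    # Mean cross-entropy over the mini-batch; y is assumed one-hot with the same orientation as y_hat
    eps = 1e-12
    return -np.sum(y * np.log(y_hat + eps)) / y.shape[1]

def activation_function(self, z):
    # Example for tanh: return both the activation and its derivative
    a = np.tanh(z)
    return a, 1.0 - a ** 2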
Backward propagation:
def backward_pass(self, X, y):
    # Initialize the lists for the errors and the weight/bias updates
    delta = [None] * (self.n_layers + 1)
    dW = [None] * (self.n_layers + 1)
    db = [None] * (self.n_layers + 1)
    # Forward pass
    y_hat, error = self.forward_propagation(X, y)
    # Error of the output layer
    delta[-1] = y_hat - error
    #print("output error = ", delta[-1])
    # Weight and bias updates for the output layer
    dW[-1] = np.dot(delta[-1], self.activations[-2].T)
    db[-1] = np.sum(delta[-1], axis=1, keepdims=True)
    # Back-propagate the error through the hidden layers
    for l in range(self.n_layers-1, -1, -1):
        # Error of the current layer
        delta[l] = np.multiply(np.dot(self.weights[l+1], delta[l+1]), self.df[l])
        # Weight & bias updates for the current layer
        if l == 0:
            dW[l] = np.dot(delta[l], X)  # first layer
        else:
            dW[l] = np.dot(delta[l], self.activations[l].T)
        db[l] = np.sum(delta[l], axis=1, keepdims=True)
        self.weights[l] -= (self.learning_rate * dW[l]).T
        self.biases[l] -= (self.learning_rate * db[l]).T
    return error
The function that runs one training epoch:
def epoch(self):
    total_error = 0
    self.batches = self.batch_generator(4)
    self.batches_X_train = [batch[0] for batch in self.batches]
    self.batches_y_train = [batch[1] for batch in self.batches]
    for batch_X, batch_y in zip(self.batches_X_train, self.batches_y_train):
        # Forward pass and back-propagation for each training mini-batch
        error = self.backward_pass(batch_X, batch_y)
        total_error += error
    # Average error over all training mini-batches for this epoch
    average_error = total_error / self.n_batches
    #print("average_error = ", average_error)
    return average_error
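batch_generator isn't shown either; a simplified sketch of what I assume it does is below (the self.X_train / self.y_train attribute names are placeholders for however the training data is actually stored; n_batches is the attribute used above):

def batch_generator(self, batch_size):
    # Shuffle the training set and split it into mini-batches of the given size
    n_samples = self.X_train.shape[0]        # placeholder attribute names
    indices = np.random.permutation(n_samples)
    batches = []
    for start in range(0, n_samples, batch_size):
        idx = indices[start:start + batch_size]
        batches.append((self.X_train[idx], self.y_train[idx]))
    self.n_batches = len(batches)
    return batches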
The fit function, which calls epoch n times and plots the training error:
def fit(self, n_epochs):
    self.n_epochs = n_epochs
    average_error_train = []
    for epoch in range(1, n_epochs + 1):
        print("epoch n°", epoch)
        # Calculate and store the training error for this epoch
        error_train = self.epoch()
        average_error_train.append(error_train)
        print(f"Epoch {epoch}/{n_epochs} - Train Error: {error_train}")
    # Plot the training error
    plt.plot(range(1, n_epochs + 1), average_error_train, label='Train Error')
    plt.xlabel('Epochs')
    plt.ylabel('Error')
    plt.title('Training Error')
    plt.legend()
    plt.show()
    print("total average error train : ", average_error_train)
    return average_error_train
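For completeness, this is roughly how I call it (the class name and constructor arguments here are only placeholders, the exact signature isn't shown):

model = NeuralNetwork(n_inputs=14, n_hidden=[16, 16], n_outputs=4, learning_rate=0.001)  # hypothetical constructor
errors = model.fit(60)  # 60 epochs, as in the graph above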
I've tested each method separately and the results were coherent. But when I call fit(n_epochs), and therefore use all of these methods together, the error only increases with each epoch. The softmax, cross_entropy_cost and activation functions (tanh or ReLU depending on input) return the correct values, and the data has been normalised before training. There is clearly something I've overlooked, but I just can't figure out what.