Below is a section of MATLAB code from a neural net I’m trying to write. It’s my first attempt at anything related to machine learning. I’m following along with Michael Nielsen’s book Neural Networks and Deep Learning, specifically chapter 2: http://neuralnetworksanddeeplearning.com/chap2.html
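For reference, here are the four backpropagation equations from that chapter, specialized to the quadratic cost of a single image x, C_x = ½‖a³ − y‖², and to the layer numbering used in the code (layer 0 is the input, layer 3 the output). The gradient of C is the average of these per-image gradients over all training images.

$$
\begin{aligned}
\delta^{3} &= (a^{3} - y) \odot \sigma'(z^{3}) \\
\delta^{l} &= \big((w^{l+1})^{\mathsf T}\,\delta^{l+1}\big) \odot \sigma'(z^{l}), \qquad l = 2, 1 \\
\frac{\partial C_x}{\partial b^{l}} &= \delta^{l}, \qquad
\frac{\partial C_x}{\partial w^{l}} = \delta^{l}\,(a^{l-1})^{\mathsf T} \\
\sigma'(z^{l}) &= a^{l} \odot (1 - a^{l})
\end{aligned}
$$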
I’m loading a set of 60,000 labeled 28×28 grayscale images of handwritten digits and am trying to train this neural net to identify them. There is also a test set of 10,000 images. The network has 784 input neurons (28^2), two hidden layers of 16 neurons each, and an output layer of 10 neurons.
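The weights, biases and targets y are set up outside the loop and aren’t shown. A minimal sketch of that setup, assuming standard-normal initialization (the scheme used in Nielsen’s network.py) and one-hot targets; the `labels` vector is a hypothetical stand-in for the real training labels. It mainly pins down the matrix shapes implied by the 784-16-16-10 layout, and shows that encoding and decoding the digit labels must use the same +1 offset because MATLAB indexes from 1.

```matlab
% Hypothetical setup for the 784-16-16-10 network (initialization is an assumption:
% standard-normal weights and biases, as in Nielsen's network.py).
layer_sizes = [784 16 16 10];
weights_1 = randn(layer_sizes(2), layer_sizes(1));  % 16 x 784
biases_1  = randn(layer_sizes(2), 1);               % 16 x 1
weights_2 = randn(layer_sizes(3), layer_sizes(2));  % 16 x 16
biases_2  = randn(layer_sizes(3), 1);               % 16 x 1
weights_3 = randn(layer_sizes(4), layer_sizes(3));  % 10 x 16
biases_3  = randn(layer_sizes(4), 1);               % 10 x 1

% One-hot targets: digit d (0..9) puts a 1 in row d + 1 of its column.
labels = [3 0 7 1 9];                               % hypothetical example labels
num_images = numel(labels);
y = zeros(layer_sizes(4), num_images);
y(sub2ind(size(y), labels + 1, 1:num_images)) = 1;

% Decoding must undo the same offset: index of the most active row, minus 1.
[~, row] = max(y, [], 1);
decoded_digits = row - 1;                           % equals labels
```

With the real data, `labels` would hold all 60,000 training labels, and applying the same `max(..., [], 1)` call to `a_3` gives the predicted digit for each image.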
I’m evaluating the cost function as C = 0.5*(a-y).^2, summed over the 10 output neurons and averaged over all images. It seems mildly successful: C started at 1.35 and fell to 0.46 before essentially flattening out after about 75 epochs. However, the network still guesses the correct digit only 12% of the time, barely better than the 10% expected from random guessing. I’ve double- and triple-checked the math but can’t find a mistake, so I think there must be one I’m not seeing.

The code below is everything within the main training loop, so any bugs should be in there. I’m not breaking the images into smaller batches; each epoch processes all 60k images at once. Since each image is only 28×28 pixels, this is fast enough without splitting the data.

The input neurons a_0 are a 784×60000 array of doubles with values between 0 and 1. I took the original images, where each pixel was a uint8, converted them to double, and divided by 255 to get a_0. I number the layers in my code so that layer 0 is the input layer, layers 1 and 2 are the hidden layers, and layer 3 is the output layer.
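As a reference point for the cost plateau at C = 0.46, here is the cost of a network that ignores its input and outputs the same constant p on every output neuron for every image. This is a worked check under two assumptions: y is one-hot (one 1 and nine 0s per column), and each digit makes up about 10% of the training set, so Pr(y_j = 1) = 0.1 for every output j.

$$
\begin{aligned}
C &= \frac{1}{2}\sum_{j=1}^{10} \mathbb{E}\big[(p - y_j)^2\big],
\qquad \mathbb{E}\big[(p - y_j)^2\big] = 0.1\,(p-1)^2 + 0.9\,p^2 \\
\frac{d}{dp}\Big(0.1\,(p-1)^2 + 0.9\,p^2\Big) &= 2p - 0.2 = 0
\;\Rightarrow\; p = 0.1 \\
C\big|_{p=0.1} &= \frac{1}{2}\cdot 10\cdot\big(0.1\cdot 0.81 + 0.9\cdot 0.01\big)
= \frac{1}{2}\cdot 10\cdot 0.09 = 0.45
\end{aligned}
$$

Under those assumptions, a plateau near 0.45 is roughly what a network reaches by learning only the class frequencies, which would be consistent with near-chance accuracy; it does not by itself identify the cause.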
```matlab
a_0 = training_images;
epoch = 0;
while epoch < 5 || C(epoch - 1) - C(epoch) > 0.001
    epoch = epoch + 1;

    % Propagate forwards
    z_1 = weights_1*a_0 + biases_1;
    a_1 = sigmoid(z_1);
    z_2 = weights_2*a_1 + biases_2;
    a_2 = sigmoid(z_2);
    z_3 = weights_3*a_2 + biases_3;
    a_3 = sigmoid(z_3);

    % Evaluate cost function
    C(epoch) = 0.5*mean(sum((a_3-y).^2, 1));

    % Propagate backwards
    sigmoid_d1 = a_1 .* (1-a_1); % Sigmoid derivative
    sigmoid_d2 = a_2 .* (1-a_2);
    sigmoid_d3 = a_3 .* (1-a_3);
    delta_3 = (a_3-y) .* sigmoid_d3;
    delta_2 = (weights_3.'*delta_3) .* sigmoid_d2;
    delta_1 = (weights_2.'*delta_2) .* sigmoid_d1;

    % Calculate gradient
    for image_index = 1:num_images
        dC_dw3(:, :, image_index) = delta_3(:, image_index) * a_2(:, image_index).';
        dC_dw2(:, :, image_index) = delta_2(:, image_index) * a_1(:, image_index).';
        dC_dw1(:, :, image_index) = delta_1(:, image_index) * a_0(:, image_index).';
    end

    % Calculate adjustment
    training_rate = 0.1;
    adjust_biases_1 = -training_rate * mean(delta_1, 2);
    adjust_biases_2 = -training_rate * mean(delta_2, 2);
    adjust_biases_3 = -training_rate * mean(delta_3, 2);
    adjust_weights_1 = -training_rate * mean(dC_dw1, 3);
    adjust_weights_2 = -training_rate * mean(dC_dw2, 3);
    adjust_weights_3 = -training_rate * mean(dC_dw3, 3);

    % Apply adjustment
    biases_1 = biases_1 + adjust_biases_1;
    biases_2 = biases_2 + adjust_biases_2;
    biases_3 = biases_3 + adjust_biases_3;
    weights_1 = weights_1 + adjust_weights_1;
    weights_2 = weights_2 + adjust_weights_2;
    weights_3 = weights_3 + adjust_weights_3;
end
```
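Since the backprop math has only been checked by hand, a numerical gradient check is a common way to confirm it mechanically. The sketch below is self-contained and uses hypothetical small layer sizes and random data instead of the real 784-16-16-10 network. It repeats the forward and backward pass from the training loop, forms each weight gradient with a single matrix product (`delta_l * a_{l-1}.' / N`, which equals `mean(dC_dw_l, 3)` from the per-image loop), and compares the result against central finite differences of the same cost. `sigmoid` is defined here as an anonymous function, assuming it matches the one used in the training code. If the formulas match the cost, the printed relative errors should be far below 1e-5.

```matlab
% Numerical gradient check for a three-layer sigmoid network with quadratic cost.
% Sizes and data are small hypothetical stand-ins, not the real 784-16-16-10 net.
sigmoid = @(z) 1 ./ (1 + exp(-z));   % assumed to match the training code's sigmoid

n = [4 3 3 2];   % hypothetical layer sizes: input, hidden 1, hidden 2, output
N = 5;           % hypothetical number of training images
a_0 = rand(n(1), N);                                  % inputs in [0, 1]
y = zeros(n(4), N);
y(sub2ind(size(y), randi(n(4), 1, N), 1:N)) = 1;      % random one-hot targets

weights_1 = randn(n(2), n(1));  biases_1 = randn(n(2), 1);
weights_2 = randn(n(3), n(2));  biases_2 = randn(n(3), 1);
weights_3 = randn(n(4), n(3));  biases_3 = randn(n(4), 1);

% Same cost as the training loop, as a function of the three weight matrices
forward = @(W1, W2, W3) sigmoid(W3 * sigmoid(W2 * sigmoid(W1*a_0 + biases_1) + biases_2) + biases_3);
cost = @(W1, W2, W3) 0.5 * mean(sum((forward(W1, W2, W3) - y).^2, 1));

% Forward and backward pass, as in the training loop
a_1 = sigmoid(weights_1*a_0 + biases_1);
a_2 = sigmoid(weights_2*a_1 + biases_2);
a_3 = sigmoid(weights_3*a_2 + biases_3);
delta_3 = (a_3 - y) .* a_3 .* (1 - a_3);
delta_2 = (weights_3.' * delta_3) .* a_2 .* (1 - a_2);
delta_1 = (weights_2.' * delta_2) .* a_1 .* (1 - a_1);

% Vectorized weight gradients: delta * a.' sums the per-image outer products,
% so dividing by N gives the same result as mean(dC_dw, 3) from the loop.
analytic = {delta_1 * a_0.' / N, delta_2 * a_1.' / N, delta_3 * a_2.' / N};

% Central finite differences of the cost, one weight at a time
W = {weights_1, weights_2, weights_3};
h = 1e-5;
rel_err = zeros(1, 3);
for layer = 1:3
    numeric = zeros(size(W{layer}));
    for k = 1:numel(W{layer})
        W_plus = W;   W_plus{layer}(k) = W_plus{layer}(k) + h;
        W_minus = W;  W_minus{layer}(k) = W_minus{layer}(k) - h;
        numeric(k) = (cost(W_plus{:}) - cost(W_minus{:})) / (2*h);
    end
    rel_err(layer) = norm(analytic{layer}(:) - numeric(:)) / ...
                     norm(analytic{layer}(:) + numeric(:));
end
fprintf('relative error, weights_%d: %.2e\n', [1:3; rel_err]);
```

Because the single matrix products give the same averages, the same three expressions could stand in for the per-image `for` loop. That also avoids building the 3-D gradient arrays: for the full network, `dC_dw1` alone holds 16×784×60000 doubles, about 6 GB.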