I need to write an SGD perceptron for digit recognition according to these guidelines:
- The dataset is the MNIST database with 60,000 training samples.
- Pick a random input vector, calculate net_j (the weighted sum) and find y_j = f(net_j), where f(x) is the activation function. The w_0j are offset (bias) weights, with x_0 = 1.
- Find ε = 0.5 * Σ_j (d_j - y_j)², where d is the desired vector (a zero vector with a 1 at the index of the desired digit, I think), built from y_train. I also believe ε is what is called the loss function.
- If ε < ε_threshold, break the loop.
- Adjust the weights: w_ij += -η * δ_j * x_i, where δ_j = -(d_j - y_j) ∙ f'(net_j), η is the learning rate, and the ∙ operator is the inner product, as far as I understand.
- Repeat steps 2-5 until the loop breaks (see the sketch right after this list for how I read steps 2-4).
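To check my own reading, here is a minimal sketch of how I understand steps 2-4 (one forward pass and the error ε) for a single sample; the names forward, W, b and d are mine, not from the guidelines:

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def forward(x, W, b, d):
    # x: flattened 784-long input, W: 784x10 weight matrix, b: 10 bias weights, d: one-hot desired vector
    net = np.dot(x, W) + b            # net_j = sum_i w_ij * x_i + w_0j (x_0 = 1 is handled by b)
    y = sigmoid(net)                  # y_j = f(net_j)
    eps = 0.5 * np.sum((d - y) ** 2)  # ε = 0.5 * Σ_j (d_j - y_j)²
    return net, y, eps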
The problem is that the perceptron does not learn, whatever values of η, ε_threshold and offset (the random initial weight range) I set. I get similar results after every vector, for example:
desired digit from y_train: 3 (index 3 in following arrays)
weighted sum for each j: [-2.39 -2.21 -2.254 -2.49 -2.41 -2.16 -2.17 -2.37 -2.35 -2.32]
error vector (desired-y_pred): [-0.083 -0.098 -0.095 0.924 -0.082 -0.102 -0.101 -0.085 -0.086 -0.088]
This means the weights make all outputs look roughly equally probable, even though the error vector tries to adjust them in the right direction.
I use the sigmoid activation function σ(x), whose derivative is σ(x) * (1 - σ(x)).
Here is my code:
import random
import numpy as np
from keras.datasets import mnist

(X_train, y_train), (X_test, y_test) = mnist.load_data()

# Normalize data to [0, 1]
X_train = X_train / 255.0
X_test = X_test / 255.0

# Constants from data
X_vector_len = 28 * 28
y_vector_len = 10

# Learning parameters
learning_rate = 0.05
offset = 0.03  # Random initial weight offset
e_threshold = 0.01

def activation_function(x):
    # Sigmoid σ(x)
    return 1 / (1 + np.exp(-x))

def activation_derivative(x):
    # σ'(x) = σ(x) * (1 - σ(x))
    return activation_function(x) * (1 - activation_function(x))
def train(X_train, y_train):
    # Weight initialization: uniform in [-offset, offset]
    weights = np.random.rand(X_vector_len, y_vector_len) * offset * 2 - offset
    bias = np.random.rand(y_vector_len) * offset * 2 - offset
    epochs = 0  # Counts single-sample updates
    while True:
        # Getting X and D vectors
        s = random.randrange(len(X_train))
        sample = X_train[s].flatten()  # 28x28 image -> 784-long vector
        desired = np.zeros(y_vector_len, dtype=np.float64)
        desired[y_train[s]] = 1  # One-hot desired vector
        net = np.dot(sample, weights) + bias  # Weighted sum net_j
        y_pred = activation_function(net)
        e_vec = desired - y_pred
        e = np.sum(e_vec ** 2) / 2  # Error ε for the current vector
        if e < e_threshold:
            break
        # Inner product: δ collapses to a single number (see my question below)
        gradient = -e_vec.dot(activation_derivative(net))
        for j in range(y_vector_len):
            for i in range(X_vector_len):
                weights[i, j] -= learning_rate * gradient * float(sample[i])
            bias[j] -= learning_rate * gradient
        epochs += 1
    return weights, bias, epochs
def test(X_test, y_test, weights, bias):
    correct_predictions = 0
    for j, label in enumerate(y_test):
        sample = X_test[j].flatten()  # Flatten the test image to a 784-long vector
        net = np.dot(sample, weights) + bias
        print(f"Test {j+1}: Desired {label}, prediction: {net.argmax()} ({net})")
        if label == net.argmax():
            correct_predictions += 1
    return correct_predictions / len(X_test)
if __name__ == "__main__":
    weights, bias, epochs = train(X_train, y_train)
    print(f"Model trained for {epochs} epochs")
    accuracy = test(X_test, y_test, weights, bias)
    print(f"Accuracy for test selection: {accuracy * 100:.2f}%")
Searching for mistakes in the formulas and reading some implementations and math theory about SGD didn't help me. It only raised more questions, because the implementations differ:
- Why does the neural network not learn, even though the error vector has coefficients with the right signs (positive for the desired digit, negative for the other digits)?
- The gradient δ is a single number, since it is an inner product, so what is δ_j? If I am supposed to use δ for every δ_j, doesn't that mix the weights of the different outputs? The value of error_j = d_j - y_j definitely must affect specific outputs. I tried both δ and the elementwise δ_j = -(d_j - y_j) * f'(net_j) (regular product), with the same results.
- The sigmoid function has the range (0, 1), so the components of the error vector are either 1 - σ(x) or 0 - σ(x), i.e. they lie in (-1, 1). Am I right that the error vector is one of the coefficients in the weight update, and that it must be positive for the desired output and negative for the others, while the other coefficients are always non-negative? But how, then, do other activation functions with different ranges work, e.g. f(x) = x or f(x) = arctan(x)?
- I have used the unit step function as the activation function in another algorithm; is it applicable for SGD?
- Can I use NumPy functions to adjust the weights somehow? I need to multiply all w_ij values by x_i along the rows (see the sketch right after this list).
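For the last two points, here is what I mean by the elementwise δ_j and a vectorized update, written against the variables in my train() above (just a sketch of how I would expect it to look, not something I have verified):

# Elementwise delta: one δ_j per output, shape (10,)
delta = -(desired - y_pred) * activation_derivative(net)
# Vectorized w_ij += -η * δ_j * x_i over all i, j:
# np.outer(sample, delta) has shape (784, 10), the same as weights
weights -= learning_rate * np.outer(sample, delta)
bias -= learning_rate * delta  # bias is treated as a weight with x_0 = 1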