I’m trying to implement the gradient of the softmax cross-entropy loss in Python, but my analytical gradient does not match the numerical gradient. Here is my Python code:
```python
import numpy as np

def softmax(z):
    expp = np.exp(z)
    return np.divide(expp, np.sum(expp))

def cost(z, y):
    # Cross-entropy of the softmax output, averaged over the length of y.
    s = softmax(z)
    return -np.sum(y * np.log(s)) / len(y)

def costprime(z, y):
    # One-sided finite-difference estimate of the gradient of cost w.r.t. z.
    prime = []
    for i in range(len(z)):
        values = z.copy()
        values[i] += 1.0e-10
        prime.append((cost(values, y) - cost(z, y)) / 1.0e-10)
    return prime

z = np.array([1.1, 2.2, 0.3, -1.7])
y_expected = np.array([0, 0, 1, 0])

s = softmax(z)
cost_gradient = s - y_expected          # analytical gradient: softmax(z) - y
numerical_derivative = costprime(z, y_expected)

print(cost_gradient)
print(numerical_derivative)
```
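For what it's worth, a central-difference estimate with a larger step is usually a more robust way to do this kind of check than the one-sided difference with a step of `1.0e-10` above; a rough sketch (the step `1e-6` and the helper name `numeric_grad_central` are arbitrary choices, and it reuses the `cost` function from my code) would look like this:

```python
def numeric_grad_central(z, y, eps=1e-6):
    # Central-difference estimate of d cost / d z, reusing cost() from above.
    grad = np.zeros_like(z, dtype=float)
    for i in range(len(z)):
        zp, zm = z.copy(), z.copy()
        zp[i] += eps
        zm[i] -= eps
        grad[i] = (cost(zp, y) - cost(zm, y)) / (2 * eps)
    return grad
```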
The result of my code is:

```
[ 0.22151804  0.66547696 -0.90046553  0.01347053]
[0.05538014491435206, 0.16636914068612896, -0.2251154818111445, 0.0033673064336881]
```
They look very different. However, when I perturb the values of z to see their effect on the cost, I find that the numerical derivative is more accurate than the analytical one (cost_gradient).
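Concretely, by "perturbing the values of z" I mean a first-order check roughly like the sketch below (the perturbation `delta` is just an arbitrary example); the actual change in the cost tracks the prediction from `numerical_derivative` much more closely than the one from `cost_gradient`:

```python
# Compare the actual change in cost against the first-order prediction
# from each candidate gradient for a small, arbitrary perturbation of z.
delta = np.array([0.01, -0.02, 0.005, 0.0])

actual_change = cost(z + delta, y_expected) - cost(z, y_expected)
pred_analytical = np.dot(cost_gradient, delta)
pred_numerical = np.dot(np.array(numerical_derivative), delta)

print(actual_change, pred_analytical, pred_numerical)
```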