CS231n: Why is the gradient df multiplied elementwise when calculating the numerical gradient?
I am trying to learn about CNNs by following Stanford’s CS231n lectures, and I have a question about the two-layer network in assignment 1.