Why is there a small difference between gradients calculated using torch autograd vs functorch?
I am using this linked solution from a previous question to compute gradients more efficiently than a manual loop.
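To make the comparison concrete, here is a minimal sketch of the two approaches being compared. It assumes PyTorch 2.x, where functorch's transforms are available as `torch.func`. The linked solution's exact code is not reproduced here. The model (`nn.Linear(4, 1)`), the random data, and the helper names `loop_grads`, `sample_loss` and `vmap_grads` are hypothetical stand-ins. The reference path runs one `backward()` per sample with autograd. The vectorized path uses `vmap(grad(...))` over a `functional_call` of the same model.

```python
import torch
import torch.nn as nn
from torch.func import functional_call, grad, vmap

torch.manual_seed(0)

# Hypothetical toy model and data (assumption, not the question's real setup)
model = nn.Linear(4, 1)
x = torch.randn(8, 4)
y = torch.randn(8, 1)
loss_fn = nn.MSELoss()

# Detached copies of the parameters, passed explicitly to the functional version
params = {k: v.detach() for k, v in model.named_parameters()}


def loop_grads():
    """Per-sample gradients via a manual loop with torch.autograd."""
    grads = []
    for i in range(x.shape[0]):
        model.zero_grad()
        loss = loss_fn(model(x[i:i + 1]), y[i:i + 1])
        loss.backward()
        grads.append({k: p.grad.detach().clone() for k, p in model.named_parameters()})
    # Stack into tensors of shape (batch, *param_shape)
    return {k: torch.stack([g[k] for g in grads]) for k in params}


def sample_loss(p, xi, yi):
    """Loss for a single sample, with a batch dimension of 1 re-added."""
    out = functional_call(model, p, (xi.unsqueeze(0),))
    return loss_fn(out, yi.unsqueeze(0))


def vmap_grads():
    """Per-sample gradients via torch.func: grad w.r.t. params, vmap over samples."""
    # in_dims: do not map over params (None), map over dim 0 of x and y
    return vmap(grad(sample_loss), in_dims=(None, 0, 0))(params, x, y)


if __name__ == "__main__":
    lg, vg = loop_grads(), vmap_grads()
    for k in lg:
        # Differences, if any, should be on the order of float32 rounding error
        print(k, (lg[k] - vg[k]).abs().max().item(), torch.allclose(lg[k], vg[k], atol=1e-6))
```

The two results usually agree only up to floating-point tolerance, not bit-for-bit. A likely reason is that `vmap` runs batched kernels, which can accumulate sums in a different order than the per-sample loop does. For that reason the check above uses `torch.allclose` with a small `atol` instead of exact equality.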