How to manage vanishing gradients for recursive matrix operations
I am building a neural network in PyTorch with an inner loop that applies a series of matrix operations recursively to a BxNxN matrix, and the gradients vanish as the number of recursive steps grows.
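A minimal sketch of the setup described, assuming the recursive step is a batched matrix multiply followed by a saturating nonlinearity. The names (`recursive_step`, `weight`, `steps`) are hypothetical, and the residual connection is one common mitigation (it preserves a direct gradient path through the loop), not something stated in the question:

```python
import torch

def recursive_step(x, weight, steps):
    """Apply a matrix operation `steps` times to a BxNxN tensor.

    The residual add (x + ...) is a hedged example of a vanishing-gradient
    mitigation: it keeps an identity path so gradients reach early steps.
    """
    for _ in range(steps):
        # bmm: (B, N, N) @ (B, N, N) -> (B, N, N)
        update = torch.tanh(torch.bmm(x, weight.expand(x.size(0), -1, -1)))
        x = x + update  # residual connection
    return x

B, N, steps = 4, 8, 20
x = torch.randn(B, N, N, requires_grad=True)
weight = torch.randn(N, N) * 0.1  # small init also helps stability
out = recursive_step(x, weight, steps)
out.sum().backward()
print(x.grad.abs().mean())  # nonzero: gradient still reaches the input
```

Without the residual add, twenty stacked `tanh` steps would shrink the gradient multiplicatively at each iteration.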