I’m working on a project with a process in a fixed number of steps; let’s say 3.
- We start from a zero-initialized matrix M0 describing the state of our system, and must take an action whose consequences are random but strongly influenced by the action chosen
- We update the matrix (M1) and take a second action
- We update the matrix again (M2) and take a last action
- Only now can we compute the loss, from the final matrix M3, which gives us the evaluation of our strategy
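In code, the loop above might look like this (a minimal numpy sketch; `policy`, `transition`, the matrix size, and the loss are all hypothetical placeholders for the real model and dynamics):

```python
import numpy as np

rng = np.random.default_rng(42)
N = 4  # state matrix size (hypothetical)

def policy(M):
    # placeholder for the neural network: state matrix -> action weights
    logits = M.sum(axis=0)            # any function of the state
    w = np.exp(logits - logits.max())
    return w / w.sum()

def transition(M, action):
    # placeholder random dynamics, strongly influenced by the action
    noise = 0.1 * rng.standard_normal((N, N))
    M = M.copy()
    M[action] += 1.0                  # the action's deterministic effect
    return M + noise

M = np.zeros((N, N))                  # M0: zero-initialized state
for step in range(3):                 # three decision steps
    weights = policy(M)               # network output: action weights
    action = rng.choice(N, p=weights) # the random consequence
    M = transition(M, action)         # M1, M2, then M3

loss = np.abs(M).sum()                # evaluated only on the final M3
```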
I have built a neural network where the state matrix is the input, and the output is a list of weights that decides which action to take. From my (beginner) understanding, the whole thing looks a bit like a recurrent neural network, but with extra steps.
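For concreteness, such a policy network can be as small as “flatten the state matrix, one linear layer, softmax”. A minimal numpy sketch (the sizes and the single linear layer are assumptions, not the actual architecture):

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 4, 6            # hypothetical: 4x4 state matrix, 6 possible actions
W = 0.1 * rng.standard_normal((N * N, K))  # trainable parameters
b = np.zeros(K)

def policy(M):
    """State matrix in, action weights (a probability vector) out."""
    z = M.ravel() @ W + b              # flatten + linear layer
    z = z - z.max()                    # numerical stability
    p = np.exp(z)
    return p / p.sum()                 # softmax: weights sum to 1

weights = policy(np.zeros((N, N)))
```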
I implemented the process in Python, but when I try to train the model, GradientTape() doesn’t look happy: the gradient is never computed for my variables. I’m not sure, but I think the randomness of the loss makes the gradient non-computable (computing the loss from the weights is far from straightforward, as the weights are placed on the edges of a graph and we run algorithms on it).
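That diagnosis is likely right: the moment an action is *sampled* from the weights (or the state leaves the tape, e.g. via `.numpy()` or an external graph library), backpropagation has nothing to differentiate, and `tape.gradient` returns `None` for disconnected variables. The sampled loss is a step function of the weights, which a tiny numpy experiment makes visible (seed and numbers are illustrative):

```python
import numpy as np

losses = np.array([1.0, 3.0, 2.0])    # loss attached to each of 3 actions

def sampled_loss(w, seed=0):
    # fixed seed: this is the loss as a *function of the weights*
    rng = np.random.default_rng(seed)
    action = rng.choice(3, p=w / w.sum())
    return losses[action]

w = np.array([0.2, 0.5, 0.3])
w_bumped = w.copy()
w_bumped[1] += 1e-6                   # tiny change to one weight

# the sampled action (hence the loss) does not move: the per-sample
# "gradient" of the loss w.r.t. the weights is 0 almost everywhere
assert sampled_loss(w_bumped) == sampled_loss(w)

# the *expected* loss, however, is perfectly differentiable:
expected = losses @ (w / w.sum())     # d(expected)/dw is well defined
```

So the randomness is not the problem per se: the discrete sampling step cuts the chain rule, and the usual fix is to differentiate the expected loss instead.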
I have thought of doing reinforcement learning without the list of weights for choosing the action, but I don’t know how well it will cope with the randomness of the reward. Also, I have never done anything like this and I only have a week to implement everything. The action space is also pretty big, so there is a lot of uncertainty with this method.
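For reference, the standard workaround for a non-differentiable, random loss is the score-function (REINFORCE) estimator: instead of backpropagating through the sample, you weight the gradient of the log-probability of the chosen action by the observed loss. A minimal one-step numpy sketch with toy numbers (a real version would accumulate this over all three steps with the actual network):

```python
import numpy as np

rng = np.random.default_rng(1)
losses = np.array([1.0, 3.0, 2.0])    # toy per-action losses
theta = np.zeros(3)                   # logits (the trainable parameters)

p = np.exp(theta - theta.max())
p /= p.sum()                          # softmax policy

# analytic gradient of the expected loss, for comparison:
# E[loss] = sum_a p_a * loss_a  ->  grad = p*loss - (p . loss) * p
analytic = p * losses - (p @ losses) * p

# REINFORCE: sample actions, weight grad log p(a) by the observed loss
n = 100_000
actions = rng.choice(3, size=n, p=p)
onehot = np.eye(3)[actions]
grads = losses[actions][:, None] * (onehot - p)  # loss_a * grad_theta log p_a
estimate = grads.mean(axis=0)
```

Subtracting a baseline (e.g. the mean observed loss) from `losses[actions]` reduces the variance without biasing the estimate; averaging over many sampled episodes per update is the usual way to tame a noisy reward and a large action space.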
Is there any common practice for tackling this kind of problem?
Thanks!
Best