Does anyone know why my PPO code is not working?
I used a youtube tutorial to implement this. I refered to his code directly in his github and I looked through other people code that are similar as well. I am not sure why it does not work in the sense that the agent does not learn :(. Bellow is my agent class.
What’s a good way to represent equivalence classes of matrices in python?
I’m trying to write a program to do reinforcement learning for tic-tac-toe. I want the engine to recognize that if you reflect a board or rotate it you get the exact same game, so that these boards should be considered the same as each other.
Is there an implementation of the ATRPO algorithm that allows for Discrete Actions?
I’m working on a project with my teacher, and we intended to test the ATRPO (Average Reward TRPO) against different state of the art algorithms. However, I only managed to find one library that uses it (https://github.com/sony/nnabla-rl), and sadly it’s implemented in a way where it does not support Discrete Actions, which means we couldn’t use the intended environment.
what can i do to solve this python error?
how can I get out of this problem?