PPO only working with a single epoch and unclipped loss
I’m attempting to implement PPO to beat cartpole-v2. I can get it working if I keep things as A2C (that is, without the clipped loss and with a single epoch), but when I use the clipped loss and more than one epoch, it doesn’t learn. I have been trying to find the issue in my implementation for about a week, but I can’t find what’s wrong.
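For reference, here is a minimal sketch of the PPO clipped surrogate loss in plain Python. It uses lists instead of tensors, and the names `ppo_clip_loss` and `clip_eps` (default 0.2, the value used in the PPO paper) are illustrative assumptions, not part of the original code. It shows why a single-epoch run behaves like A2C. On the first pass over a batch the policy has not changed yet, so the probability ratio is exactly 1 and clipping has no effect. Clipping only starts to matter from the second epoch onward. At that point `old_log_probs` must still be the values recorded when the rollout was collected, kept fixed and excluded from the gradient, so that the ratio compares the updated policy against the rollout policy.

```python
import math


def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Negative PPO clipped surrogate objective, averaged over a batch.

    new_log_probs: log pi_theta(a|s) under the policy currently being optimised.
    old_log_probs: log pi_theta_old(a|s) stored at rollout time; these stay
                   fixed across every epoch of the update (hypothetical layout).
    advantages:    advantage estimate for each (s, a) pair.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio pi_theta / pi_theta_old, computed in log space.
        ratio = math.exp(new_lp - old_lp)
        # Restrict the ratio to [1 - eps, 1 + eps].
        clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Take the pessimistic (smaller) of the unclipped and clipped surrogates.
        total += min(ratio * adv, clipped_ratio * adv)
    # Negate because optimisers minimise a loss.
    return -total / len(advantages)


# First epoch: policy unchanged, so ratio == 1 and the loss is just -mean(A).
print(ppo_clip_loss([-0.5, -1.2], [-0.5, -1.2], [1.0, -2.0]))
```

With identical new and old log-probabilities, the call above gives `-mean([1.0, -2.0]) = 0.5`. If the first action's log-probability rises by 0.5, its ratio becomes `exp(0.5) ≈ 1.65`, which is clipped to 1.2 because its advantage is positive. The loss then becomes `-(1.2 - 2.0) / 2 = 0.4` instead of growing without bound.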