A3C agent (continuous action space) not training properly; mu and std converge very quickly to odd values
I’m currently trying to implement A3C on the InvertedPendulumSwingupBulletEnv-v0 environment. The code runs without errors, but the agent doesn’t perform well: very quickly, the mean converges to 1 and the standard deviation converges to 0.001. Is there an error in my implementation?
A3C agent (continuous action space) not training properly and producing either a very high or a very low std
I’m currently trying to implement A3C on the InvertedPendulumSwingupBulletEnv-v0 environment. The code runs without errors, but the agent doesn’t perform well. I’ve checked some key variables and found that std always either converges to its minimum value (1e-6) or keeps increasing indefinitely. Is there an error in my implementation?
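
To make the two symptoms concrete, here is a minimal sketch (not the actual implementation) of a bounded std for a Gaussian policy, plus a helper that flags when std has hit either bound. It uses only the Python standard library. The `1e-6` floor comes from the description above; the upper bound `STD_MAX`, the softplus parameterization, and all function names are assumptions made for illustration.

```python
import math

STD_MIN = 1e-6   # floor mentioned in the post
STD_MAX = 10.0   # hypothetical cap, assumed for illustration only


def softplus(x: float) -> float:
    """Numerically stable softplus: log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def bounded_std(raw: float) -> float:
    """Map an unconstrained network output to a std in [STD_MIN, STD_MAX]."""
    return min(max(softplus(raw), STD_MIN), STD_MAX)


def gaussian_log_prob(action: float, mu: float, std: float) -> float:
    """Log-density of a 1-D Gaussian N(mu, std^2) at `action`."""
    var = std * std
    return -0.5 * math.log(2.0 * math.pi * var) - (action - mu) ** 2 / (2.0 * var)


def gaussian_entropy(std: float) -> float:
    """Differential entropy of a 1-D Gaussian: 0.5 * log(2 * pi * e * std^2)."""
    return 0.5 * math.log(2.0 * math.pi * math.e * std * std)


def std_status(std: float, tol: float = 1e-9) -> str:
    """Label std as 'collapsed' (at the floor), 'saturated' (at the cap), or 'ok'."""
    if std <= STD_MIN + tol:
        return "collapsed"
    if std >= STD_MAX - tol:
        return "saturated"
    return "ok"


if __name__ == "__main__":
    for raw in (-100.0, 0.0, 100.0):
        std = bounded_std(raw)
        print(raw, std, std_status(std), gaussian_entropy(std))
```

Logging `std_status` and `gaussian_entropy` per update would show which of the two cases a given run falls into. At the floor the entropy is strongly negative, so the policy is effectively deterministic and has stopped exploring.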