I customized a Gymnasium environment and trained on it with stable_baselines3, but the learning process seems to change my environment.
>>> print(env.step(2))
(510, -0.1, False, False, {})
>>> model.learn(total_timesteps=10000)
>>> print(game.step(2))
(104, -0.1, False, False, {'TimeLimit.truncated': False})
For the same action (2), the environment gives me a different observation after learning than it did before.
Does anyone know why? Thanks!
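To make the symptom easier to reproduce, here is a minimal sketch that does not use my real environment or stable_baselines3. `CounterEnv` and `fake_learn` are hypothetical stand-ins: the first is a toy class that only mimics the Gymnasium `reset`/`step` return shapes, and the second just steps the environment with random actions, which is my assumption about what training does to it. It shows the same pattern: if the observation depends on internal state, the same action gives a different observation once something else has stepped the environment, and gives the original one again after `reset()`.

```python
import random


class CounterEnv:
    """Hypothetical toy environment (NOT my real one); mimics the Gymnasium API shape."""

    def __init__(self):
        self.state = 0

    def reset(self, seed=None):
        # Put the internal state back to its starting value.
        self.state = 0
        return self.state, {}

    def step(self, action):
        # The observation depends on the accumulated internal state,
        # not only on the current action.
        self.state += action
        return self.state, -0.1, False, False, {}


def fake_learn(env, total_timesteps, seed=0):
    """Hypothetical stand-in for training: resets and steps the same env object."""
    rng = random.Random(seed)
    env.reset()
    for _ in range(total_timesteps):
        env.step(rng.randint(1, 3))


env = CounterEnv()
env.reset()
before = env.step(2)[0]        # observation right after a fresh reset

fake_learn(env, total_timesteps=100)
after = env.step(2)[0]         # same action, but the internal state has moved on

env.reset()
after_reset = env.step(2)[0]   # same action again, from a fresh reset

print(before, after, after_reset)
```

In this sketch the first and last observations match, while the middle one does not, because only the internal state differs between the calls.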