Why do I get different test results using the same Q-value table?
I am studying ML and am trying to write a reinforcement learning algorithm for a Gymnasium environment. I had already implemented Q-learning for a very simple problem, so I decided to use the same algorithm on a slightly more complex environment, CartPole.