I am trying to write a deep RL agent for Mega Man (NES) using stable-baselines3 and OpenAI Gym Retro. I have tried a lot of things: reward shaping and many hyperparameter settings with PPO, but no matter what I do, my agent keeps going to the right and hitting the enemies that fly towards him. I varied all the hyperparameters, including batch_size, n_steps, gamma, learning_rate, ent_coef, clip_range, n_epochs, gae_lambda, max_grad_norm and vf_coef.
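For reference, my training setup looks roughly like this (the Retro game id and the exact hyperparameter values here are placeholders, since the game id depends on your local ROM integration and I kept changing the hyperparameters between runs):

```python
import retro
from stable_baselines3 import PPO
from stable_baselines3.common.atari_wrappers import WarpFrame

# Game id depends on how the ROM was integrated; yours may differ.
env = retro.make(game="MegaMan-Nes")
env = WarpFrame(env)  # grayscale + downscale to 84x84 for the CNN policy

model = PPO(
    "CnnPolicy",
    env,
    n_steps=2048,
    batch_size=64,
    gamma=0.99,
    learning_rate=2.5e-4,
    ent_coef=0.01,
    clip_range=0.1,
    n_epochs=4,
    gae_lambda=0.95,
    max_grad_norm=0.5,
    vf_coef=0.5,
    verbose=1,
)
model.learn(total_timesteps=1_000_000)
```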
I am beginning to think that maybe the algorithm is not suited to this kind of game, or maybe the reward function is not good. I reward the agent for moving right and for killing enemies, and I penalize him for elapsed time and for taking damage. Would DQN give me a higher rate of success? Or maybe I just can't find the right hyperparameters for the agent to learn to climb the ladder and then continue the level from there.
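My reward shaping is implemented as a wrapper, roughly like the sketch below. The RAM variable names (`xpos`, `score`, `health`) and the scaling coefficients are illustrative; they come from whatever the integration's data.json exposes, so yours may differ:

```python
import gym

class MegaManRewardWrapper(gym.Wrapper):
    """Shaped reward: + for rightward progress and kills, - for time and damage.

    The info keys used here (xpos, score, health) are assumptions based on
    my data.json integration; adjust them to match your own.
    """

    def __init__(self, env):
        super().__init__(env)
        self.prev_info = None

    def reset(self, **kwargs):
        self.prev_info = None
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, _, done, info = self.env.step(action)
        reward = -0.01  # small per-step time penalty
        if self.prev_info is not None:
            # reward rightward progress
            reward += 0.1 * (info["xpos"] - self.prev_info["xpos"])
            # reward kills via score increase
            reward += 1.0 * (info["score"] - self.prev_info["score"])
            # penalize damage taken
            reward -= 5.0 * max(0, self.prev_info["health"] - info["health"])
        self.prev_info = info
        return obs, reward, done, info
```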
This is what it looks like at the moment:

[GIF: Mega Man stuck in an infinite loop, facing a block and killing the enemies that spawn from the right of the screen]
I even ran hyperparameter optimization (HPO) for 100k steps per trial, but to no avail. I think I should increase the budget to 500k steps and let the HPO try 100 different configurations, but that will take a lot more time.
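My HPO loop is roughly the Optuna sketch below (the search ranges are illustrative, and `make_env()` is a hypothetical helper standing in for the wrapped Retro env construction shown above):

```python
import optuna
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy

def objective(trial):
    env = make_env()  # hypothetical helper: builds the wrapped Retro env
    model = PPO(
        "CnnPolicy",
        env,
        learning_rate=trial.suggest_float("learning_rate", 1e-5, 1e-3, log=True),
        n_steps=trial.suggest_categorical("n_steps", [512, 1024, 2048]),
        gamma=trial.suggest_float("gamma", 0.9, 0.9999),
        ent_coef=trial.suggest_float("ent_coef", 1e-4, 0.1, log=True),
        verbose=0,
    )
    model.learn(total_timesteps=500_000)  # the larger budget I am considering
    mean_reward, _ = evaluate_policy(model, env, n_eval_episodes=5)
    env.close()  # Retro only allows one emulator instance per process
    return mean_reward

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=100)
```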